git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, sandals@crustytoothpaste.net,
	steadmon@google.com, jrnieder@gmail.com, peff@peff.net,
	congdanhqx@gmail.com, phillip.wood123@gmail.com,
	emilyshaffer@google.com, sluongng@gmail.com,
	jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 3/7] maintenance: add --scheduled option and config
Date: Wed, 26 Aug 2020 11:30:29 -0400	[thread overview]
Message-ID: <bd4e18b7-6265-73e7-bc1a-a7d647eafd0a@gmail.com> (raw)
In-Reply-To: <xmqqd03ez8pp.fsf@gitster.c.googlers.com>

On 8/25/2020 6:01 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> A user may want to run certain maintenance tasks based on frequency, not
>> conditions given in the repository. For example, the user may want to
>> perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
>> update the 'git maintenance run --scheduled' command to check the config
>> for the last run of that task and add a number of seconds. The task
>> would then run only if the current time is beyond that minimum
>> timestamp.
>>
>> Add a '--scheduled' option to 'git maintenance run' to only run tasks
>> that have had enough time pass since their last run. This is done for
>> each enabled task by checking if the current timestamp is at least as
>> large as the sum of 'maintenance.<task>.lastRun' and
>> 'maintenance.<task>.schedule' in the Git config. This second value is
>> new to this commit, storing a number of seconds intended between runs.
>>
>> A user could then set up an hourly maintenance run with the following
>> cron table:
>>
>>   0 * * * * git -C <repo> maintenance run --scheduled
> 
> The scheme has one obvious drawback.  An hourly crontab entry means
> your maintenance.*.schedule that is finer grained than an hour
> increment will not run as expected.  You'd need to take all the
> schedule intervals and take their GCD to come up with the frequency
> of the single crontab entry.  

My intention for the *.schedule is that it is not an _exact_ frequency,
but instead a lower bound on the frequency. That can be shelved for now
as we discuss this setup:

> Wouldn't it make more sense to have N crontab entries for N tasks
> you want to run periodically, each with their own frequency
> controlled by crontab?  That way, you do not need to maintain
> maintenance.*.schedule configuration variables and the --scheduled
> option.  It might make maintenance.*.lastrun timestamps unneeded,
> which would be an added plus to simplify the system quite
> drastically.  Most importantly, that would be the way crontab users
> are most used to in order to schedule their periodical jobs, so it
> is one less thing to learn.

I had briefly considered setting up crontab entries for each task
(and possibly each repo) but ended up with these complications:

 1. Maintenance frequency differs by task, so we need to split the
    crontab by task. But we can't just split everything because we
    do not want multiple tasks running at the same time on one
    repository. We would need to group the tasks and have one entry
    saying "git maintenance run --task=<task1> --task=<task2> ..."
    for all tasks in the group.

 2. Different repositories might want different tasks at different
    frequencies, so we might need to split the crontab by repository.
    Again, we likely want to group repositories by these frequencies
    because a user could have 100 registered repositories and we don't
    really want to launch 100 parallel processes.

 3. If we want to stop maintenance, then restart it, we need to
    clear the crontab and repopulate it, which would require iterating
    through all "registered" repositories to read their config for
    frequencies.

 4. On macOS, editing the crontab doesn't require "sudo" but it _does_
    create a pop-up(!) to get permission from the user. It would be
    good to minimize how often we edit the crontab and instead use
    config edits to change frequencies.

With these things in mind, here is a suggested alternative design:

Let users specify a schedule frequency among this list: hourly, daily,
weekly, monthly. We then set the following* crontab:

	0 * * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=hourly
	0 0 * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=daily
	0 0 * * 0 git for-each-repo --config=maintenance.repos maintenance run --scheduled=weekly
	0 0 0 * * git for-each-repo --config=maintenance.repos maintenance run --scheduled=monthly

*Of course, there is some care around "$path/git --exec-path=$path"
that I drop for ease here.

Then, "git maintenance (start|stop)" can be just as simple as we have
now: write a fixed schedule every time.

The problem here is that cron will launch these processes in parallel,
and then our object-database lock will cause some to fail! If anyone
knows a simple way to tell cron "run hourly _except_ not at midnight"
then we could let the "daily" schedule also run the "hourly" jobs, for
instance. Hopefully that pattern could be extended to the weekly and
monthly collisions.

Alternatively, we could run every hour and then interpret from config
if the current "hour" matches one of the schedules ourselves. So, the
crontab would be this simple:

	0 * * * * git for-each-repo --config=maintenance.repos maintenance run --scheduled

and then we would internally decide "is this the midnight hour?" and
"is this the first day of the week?" and "is this the first day of the
month?" to detect if we should run the daily/weekly/monthly tasks. While
it adds more time-awareness into Git, it does avoid the parallel task
collisions. There are some concerns here related to long-running tasks
delaying sequential runs of "git -C <repo> maintenance run --scheduled"
causing the "is this the midnight hour?" queries to fail and having
nightly/weekly/monthly maintenance be skipped accidentally. This
motivates the *.lastRun config giving us some guarantee of _eventually_
running the tasks, just _not too frequently_.

I hope this launches a good discussion to help us find a good cron
schedule strategy. After we land on a suitable strategy, I'll summarize
all of these subtleties in the commit message for posterity.

Hopefully, the current way that I integrate with crontab and test that
integration (in PATCH 6/7) could also be reviewed in parallel with this
discussion. I'm very curious to see how that could be improved.

Thanks,
-Stolee

  reply	other threads:[~2020-08-26 15:30 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-19 17:16 [PATCH 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-20  2:06   ` Đoàn Trần Công Danh
2020-08-20 12:12     ` Derrick Stolee
2020-08-19 17:16 ` [PATCH 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
2020-08-20 14:51   ` Đoàn Trần Công Danh
2020-08-24 14:03     ` Derrick Stolee
2020-08-19 17:16 ` [PATCH 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-20 15:00   ` Đoàn Trần Công Danh
2020-08-19 17:16 ` [PATCH 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-19 17:16 ` [PATCH 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
     [not found] ` <bdc27fa28ee70222ed3c7c9863746ace8ea835e4.1597857409.git.gitgitgadget@gmail.com>
2020-08-20 14:34   ` [PATCH 2/7] maintenance: store the "last run" time in config Đoàn Trần Công Danh
2020-08-25 18:39 ` [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-25 18:39   ` [PATCH v2 1/7] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-25 21:44     ` Junio C Hamano
2020-08-26 12:29       ` Derrick Stolee
2020-08-26 16:57         ` Junio C Hamano
2020-08-25 18:39   ` [PATCH v2 2/7] maintenance: store the "last run" time in config Derrick Stolee via GitGitGadget
2020-08-25 21:52     ` Junio C Hamano
2020-08-26 13:34       ` Derrick Stolee
2020-08-26 17:03         ` Junio C Hamano
2020-08-27 13:02           ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 3/7] maintenance: add --scheduled option and config Derrick Stolee via GitGitGadget
2020-08-25 22:01     ` Junio C Hamano
2020-08-26 15:30       ` Derrick Stolee [this message]
2020-08-27 15:47         ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 4/7] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-25 22:19     ` Junio C Hamano
2020-08-26 16:03       ` Derrick Stolee
2020-08-25 18:40   ` [PATCH v2 5/7] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-25 18:40   ` [PATCH v2 6/7] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-25 18:40   ` [PATCH v2 7/7] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
2020-08-28 15:45   ` [PATCH v3 0/6] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 1/6] maintenance: optionally skip --auto process Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 2/6] maintenance: add --schedule option and config Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 3/6] for-each-repo: run subcommands on configured repos Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 4/6] maintenance: add [un]register subcommands Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 5/6] maintenance: add start/stop subcommands Derrick Stolee via GitGitGadget
2020-08-28 15:45     ` [PATCH v3 6/6] maintenance: recommended schedule in register/start Derrick Stolee via GitGitGadget
2020-08-26 12:42 ` [PATCH 0/7] [RFC] Maintenance III: background maintenance Michal Suchánek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd4e18b7-6265-73e7-bc1a-a7d647eafd0a@gmail.com \
    --to=stolee@gmail.com \
    --cc=congdanhqx@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sluongng@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).