git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Josh Steadmon <steadmon@google.com>, git@vger.kernel.org
Cc: stolee@gmail.com
Subject: Re: [RFC PATCH v2 2/2] trace2: don't overload target directories
Date: Mon, 5 Aug 2019 11:34:11 -0400	[thread overview]
Message-ID: <b1009a3e-92c6-248c-8d15-f4bb5cc71a11@jeffhostetler.com> (raw)
In-Reply-To: <a779e272df958702c0df06ab58f1f6d6f8086a30.1564771000.git.steadmon@google.com>



On 8/2/2019 6:02 PM, Josh Steadmon wrote:
> trace2 can write files into a target directory. With heavy usage, this
> directory can fill up with files, causing difficulty for
> trace-processing systems.
> 
> This patch adds a config option (trace2.maxFiles) to set a maximum
> number of files that trace2 will write to a target directory. The
> following behavior is enabled when the maxFiles is set to a positive
> integer:
>    When trace2 would write a file to a target directory, first check
>    whether or not the directory is overloaded. A directory is overloaded
>    if there is a sentinel file declaring an overload, or if the number of
>    files exceeds trace2.maxFiles. If the latter, create a sentinel file
>    to speed up later overload checks.
> 
> The assumption is that a separate trace-processing system is dealing
> with the generated traces; once it processes and removes the sentinel
> file, it should be safe to generate new trace files again.
> 
> The default value for trace2.maxFiles is zero, which disables the
> overload check.
> 
> The config can also be overridden with a new environment variable:
> GIT_TRACE2_MAX_FILES.
> 
> Potential future work:
> * Write a message into the sentinel file (should match the requested
>    trace2 output format).
> * Add a performance test to make sure that contention between multiple
>    processes all writing to the same target directory does not become an
>    issue.


This looks much nicer than the V1 version.  Having it be a
real feature rather than a test feature helps.

I don't see anything wrong with this.  I do worry about the
overhead a bit.  If you really have that many files in the
target directory, having every command count them at startup
might be an issue.

As an alternative, you might consider doing something like
this:

[] have an option to make the target directory path expand to
    something like "<path>/yyyymmdd/" and create the per-process
    files as "<path>/yyyymmdd/<sid>".

If there are 0, 1 or 2 directories, logging is enabled.
We assume that the post-processor is keeping up and all is well.
We need to allow 2 so that we continue to log around midnight.

If there are 3 or more directories, logging is disabled.
The post-processor is more than 24 hours behind for whatever
reason.  We assume here that the post-processor will process
and delete the oldest-named directory, so it is a valid measure
of the backlog.

I suggest "yyyymmdd" here for simplicity in this discussion
as daily log rotation is common.  If that's still overloading,
you could make it a longer prefix of the <sid>.  And include
the hour, for example.

I suggest 3 as the cutoff lower bound, because we need to allow
2 for midnight rotation.  But you may want to increase it to
allow for someone to be offline for a long weekend, for example.

Anyway, this is just a suggestion.  It would give you the
throttling, but without the need for every command to count
the contents of the target directory.

And it would still allow your post-processor to operate in
near real-time on the contents of the current day's target
directory or to hang back if that causes too much contention.

Feel free to ignore this :-)

Jeff


  reply	other threads:[~2019-08-05 15:34 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-29 22:20 [RFC PATCH] trace2: don't overload target directories Josh Steadmon
2019-07-30 13:29 ` Derrick Stolee
2019-07-30 21:52   ` Josh Steadmon
2019-07-30 16:46 ` Jeff Hostetler
2019-07-30 22:01   ` Josh Steadmon
2019-07-30 22:02   ` Josh Steadmon
2019-07-30 18:00 ` Jeff Hostetler
2019-07-30 22:08   ` Josh Steadmon
2019-08-02 22:02 ` [RFC PATCH v2 0/2] " Josh Steadmon
2019-08-02 22:02   ` [RFC PATCH v2 1/2] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-08-02 22:02   ` [RFC PATCH v2 2/2] trace2: don't overload target directories Josh Steadmon
2019-08-05 15:34     ` Jeff Hostetler [this message]
2019-08-05 18:17       ` Josh Steadmon
2019-08-05 18:01     ` SZEDER Gábor
2019-08-05 18:09       ` Josh Steadmon
2019-09-14  0:25 ` [RFC PATCH v3 0/3] " Josh Steadmon
2019-09-14  0:25   ` [RFC PATCH v3 1/3] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-09-14  0:25   ` [RFC PATCH v3 2/3] trace2: don't overload target directories Josh Steadmon
2019-09-14  0:26   ` [RFC PATCH v3 3/3] trace2: write overload message to sentinel files Josh Steadmon
2019-09-16 12:07     ` Derrick Stolee
2019-09-16 14:11       ` Jeff Hostetler
2019-09-16 18:20         ` Josh Steadmon
2019-09-19 18:23           ` Jeff Hostetler
2019-09-19 22:47             ` Josh Steadmon
2019-09-20 15:59               ` Jeff Hostetler
2019-09-16 18:07       ` Josh Steadmon
2019-10-03 23:32 ` [PATCH v4 0/4] trace2: don't overload target directories Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 2/4] docs: clarify trace2 version invariants Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 3/4] trace2: don't overload target directories Josh Steadmon
2019-10-04  0:25     ` Junio C Hamano
2019-10-04 21:57       ` Josh Steadmon
2019-10-04  9:12     ` Johannes Schindelin
2019-10-04 22:05       ` Josh Steadmon
2019-10-03 23:32   ` [PATCH v4 4/4] trace2: write overload message to sentinel files Josh Steadmon
2019-10-04 22:08 ` [PATCH v5 0/4] trace2: discard new traces if the target directory contains too many files Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 1/4] docs: mention trace2 target-dir mode in git-config Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 2/4] docs: clarify trace2 version invariants Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 3/4] trace2: discard new traces if target directory has too many files Josh Steadmon
2019-10-04 22:08   ` [PATCH v5 4/4] trace2: write discard message to sentinel files Josh Steadmon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b1009a3e-92c6-248c-8d15-f4bb5cc71a11@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=steadmon@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).