git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Ævar Arnfjörð Bjarmason <avarab@gmail.com>,
	Johannes Sixt <j6t@kdbg.org>,
	Thomas Braun <thomas.braun@virtuell-zuhause.de>,
	Jeff Hostetler <git@jeffhostetler.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [RFC PATCH v1] telemetry design overview (part 1)
Date: Sat, 9 Jun 2018 02:51:32 -0400
Message-ID: <20180609065132.GD30224@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8BPhzs5M4peHN2HczmDxGmAuKZ0corzT66i+rJ2UQRTHQ@mail.gmail.com>

On Sat, Jun 09, 2018 at 07:03:53AM +0200, Duy Nguyen wrote:

> > > Am 08.06.2018 um 18:00 schrieb Thomas Braun:
> > >> I for my part would much rather prefer that to be a compile time
> > >> option so that I don't need to check on every git update on windows
> > >> if  this is now enabled or not.
> > >
> > > This exactly my concern, too! A compile-time option may make it a good
> > > deal less worrisome.
> >
> > Can you elaborate on how someone who can maintain inject malicious code
> > into your git package + config would be thwarted by this being some
> > compile-time option, wouldn't they just compile it in?
> 
> Look at this from a different angle. This is driven by the needs to
> collect telemetry in _controlled_ environment (mostly server side, I
> guess) and it should be no problem to make custom builds there for
> you. Not making it a compile-time option could force [1] linux distro
> to carry this function to everybody even if they don't use it (and
> it's kinda dangerous to misuse if you don't anonymize the data
> properly). I also prefer this a compile time option.

I actually think this could be useful for normal users, too. We have
GIT_TRACE for dumping debug output, and we sometimes ask users to turn
it on to help diagnose a problem (not to mention that we use it
ourselves).

AFAICT this telemetry data is the same thing, but for performance. When
somebody says "why does this command take so long", wouldn't it be nice
for us to be able to tell them to flip a switch that will collect data
on which operations are taking a long time?

> [1] Of course many distros can choose to patch it out. But it's the
> same argument as bringing this option in in the first place: you guys
> already have that code in private and now want to put it in stock git
> to reduce maintenance cost, why add extra cost on linux distro
> maintenance?

We don't do performance telemetry like this at GitHub, but we do collect
a few variables that we dump to a custom over a Unix socket (e.g., our
upload-pack records clone vs fetch and shallow vs non-shallow for each
request).

In my experience the maintenance burden is not in the "connect to a
socket" part, but the fact that you have to sprinkle the entry points
throughout the code (e.g., "set 'cloning' flag to 1" or "I'm entering
the pack generation phase of the fetch"). So I'd love to see us do that
sprinkling _once_ where everyone can benefit, whether they want better
tracing for debugging, telemetry across their installed base, or
whatever.

The mechanism to handle those calls is much easier to plug in and out,
then. And I don't have a huge preference for compile-time versus
run-time, or whether bits are in-tree or out-of-tree. Obviously we'd
have some basic "write to stderr or a file" consumer in-tree.

For myself, I'm happy with compile-time (I'm instrumenting the server
side, so I really only care about my specific build) and out-of-tree
(the protocol to our custom daemon consumer is not something anybody
else would care about anyway).

For people collecting from clients, I imagine it's more convenient to be
able to let them use official builds and just tweak a knob, even if they
_could_ build a custom Git and push it out to everybody. I don't know
anything about Windows event tracing, but if it's a standard facility
then I doubt we're talking about a huge maintenance burden to have it
in-tree as a configurable option.

So IMHO that really leaves the "is it too scary to have a config knob
that lets tracing go to this event facility versus to a file"? I guess I
just don't see it, as long as the user has to enable it explicitly. That
seems like complaining that GIT_TRACE could go to syslog. If you don't
want to store it, then don't turn on the feature.

-Peff

  parent reply index

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-07 14:53 git
2018-06-07 14:53 ` [RFC PATCH v1] telemetry: design documenation git
2018-06-08 11:06   ` Ævar Arnfjörð Bjarmason
2018-06-07 21:10 ` [RFC PATCH v1] telemetry design overview (part 1) Johannes Sixt
2018-06-08  9:07   ` Jeff King
2018-06-08 16:00     ` Thomas Braun
2018-06-08 22:01       ` Johannes Sixt
2018-06-08 22:20         ` Ævar Arnfjörð Bjarmason
2018-06-09  5:03           ` Duy Nguyen
2018-06-09  6:31             ` Ævar Arnfjörð Bjarmason
2018-06-09  6:56               ` Jeff King
2018-06-09 20:05                 ` Johannes Schindelin
2018-06-11  5:56                   ` Jeff King
2018-06-09  7:31               ` Duy Nguyen
2018-06-09  6:51             ` Jeff King [this message]
2018-06-09  7:04               ` Johannes Sixt
2018-06-09  7:31                 ` Jeff King
2018-06-12 16:04               ` Junio C Hamano
2018-06-09  6:56           ` Johannes Sixt
2018-06-09 20:43             ` Johannes Schindelin
2018-06-09 22:44               ` Johannes Sixt
2018-06-11  6:08                 ` Jeff King
2018-06-10  0:00             ` brian m. carlson
2018-06-11  6:14               ` Jeff King
2018-06-11  8:30                 ` Jeff King
2018-06-08  9:40   ` Ævar Arnfjörð Bjarmason
2018-06-08 15:46     ` Duy Nguyen

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180609065132.GD30224@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=jeffhost@microsoft.com \
    --cc=pclouds@gmail.com \
    --cc=thomas.braun@virtuell-zuhause.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox