git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Georgios Kontaxis via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	Georgios Kontaxis <geko1702+commits@99rst.org>
Subject: Re: [PATCH] gitweb: redacted e-mail addresses feature.
Date: Sun, 21 Mar 2021 01:27:42 +0000	[thread overview]
Message-ID: <YFahDtIkSmpLD2oZ@camp.crustytoothpaste.net> (raw)
In-Reply-To: <8735wpz699.fsf@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3268 bytes --]

On 2021-03-21 at 00:42:58, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sun, Mar 21 2021, Georgios Kontaxis via GitGitGadget wrote:
> 
> > From: Georgios Kontaxis <geko1702+commits@99rst.org>
> >
> > Gitweb extracts content from the Git log and makes it accessible
> > over HTTP. As a result, e-mail addresses found in commits are
> > exposed to web crawlers. This may result in unsolicited messages.
> > This is a feature for redacting e-mail addresses from the generated
> > HTML content.
> >
> > This feature does not prevent someone from downloading the
> > unredacted commit log and extracting information from it.
> > It aims to hinder the low-effort bulk collection of e-mail
> > addresses by web crawlers.
> 
> So web crawlers that aren't going to obey robots.txt?
> 
> I'm not opposed to this feature, but a glance at gitweb's documentation
> seems to show that we don't discuss how to set robots.txt up for it at
> all.
> 
> Perhaps having that in the docs or otherwise in the default setup would
> get us most of the win of this feature?

I'm going to guess that the two features are orthogonal.  robots.txt is
great for communicating to well-meaning actors what you do and don't
want crawled.  For example, one might ask a web crawler not to crawl
individual commits because that creates excessive load on the server.

This option is about preventing email harvesting, usually for the
purposes of sending spam.  Spam is email abuse and all reasonable people
know it's unacceptable, so by definition the people doing this are bad
actors and are not likely to honor the robots.txt.  As someone who runs
his own mail server, that is certainly my experience.

So I am in favor of this feature.  I think it mirrors what many other
tools do in this space and having it as an option is valuable.

> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index 7963a79ba98b..10653d8670a8 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -896,6 +896,18 @@ same as of the snippet above:
> >  It is an error to specify a ref that does not pass "git check-ref-format"
> >  scrutiny. Duplicated values are filtered.
> >  
> > +email_privacy::
> > +    Redact e-mail addresses from the generated HTML, etc. content.
> > +    This hides e-mail addresses found in the commit log from web crawlers.
> > +    Enabled by default.
> > ++
> > +It is highly recommended to keep this feature enabled unless web crawlers
> > +are hindered in some other way. You can disable this feature as shown below:
> > ++
> > +---------------------------------------------------------------------------
> > +$feature{'email_privacy'}{'default'} = [0];
> > +---------------------------------------------------------------------------
> 
> I think there's plenty of gitweb users that are going to be relying on
> the current behavior, so doesn't it make more sense for this to be
> opt-in rather than opt-out?

I agree this make sense as an opt-in feature.  While many people will
want to enable it, users who are performing an upgrade won't necessarily
want the behavior to change right away.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

  reply	other threads:[~2021-03-21  1:29 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-20 23:42 [PATCH] gitweb: redacted e-mail addresses feature Georgios Kontaxis via GitGitGadget
2021-03-21  0:42 ` Ævar Arnfjörð Bjarmason
2021-03-21  1:27   ` brian m. carlson [this message]
2021-03-21  3:30   ` Georgios Kontaxis
2021-03-21  3:32 ` [PATCH v2] " Georgios Kontaxis via GitGitGadget
2021-03-21 17:28   ` [PATCH v3] " Georgios Kontaxis via GitGitGadget
2021-03-21 18:26     ` Ævar Arnfjörð Bjarmason
2021-03-21 18:48       ` Junio C Hamano
2021-03-21 19:48       ` Georgios Kontaxis
2021-03-21 18:42     ` Junio C Hamano
2021-03-21 18:57       ` Junio C Hamano
2021-03-21 19:05         ` Junio C Hamano
2021-03-21 20:07       ` Georgios Kontaxis
2021-03-21 22:17         ` Junio C Hamano
2021-03-21 23:14           ` Georgios Kontaxis
2021-03-22  4:25             ` Junio C Hamano
2021-03-22  6:57     ` [PATCH v4] " Georgios Kontaxis via GitGitGadget
2021-03-22 18:32       ` Junio C Hamano
2021-03-22 18:58         ` Georgios Kontaxis
2021-03-28  1:41           ` Junio C Hamano
2021-03-28 21:43             ` Georgios Kontaxis
2021-03-28 22:35               ` Junio C Hamano
2021-03-23  4:27         ` Georgios Kontaxis
2021-03-27  3:56       ` [PATCH v5] " Georgios Kontaxis via GitGitGadget
2021-03-28 23:26         ` [PATCH v6] " Georgios Kontaxis via GitGitGadget
2021-03-29 20:00           ` Junio C Hamano
2021-03-31 21:14             ` Junio C Hamano
2021-04-06  0:56             ` Junio C Hamano
2021-04-08 22:43           ` Ævar Arnfjörð Bjarmason
2021-04-08 22:51             ` Junio C Hamano
2021-03-29  1:47         ` [PATCH v5] " Eric Wong
2021-03-29  3:17           ` Georgios Kontaxis
2021-04-08 17:16             ` Eric Wong
2021-04-08 21:04               ` Junio C Hamano
2021-04-08 21:19                 ` Eric Wong
2021-04-08 22:45                   ` Ævar Arnfjörð Bjarmason
2021-04-08 22:54                     ` Junio C Hamano
2021-03-21  6:00 ` [PATCH] " Junio C Hamano
2021-03-21  6:18   ` Junio C Hamano
2021-03-21  6:43   ` Georgios Kontaxis
2021-03-21 16:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFahDtIkSmpLD2oZ@camp.crustytoothpaste.net \
    --to=sandals@crustytoothpaste.net \
    --cc=avarab@gmail.com \
    --cc=geko1702+commits@99rst.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).