git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Georgios Kontaxis via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Georgios Kontaxis" <geko1702+commits@99rst.org>
Subject: Re: [PATCH v6] gitweb: redacted e-mail addresses feature.
Date: Mon, 29 Mar 2021 13:00:17 -0700	[thread overview]
Message-ID: <xmqq5z19k9wu.fsf@gitster.g> (raw)
In-Reply-To: <pull.910.v6.git.1616973963862.gitgitgadget@gmail.com> (Georgios Kontaxis via GitGitGadget's message of "Sun, 28 Mar 2021 23:26:03 +0000")

"Georgios Kontaxis via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Georgios Kontaxis <geko1702+commits@99rst.org>
>
> Gitweb extracts content from the Git log and makes it accessible
> over HTTP. As a result, e-mail addresses found in commits are
> exposed to web crawlers and they may not respect robots.txt.
> This can result in unsolicited messages.
>
> Introduce an 'email-privacy' feature which redacts e-mail addresses
> from the generated HTML content. Specifically, obscure addresses
> retrieved from the the author/committer and comment sections of the
> Git log. The feature is off by default.
>
> This feature does not prevent someone from downloading the
> unredacted commit log, e.g., by cloning the repository, and
> extracting information from it. It aims to hinder the low-
> effort, bulk collection of e-mail addresses by web crawlers.
>
> Signed-off-by: Georgios Kontaxis <geko1702+commits@99rst.org>
> ---

> @@ -751,6 +751,17 @@ default font sizes or lineheights are changed (e.g. via adding extra
>  CSS stylesheet in `@stylesheets`), it may be appropriate to change
>  these values.
>  
> +email-privacy::
> +	Redact e-mail addresses from the generated HTML, etc. content.
> +	This obscures e-mail addresses retrieved from the author/committer
> +	and comment sections of the Git log.
> +	It is meant to hinder web crawlers that harvest and abuse addresses.
> +	Such crawlers may not respect robots.txt.
> +	Note that users and user tools also see the addresses as redacted.
> +	If Gitweb is not the final step in a workflow then subsequent steps
> +	may misbehave because of the redacted information they receive.
> +	Disabled by default.

OK.  I still think everything after "Note that" is a bit redandant
(as any intelligent reader can read what the feature does and reach
its natural consequence themselves), but I do not strongly oppose
leaving it in.

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0959a782eccb..01c6faf88006 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -569,6 +569,15 @@ sub evaluate_uri {
>  		'sub' => \&feature_extra_branch_refs,
>  		'override' => 0,
>  		'default' => []},
> +
> +	# Redact e-mail addresses.
> +
> +	# To enable system wide have in $GITWEB_CONFIG
> +	# $feature{'email-privacy'}{'default'} = [1];
> +	'email-privacy' => {
> +		'sub' => sub { feature_bool('email-privacy', @_) },
> +		'override' => 1,
> +		'default' => [0]},
>  );

Sensible.

While reviewing this part, I noticed a few things

 - A few other features explain how to toggle it system wide and
   nothing else, like this one does.

 - Others explain, in addition, talk about how to toggle per
   repository (which may necessitate explaining how to toggle the
   override bit if the default is false).

It might be a good idea to standardise the explanation somehow.

This is just an observation (in other words, nothing needs to be
done in this patch).


> @@ -3449,6 +3458,13 @@ sub parse_date {
>  	return %date;
>  }
>  
> +sub hide_mailaddrs_if_private {
> +	my $line = shift;
> +	return $line unless gitweb_check_feature('email-privacy');
> +	$line =~ s/<[^@>]+@[^>]+>/<redacted>/ig;
> +	return $line;
> +}

OK.  The "this catches AUTHOR_EMAIL" pattern is a bit embarrassing
(my fault); "s/<[^@>]+@[-a-z0-9.]+>/<redacted>/ig" might be less
embarrassing but I do not care deeply enough either way.  The version
in this patch is probably closer to what ident.c uses anyway ;-)

> @@ -7489,7 +7506,8 @@ sub git_log_generic {
>  			         -accesskey => "n", -title => "Alt-n"}, "next");
>  	}
>  	my $patch_max = gitweb_get_feature('patches');
> -	if ($patch_max && !defined $file_name) {
> +	if ($patch_max && !defined $file_name &&
> +		!gitweb_check_feature('email-privacy')) {
>  		if ($patch_max < 0 || @commitlist <= $patch_max) {

An observation unrelated to this change.  I think checking for
negative patch_max is a bug in the original code.  Everywhere else,
the way to disable the 'patch' view is to set it to 0, not negative,
and this block is already protected with "if ($patch_max".

> @@ -7550,7 +7568,8 @@ sub git_commit {
>  			} @$parents ) .
>  			')';
>  	}
> -	if (gitweb_check_feature('patches') && @$parents <= 1) {
> +	if (gitweb_check_feature('patches') && @$parents <= 1 &&
> +		!gitweb_check_feature('email-privacy')) {
>  		$formats_nav .= " | " .
>  			$cgi->a({-href => href(action=>"patch", -replay=>1)},
>  				"patch");
> @@ -7863,7 +7882,8 @@ sub git_commitdiff {
>  		$formats_nav =
>  			$cgi->a({-href => href(action=>"commitdiff_plain", -replay=>1)},
>  			        "raw");
> -		if ($patch_max && @{$co{'parents'}} <= 1) {
> +		if ($patch_max && @{$co{'parents'}} <= 1 &&
> +			!gitweb_check_feature('email-privacy')) {
>  			$formats_nav .= " | " .
>  				$cgi->a({-href => href(action=>"patch", -replay=>1)},
>  					"patch");

I see your wish is "we do not show any link to a patch page, but
anybody who knows what a link to the patch page looks like can craft
and ask", but I am not sure if that is worth the above three hunks.
It still feels more robust to just disable the 'patches' feature
when the email-privacy feature is enabled, but that may be just me.

Will queue as-is.  Input from those who are more adept at Perl
and/or interested in helping polish gitweb are still welcome, but at
my level of interest on the topic, this version looks as good as it
gets ;-)

Thanks.

  reply	other threads:[~2021-03-29 20:01 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-20 23:42 [PATCH] gitweb: redacted e-mail addresses feature Georgios Kontaxis via GitGitGadget
2021-03-21  0:42 ` Ævar Arnfjörð Bjarmason
2021-03-21  1:27   ` brian m. carlson
2021-03-21  3:30   ` Georgios Kontaxis
2021-03-21  3:32 ` [PATCH v2] " Georgios Kontaxis via GitGitGadget
2021-03-21 17:28   ` [PATCH v3] " Georgios Kontaxis via GitGitGadget
2021-03-21 18:26     ` Ævar Arnfjörð Bjarmason
2021-03-21 18:48       ` Junio C Hamano
2021-03-21 19:48       ` Georgios Kontaxis
2021-03-21 18:42     ` Junio C Hamano
2021-03-21 18:57       ` Junio C Hamano
2021-03-21 19:05         ` Junio C Hamano
2021-03-21 20:07       ` Georgios Kontaxis
2021-03-21 22:17         ` Junio C Hamano
2021-03-21 23:14           ` Georgios Kontaxis
2021-03-22  4:25             ` Junio C Hamano
2021-03-22  6:57     ` [PATCH v4] " Georgios Kontaxis via GitGitGadget
2021-03-22 18:32       ` Junio C Hamano
2021-03-22 18:58         ` Georgios Kontaxis
2021-03-28  1:41           ` Junio C Hamano
2021-03-28 21:43             ` Georgios Kontaxis
2021-03-28 22:35               ` Junio C Hamano
2021-03-23  4:27         ` Georgios Kontaxis
2021-03-27  3:56       ` [PATCH v5] " Georgios Kontaxis via GitGitGadget
2021-03-28 23:26         ` [PATCH v6] " Georgios Kontaxis via GitGitGadget
2021-03-29 20:00           ` Junio C Hamano [this message]
2021-03-31 21:14             ` Junio C Hamano
2021-04-06  0:56             ` Junio C Hamano
2021-04-08 22:43           ` Ævar Arnfjörð Bjarmason
2021-04-08 22:51             ` Junio C Hamano
2021-03-29  1:47         ` [PATCH v5] " Eric Wong
2021-03-29  3:17           ` Georgios Kontaxis
2021-04-08 17:16             ` Eric Wong
2021-04-08 21:04               ` Junio C Hamano
2021-04-08 21:19                 ` Eric Wong
2021-04-08 22:45                   ` Ævar Arnfjörð Bjarmason
2021-04-08 22:54                     ` Junio C Hamano
2021-03-21  6:00 ` [PATCH] " Junio C Hamano
2021-03-21  6:18   ` Junio C Hamano
2021-03-21  6:43   ` Georgios Kontaxis
2021-03-21 16:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq5z19k9wu.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=geko1702+commits@99rst.org \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).