unofficial mirror of libc-alpha@sourceware.org
* glibc git commit hooks update.
@ 2019-04-15 19:29 Carlos O'Donell
  2019-04-15 19:55 ` Zack Weinberg
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Carlos O'Donell @ 2019-04-15 19:29 UTC (permalink / raw)
  To: libc-alpha
  Cc: Mark Wielaard, Nick Clifton, Jeff Law, Frank Ch. Eigler,
	Pedro Alves, Florian Weimer

Dear Community,

Florian Weimer and I have upgraded the glibc commit hooks
to use the common hooks from AdaCore scripts with Red Hat
customizations. These hooks are also used by binutils
and valgrind and are part of the common infrastructure on
sourceware (/sourceware/projects/src-home/git-hooks).

Our goals were to improve developer workflow, particularly
around user branches and bugzilla noise. We wanted users to
be able to experiment quickly, throw away branches, and
present clean branches to maintainers for review. This meant
rebases and no bugzilla noise.

Concrete goals were:

* User branches should allow rebases.
* User branch updates should not send messages to bugzilla.
* Ensure merges are disabled to master and release branches.
* Less verbose bugzilla updates.

We completed the transition including commissioning tests.

You will immediately see the following benefits:

* User branches (non-master, non-release branches) now accept
   non-fast-forward updates, for example a forced push after a
   rebase. You do not need to delete and recreate your branch.

* User branch commits do not generate emails or bugzilla updates
   for any reason. Previously we would update bugzilla whenever a
   commit on any branch mentioned the bug; this is no longer the
   case.

* glibc-cvs and bugzilla will now use a more succinct form of
   update description that includes who did the push, and a very
   succinct description of the single commit.

   e.g. https://sourceware.org/bugzilla/show_bug.cgi?id=16573#c5

   e.g. https://www.sourceware.org/ml/glibc-cvs/2019-q2/msg00038.html

   Note: Scripts that parse this output may need updating.

* Style check applies only to source files and makefiles and
   avoids data files by using known extensions. You can now commit
   whitespace changes to data files.

   e.g. *.[ch], *.cpp, *.cc, *.[Ss], *.py, *.awk, manual/*,
        scripts/*, *.mk, and */Make*
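
As a rough illustration of the extension-based selection described in the
style check item above, here is a minimal Python sketch. It assumes the
listed patterns behave like shell-style globs matched against the
repository-relative path; the function name is_style_checked is
hypothetical and the real hooks may select files differently.

```python
from fnmatch import fnmatchcase

# Patterns copied from the announcement. Treating them as shell-style
# globs on the repository-relative path is an assumption of this sketch.
STYLE_CHECKED_PATTERNS = [
    "*.[ch]", "*.cpp", "*.cc", "*.[Ss]", "*.py", "*.awk",
    "manual/*", "scripts/*", "*.mk", "*/Make*",
]


def is_style_checked(path: str) -> bool:
    """Return True if the path matches any of the listed patterns."""
    return any(fnmatchcase(path, pattern) for pattern in STYLE_CHECKED_PATTERNS)
```

Under these assumptions a path such as malloc/malloc.c or elf/Makefile
would be checked, while a locale data file such as
localedata/locales/en_US would not.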

Limitations:

* Sending an email and updating bugzilla are the same process,
   so you get either both or neither. You cannot have commit
   emails without bugzilla updates. Therefore we have opted to
   send commit emails and bugzilla updates only for the release
   and master branches. This can be fixed, but the existing
   scripts need extending to make the two operations distinct.
   For example, if anyone wants user branches to generate commit
   emails then we'll need to work on this.

* Style check is applied to the whole file, not to the diff of the
   changes. As of today all the source files are clean. This should
   not make a difference to the project, but it is a difference in
   the way the hooks operate.

Commissioning:

* Tested that user branches can be rebased.
* Tested that user branch commits do not generate emails or bugzilla updates.

It's a little noisy locally:
remote: ----------------------------------------------------------------------
remote: --  The hooks.no-emails config option contains `refs/heads/(?!master|release.*)',
remote: --  which matches the name of the reference being updated
remote: --  (refs/heads/fw/bug21242).
remote: --
remote: --  Commit emails will therefore not be sent.
remote: ----------------------------------------------------------------------

* Tested that merge commits are not allowed on master and release branches.

remote: *** Merge commits are not allowed on refs/heads/master.
remote: *** The commit that caused this error is:
remote: ***
remote: ***     commit 68d5c2453a221ed6384c3e78a75e8b443b0c56ad
remote: ***     Subject: Test commit
remote: ***
remote: *** Hint: Consider using "git cherry-pick" instead of "git merge",
remote: ***       or "git pull --rebase" instead of "git pull".

* Verified format of output to bugzilla and glibc-cvs is succinct.

   Examples:

   Before: https://sourceware.org/bugzilla/show_bug.cgi?id=16573#c3
   After: https://sourceware.org/bugzilla/show_bug.cgi?id=16573#c5

   Before: https://www.sourceware.org/ml/glibc-cvs/2019-q2/msg00031.html
   After: https://www.sourceware.org/ml/glibc-cvs/2019-q2/msg00037.html
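
To see which references the hooks.no-emails pattern quoted in the hook
output above would cover, the regular expression can be tried out in
Python. This is only a sketch: the pattern is copied from the output, but
anchoring it at the start of the reference name (re.match) is an
assumption, and emails_suppressed is a hypothetical helper name.

```python
import re

# Pattern copied verbatim from the hook output. Applying it with
# re.match (anchored at the start of the ref name) is an assumption.
NO_EMAILS_PATTERN = re.compile(r"refs/heads/(?!master|release.*)")


def emails_suppressed(ref_name: str) -> bool:
    """Return True if the pattern matches, i.e. no commit email is sent."""
    return NO_EMAILS_PATTERN.match(ref_name) is not None
```

With this reading, refs/heads/fw/bug21242 is matched and stays silent,
while refs/heads/master and refs/heads/release/2.29/master are not
matched. Note that the negative lookahead only tests a prefix, so any
branch whose name merely starts with "master" or "release" is also
treated as not matched.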

Thank you very much for your patience.

It is our sincerest hope that these changes will make developing
on glibc much better.

If you have any problems with these changes please reach out
to me or Florian to discuss the update.

-- 
Cheers,
Carlos.


* Re: glibc git commit hooks update.
  2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
@ 2019-04-15 19:55 ` Zack Weinberg
  2019-04-15 20:16 ` DJ Delorie
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Zack Weinberg @ 2019-04-15 19:55 UTC (permalink / raw)
  To: Carlos O'Donell, Florian Weimer; +Cc: libc-alpha

On Mon, Apr 15, 2019 at 3:29 PM Carlos O'Donell <codonell@redhat.com> wrote:
> Florian Weimer and myself have upgraded the glibc commit hooks
> to use the common hooks from AdaCore scripts with Red Hat
> customizations. These hooks are also being used by binutils
> and valgrind and is part of the common infrastructure on
> sourceware (/sourceware/projects/src-home/git-hooks).
...
> * User branches (non-master, non-release branches) can now use
>    non-fast-forward merges. You do not need to delete and recreate
>    your branch.

This is going to make my glibc development work significantly easier.
Thank you.

zw


* Re: glibc git commit hooks update.
  2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
  2019-04-15 19:55 ` Zack Weinberg
@ 2019-04-15 20:16 ` DJ Delorie
  2019-04-15 20:22   ` Florian Weimer
  2019-04-15 21:08 ` Adhemerval Zanella
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: DJ Delorie @ 2019-04-15 20:16 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha


For the benefit of those of us who aren't git-savvy, could you please
summarize what a long-term in-branch development process should look
like?  I.e. one that's optimal for our new configuration on sourceware?

I recall going through that with the tcache work, constantly merging
master into the branch, flooding irc, messing up my local repo, and I
recall it was... suboptimal.


* Re: glibc git commit hooks update.
  2019-04-15 20:16 ` DJ Delorie
@ 2019-04-15 20:22   ` Florian Weimer
  2019-04-15 20:24     ` DJ Delorie
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2019-04-15 20:22 UTC (permalink / raw)
  To: DJ Delorie; +Cc: Carlos O'Donell, libc-alpha

* DJ Delorie:

> For the benefit of those of us who aren't git-savvy, could you please
> summarize what a long-term in-branch development process should look
> like?  I.e. one that's optimal for our new configuration on sourceware?
>
> I recall going through that with the tcache work, constantly merging
> master into the branch, flooding irc, messing up my local repo, and I
> recall it was... suboptimal.

You can do whatever you want on the private branches.  IRC, email,
Bugzilla updates are all disabled as of today.

In theory, too many private branches could cause repository bloat, but
given the amount of history we carry around anyway, I don't expect
this to be a substantial issue.


* Re: glibc git commit hooks update.
  2019-04-15 20:22   ` Florian Weimer
@ 2019-04-15 20:24     ` DJ Delorie
  2019-04-15 20:28       ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: DJ Delorie @ 2019-04-15 20:24 UTC (permalink / raw)
  To: Florian Weimer; +Cc: codonell, libc-alpha

Florian Weimer <fw@deneb.enyo.de> writes:
> You can do whatever you want on the private branches.  IRC, email,
> Bugzilla updates are all disabled as of today.

The fact that I can do whatever I want is the *problem*, not the
*solution*.


* Re: glibc git commit hooks update.
  2019-04-15 20:24     ` DJ Delorie
@ 2019-04-15 20:28       ` Florian Weimer
  2019-04-15 20:33         ` DJ Delorie
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2019-04-15 20:28 UTC (permalink / raw)
  To: DJ Delorie; +Cc: codonell, libc-alpha

* DJ Delorie:

> Florian Weimer <fw@deneb.enyo.de> writes:
>> You can do whatever you want on the private branches.  IRC, email,
>> Bugzilla updates are all disabled as of today.
>
> The fact that I can do whatever I want is the *problem*, not the
> *solution*.

Sorry, I don't understand.  You voiced concerns about the interference
with IRC and Bugzilla, and we solved that today.

Private branches were so painful to use in the past that we used
them only rarely, and no common practices could evolve.  Hopefully,
that's going to change now, and we can look back in a few months and
make some recommendations about what works best.


* Re: glibc git commit hooks update.
  2019-04-15 20:28       ` Florian Weimer
@ 2019-04-15 20:33         ` DJ Delorie
  2019-04-15 20:39           ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: DJ Delorie @ 2019-04-15 20:33 UTC (permalink / raw)
  To: Florian Weimer; +Cc: codonell, libc-alpha

Florian Weimer <fw@deneb.enyo.de> writes:
> Sorry, I don't understand.  You voiced concerns about the interference
> with IRC and Bugzilla, and we solved that today.

Yes, and also changed the rules about what's allowed in a private
branch.  Given the relaxed rules and isolation from irc/mail, there
would be a "new optimum" in best practices.  I hope ;-)


* Re: glibc git commit hooks update.
  2019-04-15 20:33         ` DJ Delorie
@ 2019-04-15 20:39           ` Florian Weimer
  2019-04-15 21:47             ` Yann Droneaud
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2019-04-15 20:39 UTC (permalink / raw)
  To: DJ Delorie; +Cc: codonell, libc-alpha

* DJ Delorie:

> Florian Weimer <fw@deneb.enyo.de> writes:
>> Sorry, I don't understand.  You voiced concerns about the interference
>> with IRC and Bugzilla, and we solved that today.
>
> Yes, and also changed the rules about what's allowed in a private
> branch.  Given the relaxed rules and isolation from irc/mail, there
> would be a "new optimum" in best practices.  I hope ;-)

Yes, but I don't know what these practices will look like.  I
personally have little experience with projects that use a shared
repository with private branches.

I plan to push most of my unmerged local branches to sourceware in the
coming days.  It remains to be seen if this possibility will lead to
changes in our patch review procedures.  It could be quite helpful to
be able to see exactly what is proposed for commit, with all details
(commit message, author, author date, and so on).


* Re: glibc git commit hooks update.
  2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
  2019-04-15 19:55 ` Zack Weinberg
  2019-04-15 20:16 ` DJ Delorie
@ 2019-04-15 21:08 ` Adhemerval Zanella
  2019-04-17 22:21 ` Joseph Myers
  2019-05-23 10:05 ` Andreas Schwab
  4 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2019-04-15 21:08 UTC (permalink / raw)
  To: libc-alpha



On 15/04/2019 16:29, Carlos O'Donell wrote:
> Dear Community,
> 
> Florian Weimer and myself have upgraded the glibc commit hooks
> to use the common hooks from AdaCore scripts with Red Hat
> customizations. These hooks are also being used by binutils
> and valgrind and is part of the common infrastructure on
> sourceware (/sourceware/projects/src-home/git-hooks).
>

Thanks for doing that.


* Re: glibc git commit hooks update.
  2019-04-15 20:39           ` Florian Weimer
@ 2019-04-15 21:47             ` Yann Droneaud
  2019-04-16  7:06               ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Yann Droneaud @ 2019-04-15 21:47 UTC (permalink / raw)
  To: Florian Weimer, DJ Delorie; +Cc: codonell, libc-alpha

Hi,

Le lundi 15 avril 2019 à 22:39 +0200, Florian Weimer a écrit :
> * DJ Delorie:
> 
> > Florian Weimer <fw@deneb.enyo.de> writes:
> > > Sorry, I don't understand.  You voiced concerns about the
> > > interference
> > > with IRC and Bugzilla, and we solved that today.
> > 
> > Yes, and also changed the rules about what's allowed in a private
> > branch.  Given the relaxed rules and isolation from irc/mail, there
> > would be a "new optimum" in best practices.  I hope ;-)
> 
> Yes, but I don't know what these practices will look like.  I
> personally have little experience with projects that use a shared
> repository with private branches.
> 

Not sure of the meaning of "private branches" in the context of a
shared repository.

Do you mean personal branches in shared repository ?

Regards.

-- 
Yann Droneaud
OPTEYA




* Re: glibc git commit hooks update.
  2019-04-15 21:47             ` Yann Droneaud
@ 2019-04-16  7:06               ` Florian Weimer
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Weimer @ 2019-04-16  7:06 UTC (permalink / raw)
  To: Yann Droneaud; +Cc: DJ Delorie, codonell, libc-alpha

* Yann Droneaud:

> Hi,
>
> Le lundi 15 avril 2019 à 22:39 +0200, Florian Weimer a écrit :
>> * DJ Delorie:
>> 
>> > Florian Weimer <fw@deneb.enyo.de> writes:
>> > > Sorry, I don't understand.  You voiced concerns about the
>> > > interference
>> > > with IRC and Bugzilla, and we solved that today.
>> > 
>> > Yes, and also changed the rules about what's allowed in a private
>> > branch.  Given the relaxed rules and isolation from irc/mail, there
>> > would be a "new optimum" in best practices.  I hope ;-)
>> 
>> Yes, but I don't know what these practices will look like.  I
>> personally have little experience with projects that use a shared
>> repository with private branches.
>> 
>
> Not sure of the meaning of "private branches" in the context of a
> shared repository.
>
> Do you mean personal branches in shared repository ?

Yes, sorry for the confusion.  Not necessarily personal, but
non-master/non-official branches.


* Re: glibc git commit hooks update.
  2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
                   ` (2 preceding siblings ...)
  2019-04-15 21:08 ` Adhemerval Zanella
@ 2019-04-17 22:21 ` Joseph Myers
  2019-04-18  0:20   ` Carlos O'Donell
  2019-05-23 10:05 ` Andreas Schwab
  4 siblings, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2019-04-17 22:21 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On Mon, 15 Apr 2019, Carlos O'Donell wrote:

> * Sending an email and updating bugzilla are the same process,
>   and so you either get both or you get none. You cannot have
>   commit emails without bugzilla updates. Therefore we have opted
>   to have only commit emails and bugzilla updates for release and
>   master. This can be fixed but it needs extending in the existing
>   scripts to make the two operations distinct. For example if anyone
>   wants user branches to generate email commit messages then we'll
>   need to work on this.

I think user branches should generate emails (as I see it the point of 
having such branches in this repository at all is visibility of the work 
going on there, and that should include generating emails).

I'm not clear on whether the hooks were configured with any limit on the 
size of diffs mailed, but if there is such a limit it should be at least 
several MB.

One problem with the old hooks was that they did not generate a 
Content-Type header, which was unhelpful when the mails also weren't pure 
ASCII.  I see the new ones are generating 'Content-Type: text/plain; 
charset="us-ascii"'.  Will they also be smart about specifying an 
appropriate character set (so UTF-8 if the commit message / author name / 
diff contents are valid UTF-8 but not ASCII, for example)?

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: glibc git commit hooks update.
  2019-04-17 22:21 ` Joseph Myers
@ 2019-04-18  0:20   ` Carlos O'Donell
  2019-04-18 12:38     ` Joseph Myers
  2019-05-16 19:35     ` Joseph Myers
  0 siblings, 2 replies; 26+ messages in thread
From: Carlos O'Donell @ 2019-04-18  0:20 UTC (permalink / raw)
  To: Joseph Myers
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On 4/17/19 6:21 PM, Joseph Myers wrote:
> On Mon, 15 Apr 2019, Carlos O'Donell wrote:
> 
>> * Sending an email and updating bugzilla are the same process,
>>    and so you either get both or you get none. You cannot have
>>    commit emails without bugzilla updates. Therefore we have opted
>>    to have only commit emails and bugzilla updates for release and
>>    master. This can be fixed but it needs extending in the existing
>>    scripts to make the two operations distinct. For example if anyone
>>    wants user branches to generate email commit messages then we'll
>>    need to work on this.
> 
> I think user branches should generate emails (as I see it the point of
> having such branches in this repository at all is visibility of the work
> going on there, and that should include generating emails).

If we want user branches to generate emails *and* not update bugzilla
then we can do that. If you are requesting this then we'll have to work
with upstream AdaCore and the other projects to add this feature.

> I'm not clear on whether the hooks were configured with any limit on the
> size of diffs mailed, but if there is such a limit it should be at least
> several MB.

The default is 100KiB.

I have just pushed an update to allow up to 5MiB.

> One problem with the old hooks was that they did not generate a
> Content-Type header, which was unhelpful when the mails also weren't pure
> ASCII.  I see the new ones are generating 'Content-Type: text/plain;
> charset="us-ascii"'.  Will they also be smart about specifying an
> appropriate character set (so UTF-8 if the commit message / author name /
> diff contents are valid UTF-8 but not ASCII, for example)?

All of this is handled by Python's email package plus some handling
on the hooks' part. If everything is ASCII then we don't do any special
encoding and just send ASCII. Otherwise we try UTF-8 first, and if
that fails a decode test then we fall back to ISO-8859-1.

See: glibc.git/hooks/updates/emails.py.
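
The selection order described above (ASCII, then UTF-8, then ISO-8859-1)
could be sketched in Python roughly as follows. This is not the code in
emails.py; pick_charset is a hypothetical name, and the sketch assumes the
message content is available as raw bytes.

```python
def pick_charset(payload: bytes) -> str:
    """Pick a charset label for raw message bytes (illustrative only)."""
    try:
        payload.decode("ascii")
        return "us-ascii"      # pure ASCII needs no special encoding
    except UnicodeDecodeError:
        pass
    try:
        payload.decode("utf-8")
        return "utf-8"         # valid UTF-8 that is not plain ASCII
    except UnicodeDecodeError:
        return "iso-8859-1"    # last resort: every byte value decodes
```

ISO-8859-1 works as the final fallback because each of the 256 byte values
maps to a character, so the decode step cannot fail.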

-- 
Cheers,
Carlos.


* Re: glibc git commit hooks update.
  2019-04-18  0:20   ` Carlos O'Donell
@ 2019-04-18 12:38     ` Joseph Myers
  2019-04-18 12:46       ` Florian Weimer
  2019-05-16 19:35     ` Joseph Myers
  1 sibling, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2019-04-18 12:38 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On Wed, 17 Apr 2019, Carlos O'Donell wrote:

> > I think user branches should generate emails (as I see it the point of
> > having such branches in this repository at all is visibility of the work
> > going on there, and that should include generating emails).
> 
> If we want user branches to generate emails *and* not update bugzilla
> then we can do that. If you are requesting this then we'll have to work
> with upstream AdaCore and the other projects to add this feature.

I think that's an appropriate configuration - generate emails to glibc-cvs 
but without Bugzilla updates for user branches.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: glibc git commit hooks update.
  2019-04-18 12:38     ` Joseph Myers
@ 2019-04-18 12:46       ` Florian Weimer
  2019-04-18 14:45         ` Joseph Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2019-04-18 12:46 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Carlos O'Donell, libc-alpha, Mark Wielaard, Nick Clifton,
	Jeff Law, Frank Ch. Eigler, Pedro Alves

* Joseph Myers:

> On Wed, 17 Apr 2019, Carlos O'Donell wrote:
>
>> > I think user branches should generate emails (as I see it the point of
>> > having such branches in this repository at all is visibility of the work
>> > going on there, and that should include generating emails).
>> 
>> If we want user branches to generate emails *and* not update bugzilla
>> then we can do that. If you are requesting this then we'll have to work
>> with upstream AdaCore and the other projects to add this feature.
>
> I think that's an appropriate configuration - generate emails to glibc-cvs 
> but without Bugzilla updates for user branches.

Makes sense to me.

I will try to raise the issue with the hook scripts maintainer.  Is it
okay to leave the new hooks in place with the current behavior?

Thanks,
Florian


* Re: glibc git commit hooks update.
  2019-04-18 12:46       ` Florian Weimer
@ 2019-04-18 14:45         ` Joseph Myers
  2019-05-16 10:13           ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2019-04-18 14:45 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Carlos O'Donell, libc-alpha, Mark Wielaard, Nick Clifton,
	Jeff Law, Frank Ch. Eigler, Pedro Alves

On Thu, 18 Apr 2019, Florian Weimer wrote:

> Makes sense to me.
> 
> I will try to raise the issue with the hook scripts maintainer.  Is it
> okay to leave the new hooks in place with the current behavior?

Sure.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: glibc git commit hooks update.
  2019-04-18 14:45         ` Joseph Myers
@ 2019-05-16 10:13           ` Florian Weimer
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Weimer @ 2019-05-16 10:13 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Carlos O'Donell, libc-alpha, Mark Wielaard, Nick Clifton,
	Jeff Law, Frank Ch. Eigler, Pedro Alves

* Joseph Myers:

> On Thu, 18 Apr 2019, Florian Weimer wrote:
>
>> Makes sense to me.
>> 
>> I will try to raise the issue with the hook scripts maintainer.  Is it
>> okay to leave the new hooks in place with the current behavior?
>
> Sure.

I believe this is now fixed.  I installed a wrapper script that parses
the message that is about to be posted to Bugzilla, and skips calling
the actual script if it refers to a non-official branch.  While this is
a bit hackish, it avoids changes to the generic hook scripts.
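
For readers curious what such a wrapper's decision might look like, here
is a minimal Python sketch. It is not the installed script. It assumes,
hypothetically, that the message text contains the updated reference
somewhere in the form refs/heads/<branch>, that official branches are
master and release/*, and that a message with no recognizable branch is
passed through; should_update_bugzilla is an invented name.

```python
import re

# Assumption: the message mentions the updated ref as refs/heads/<branch>.
REF_IN_MESSAGE = re.compile(r"refs/heads/(\S+)")
# Assumption: "official" means master or anything under release/.
OFFICIAL_BRANCH = re.compile(r"master|release/.*")


def should_update_bugzilla(message: str) -> bool:
    """Return True if the message should be forwarded to Bugzilla."""
    found = REF_IN_MESSAGE.search(message)
    if found is None:
        # Assumption: forward when no branch can be identified.
        return True
    return OFFICIAL_BRANCH.fullmatch(found.group(1)) is not None
```

A real wrapper would read the message it is handed, call a check like this
one, and invoke the actual Bugzilla script only when the check returns
True.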

Thanks,
Florian


* Re: glibc git commit hooks update.
  2019-04-18  0:20   ` Carlos O'Donell
  2019-04-18 12:38     ` Joseph Myers
@ 2019-05-16 19:35     ` Joseph Myers
  2019-05-16 20:03       ` Carlos O'Donell
  2019-05-16 20:57       ` Florian Weimer
  1 sibling, 2 replies; 26+ messages in thread
From: Joseph Myers @ 2019-05-16 19:35 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer


On Wed, 17 Apr 2019, Carlos O'Donell wrote:

> > One problem with the old hooks was that they did not generate a
> > Content-Type header, which was unhelpful when the mails also weren't pure
> > ASCII.  I see the new ones are generating 'Content-Type: text/plain;
> > charset="us-ascii"'.  Will they also be smart about specifying an
> > appropriate character set (so UTF-8 if the commit message / author name /
> > diff contents are valid UTF-8 but not ASCII, for example)?
> 
> All of this is handled by python's email package and some handling
> on the hooks part. If everything is ASCII then we don't do any special
> encoding and just send ASCII. Otherwise we choose UTF-8 first and if
> that fails a decode test then we fallback to ISO-8859-1.

This does not seem to be working as intended.  See e.g. 
https://sourceware.org/ml/glibc-cvs/2019-q2/msg00147.html (UTF-8 bytes 
marked as us-ascii, so ’ appears as â??, for example).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: glibc git commit hooks update.
  2019-05-16 19:35     ` Joseph Myers
@ 2019-05-16 20:03       ` Carlos O'Donell
  2019-05-16 20:27         ` Joseph Myers
  2019-05-16 20:57       ` Florian Weimer
  1 sibling, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2019-05-16 20:03 UTC (permalink / raw)
  To: Joseph Myers
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On 5/16/19 3:35 PM, Joseph Myers wrote:
> On Wed, 17 Apr 2019, Carlos O'Donell wrote:
> 
>>> One problem with the old hooks was that they did not generate a
>>> Content-Type header, which was unhelpful when the mails also weren't pure
>>> ASCII.  I see the new ones are generating 'Content-Type: text/plain;
>>> charset="us-ascii"'.  Will they also be smart about specifying an
>>> appropriate character set (so UTF-8 if the commit message / author name /
>>> diff contents are valid UTF-8 but not ASCII, for example)?
>>
>> All of this is handled by python's email package and some handling
>> on the hooks part. If everything is ASCII then we don't do any special
>> encoding and just send ASCII. Otherwise we choose UTF-8 first and if
>> that fails a decode test then we fallback to ISO-8859-1.
> 
> This does not seem to be working as intended.  See e.g. 
> https://sourceware.org/ml/glibc-cvs/2019-q2/msg00147.html (UTF-8 bytes 
> marked as us-ascii, so ’ appears as â??, for example).

The outbound message is marked:
Content-Type: text/plain; charset="us-ascii"

So it makes sense that the UTF-8 characters are not correctly
parsed. The API used here is the MIMEText one and it always
defaults to "us-ascii", so this needs changing.

All files in glibc sources should be UTF-8 unless they are testing
input files that have specific encodings, and even then we cannot
fix all cases e.g. tst-langinfo.sh and tst-fnmatch.input have
multiple mixed encodings (and are almost impossible to edit with
an editor).

Therefore I think all projects should just default to UTF-8
encoded output for emails going to the list.

Thoughts on defaulting to UTF-8?

-- 
Cheers,
Carlos.


* Re: glibc git commit hooks update.
  2019-05-16 20:03       ` Carlos O'Donell
@ 2019-05-16 20:27         ` Joseph Myers
  2019-05-16 20:51           ` Carlos O'Donell
  0 siblings, 1 reply; 26+ messages in thread
From: Joseph Myers @ 2019-05-16 20:27 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On Thu, 16 May 2019, Carlos O'Donell wrote:

> Therefore I think all projects should just default to UTF-8
> encoded output for emails going to the list.

If the email text contains something that is *not* UTF-8 (if the patch 
contains diffs to files that have some reason for being in another 
encoding) then it's probably best not to label the email as being in 
UTF-8.  Otherwise, labelling it as UTF-8 makes sense (including if in fact 
it's in the ASCII subset of UTF-8).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: glibc git commit hooks update.
  2019-05-16 20:27         ` Joseph Myers
@ 2019-05-16 20:51           ` Carlos O'Donell
  2019-05-16 21:05             ` Joseph Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Carlos O'Donell @ 2019-05-16 20:51 UTC (permalink / raw)
  To: Joseph Myers
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On 5/16/19 4:27 PM, Joseph Myers wrote:
> On Thu, 16 May 2019, Carlos O'Donell wrote:
> 
>> Therefore I think all projects should just default to UTF-8
>> encoded output for emails going to the list.
> 
> If the email text contains something that is *not* UTF-8 (if the patch 
> contains diffs to files that have some reason for being in another 
> encoding) then it's probably best not to label the email as being in 
> UTF-8.  Otherwise, labelling it as UTF-8 makes sense (including if in fact 
> it's in the ASCII subset of UTF-8).

Are you suggesting we drop Content-Type: and Content-Transfer-Encoding:
from the email entirely if we find non-UTF-8 content?

Otherwise we should always use UTF-8?

We can attempt a UTF-8 encoding, and if that fails, we must pick a fallback
encoding. ISO-8859-1 will accept any stream of bytes, so we could
fall back to that?

In summary the rules would look like this:

- Attempt a UTF-8 encoding.
  - Fail? Send it out as ISO-8859-1.
  - Pass? Send it out as UTF-8.

For the record, this is already the set of rules used for normalizing the
content in the header, subject, and other places, but is not currently
used overall for the message headers.
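
A minimal Python sketch of these rules applied to the message as a whole
follows. It is an illustration under stated assumptions, not the change
made to the hooks: build_commit_email is a hypothetical name, the body is
assumed to arrive as raw bytes, and the sketch targets Python 3's
email.mime.text.MIMEText, which accepts the charset as its third argument.

```python
from email.mime.text import MIMEText


def build_commit_email(body: bytes) -> MIMEText:
    """Label the message UTF-8 when it decodes as such, else ISO-8859-1."""
    try:
        text, charset = body.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value, so this decode cannot fail.
        text, charset = body.decode("iso-8859-1"), "iso-8859-1"
    # Passing the charset explicitly makes the Content-Type label agree
    # with the content that was actually decoded.
    return MIMEText(text, "plain", charset)
```

With this approach a pure-ASCII body is labelled UTF-8 as well, which is
consistent because ASCII is a subset of UTF-8.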

-- 
Cheers,
Carlos.


* Re: glibc git commit hooks update.
  2019-05-16 19:35     ` Joseph Myers
  2019-05-16 20:03       ` Carlos O'Donell
@ 2019-05-16 20:57       ` Florian Weimer
  2019-05-16 21:06         ` Carlos O'Donell
  1 sibling, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2019-05-16 20:57 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Carlos O'Donell, libc-alpha, Mark Wielaard, Nick Clifton,
	Jeff Law, Frank Ch. Eigler, Pedro Alves

* Joseph Myers:

> On Wed, 17 Apr 2019, Carlos O'Donell wrote:
>
>> > One problem with the old hooks was that they did not generate a
>> > Content-Type header, which was unhelpful when the mails also weren't pure
>> > ASCII.  I see the new ones are generating 'Content-Type: text/plain;
>> > charset="us-ascii"'.  Will they also be smart about specifying an
>> > appropriate character set (so UTF-8 if the commit message / author name /
>> > diff contents are valid UTF-8 but not ASCII, for example)?
>> 
>> All of this is handled by python's email package and some handling
>> on the hooks part. If everything is ASCII then we don't do any special
>> encoding and just send ASCII. Otherwise we choose UTF-8 first and if
>> that fails a decode test then we fallback to ISO-8859-1.
>
> This does not seem to be working as intended.  See e.g. 
> https://sourceware.org/ml/glibc-cvs/2019-q2/msg00147.html (UTF-8 bytes 
> marked as us-ascii, so ’ appears as â??, for example).

I expect that this is unrelated to the change and was this way before.
I haven't modified those parts.

Defaulting to charset=utf-8 is probably the best option here, even
though the message text might not be UTF-8 always.  I think this
requires modifications to the Python scripts.

Thanks,
Florian


* Re: glibc git commit hooks update.
  2019-05-16 20:51           ` Carlos O'Donell
@ 2019-05-16 21:05             ` Joseph Myers
  0 siblings, 0 replies; 26+ messages in thread
From: Joseph Myers @ 2019-05-16 21:05 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On Thu, 16 May 2019, Carlos O'Donell wrote:

> In summary the rules would look like this:
> 
> - Attempt a UTF-8 encoding.
>   - Fail? Send it out as ISO-8859-1.
>   - Pass? Send it out as UTF-8.

Yes, that seems right.
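
The rules quoted above can be sketched in Python 3 roughly as follows.
This is a minimal sketch, not the hooks' code: choose_charset and
build_message are hypothetical names, and the body is assumed to arrive
as raw bytes.

```python
from email.mime.text import MIMEText


def choose_charset(body: bytes) -> str:
    """Return 'utf-8' if body is valid UTF-8, otherwise 'iso-8859-1'."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value, so this fallback cannot fail.
        return "iso-8859-1"
    return "utf-8"


def build_message(body: bytes) -> MIMEText:
    """Build a text/plain message whose charset label matches the body."""
    charset = choose_charset(body)
    return MIMEText(body.decode(charset), "plain", charset)


utf8_msg = build_message("it\u2019s fixed".encode("utf-8"))
latin1_msg = build_message(b"caf\xe9")   # 0xe9 alone is not valid UTF-8

print(utf8_msg["Content-Type"])
print(latin1_msg["Content-Type"])
```

Pure ASCII input is labelled utf-8 under these rules, which is harmless
because ASCII is a subset of UTF-8.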

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: glibc git commit hooks update.
  2019-05-16 20:57       ` Florian Weimer
@ 2019-05-16 21:06         ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2019-05-16 21:06 UTC (permalink / raw)
  To: Florian Weimer, Joseph Myers
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves

On 5/16/19 4:57 PM, Florian Weimer wrote:
> * Joseph Myers:
> 
>> On Wed, 17 Apr 2019, Carlos O'Donell wrote:
>>
>>>> One problem with the old hooks was that they did not generate a
>>>> Content-Type header, which was unhelpful when the mails also weren't pure
>>>> ASCII.  I see the new ones are generating 'Content-Type: text/plain;
>>>> charset="us-ascii"'.  Will they also be smart about specifying an
>>>> appropriate character set (so UTF-8 if the commit message / author name /
>>>> diff contents are valid UTF-8 but not ASCII, for example)?
>>>
>>> All of this is handled by Python's email package plus some handling
>>> in the hooks. If everything is ASCII then we don't do any special
>>> encoding and just send ASCII. Otherwise we try UTF-8 first and, if
>>> that fails a decode test, we fall back to ISO-8859-1.
>>
>> This does not seem to be working as intended.  See e.g. 
>> https://sourceware.org/ml/glibc-cvs/2019-q2/msg00147.html (UTF-8 bytes 
>> marked as us-ascii, so ’ appears as â??, for example).
> 
> I expect that this is unrelated to the change and was this way before.
> I haven't modified those parts.
> 
> Defaulting to charset=utf-8 is probably the best option here, even
> though the message text might not always be UTF-8.  I think this
> requires modifications to the Python scripts.

Correct.

Today we have:

hooks/updates/emails.py, line 201:

	e_msg = MIMEText(self.__email_body_with_diff)

We need something like this instead:

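	# Assumes Python 2, where the body is a byte string with .decode().
	full_text = self.__email_body_with_diff
	encoding = None
	# Try UTF-8 first, then fall back to ISO-8859-1.
	for potential_encoding in ('UTF-8', 'iso-8859-1'):
		try:
			full_text.decode(potential_encoding)
			encoding = potential_encoding
			break
		except UnicodeDecodeError:
			pass
	# ISO-8859-1 decodes any byte sequence, so this is only a safety net.
	if encoding is None:
		encoding = 'us-ascii'
	e_msg = MIMEText(full_text, 'plain', encoding)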
	... Rest of the handling ...

-- 
Cheers,
Carlos.

* Re: glibc git commit hooks update.
  2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
                   ` (3 preceding siblings ...)
  2019-04-17 22:21 ` Joseph Myers
@ 2019-05-23 10:05 ` Andreas Schwab
  2019-05-25  1:15   ` Carlos O'Donell
  4 siblings, 1 reply; 26+ messages in thread
From: Andreas Schwab @ 2019-05-23 10:05 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

The bugzilla update script is broken:

https://sourceware.org/bugzilla/show_bug.cgi?id=18093#c6

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: glibc git commit hooks update.
  2019-05-23 10:05 ` Andreas Schwab
@ 2019-05-25  1:15   ` Carlos O'Donell
  0 siblings, 0 replies; 26+ messages in thread
From: Carlos O'Donell @ 2019-05-25  1:15 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: libc-alpha, Mark Wielaard, Nick Clifton, Jeff Law,
	Frank Ch. Eigler, Pedro Alves, Florian Weimer

On 5/23/19 5:05 AM, Andreas Schwab wrote:
> The bugzilla update script is broken:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=18093#c6

Could you please describe what is broken?

Is it that Alexandra's name is not properly displayed?

The raw data in the comment message is UTF-8:

ef bf bd <UFFFD> "REPLACEMENT CHARACTER"

So yes, something went wrong during the conversion of

c3 a1 <U00E1> "LATIN SMALL LETTER A WITH ACUTE"

which is in the UTF-8 commit author information.

This again looks like a potential encoding issue between
Python and the underlying OS.
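
One way such a replacement can arise is sketched below.  This is a
hypothetical reproduction in Python 3, not the hooks' actual code path:
it assumes some step decoded the UTF-8 author bytes with an ASCII codec
in "replace" mode and then stored the result as UTF-8.  The function
name is made up for the example.

```python
def lossy_roundtrip(raw: bytes) -> bytes:
    """Decode raw as ASCII, replacing bad bytes, then re-encode as UTF-8.

    Hypothetical stand-in for a step that assumes ASCII input; every
    byte outside ASCII becomes U+FFFD REPLACEMENT CHARACTER.
    """
    return raw.decode("ascii", errors="replace").encode("utf-8")


author_bytes = "\u00e1".encode("utf-8")   # c3 a1, LATIN SMALL LETTER A WITH ACUTE
stored = lossy_roundtrip(author_bytes)

print(" ".join(f"{b:02x}" for b in author_bytes))
print(" ".join(f"{b:02x}" for b in stored))
```

With the ASCII codec each undecodable byte yields its own U+FFFD, so
the two input bytes become two replacement characters; a different
lossy codec could produce a different count.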

-- 
Cheers,
Carlos.

end of thread

Thread overview: 26+ messages
2019-04-15 19:29 glibc git commit hooks update Carlos O'Donell
2019-04-15 19:55 ` Zack Weinberg
2019-04-15 20:16 ` DJ Delorie
2019-04-15 20:22   ` Florian Weimer
2019-04-15 20:24     ` DJ Delorie
2019-04-15 20:28       ` Florian Weimer
2019-04-15 20:33         ` DJ Delorie
2019-04-15 20:39           ` Florian Weimer
2019-04-15 21:47             ` Yann Droneaud
2019-04-16  7:06               ` Florian Weimer
2019-04-15 21:08 ` Adhemerval Zanella
2019-04-17 22:21 ` Joseph Myers
2019-04-18  0:20   ` Carlos O'Donell
2019-04-18 12:38     ` Joseph Myers
2019-04-18 12:46       ` Florian Weimer
2019-04-18 14:45         ` Joseph Myers
2019-05-16 10:13           ` Florian Weimer
2019-05-16 19:35     ` Joseph Myers
2019-05-16 20:03       ` Carlos O'Donell
2019-05-16 20:27         ` Joseph Myers
2019-05-16 20:51           ` Carlos O'Donell
2019-05-16 21:05             ` Joseph Myers
2019-05-16 20:57       ` Florian Weimer
2019-05-16 21:06         ` Carlos O'Donell
2019-05-23 10:05 ` Andreas Schwab
2019-05-25  1:15   ` Carlos O'Donell
