git@vger.kernel.org mailing list mirror (one of many)
* Cleaning up log messages
@ 2008-07-27 17:50 Jon Smirl
  2008-07-27 18:01 ` Johannes Schindelin
  2008-07-27 18:47 ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Jon Smirl @ 2008-07-27 17:50 UTC
  To: Git Mailing List

I was playing around with git log for the kernel and observed that
there is a lot of noise when trying to do statistics on the number of
commits.

For example:

Author: Greg K-H <gregkh@suse.de>
Author: Greg KH <gregkh@suse.de>
Author: Greg KH <greg@kroah.com>
Author: Greg KH <greg@press.(none)>
Author: gregkh@suse.de <gregkh@suse.de>
Author: Greg Kroah-Hartman <gregkh@suse>
Author: Greg Kroah-Hartman <gregkh@suse.de>
Author: Greg Kroah-Hartman <greg@kroah.com>

I don't see an obvious way to do this with git, but it would be neat
to have a 'clean' option on git log that would take each email address
(author, signed-off, acked, etc) and map it through a table which
would convert old email addresses into the current one and also
standardize the formatting of the names. Cleaning would only change
what is displayed; leave the option off if you want the original.

Of course this initial map would need to be built by hand. New commits
could be checked against the map, and the map updated if the person
really has a new email address. Checking new commits against the map
would help clean things up going forward. checkpatch.pl could also
validate against the mapping file.

No pressing need for this; it would just be a nice toy.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Cleaning up log messages
  2008-07-27 17:50 Cleaning up log messages Jon Smirl
@ 2008-07-27 18:01 ` Johannes Schindelin
  2008-07-27 18:16   ` Jon Smirl
  2008-07-27 18:47 ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2008-07-27 18:01 UTC
  To: Jon Smirl; +Cc: Git Mailing List

Hi,

On Sun, 27 Jul 2008, Jon Smirl wrote:

> I was playing around with git log for the kernel and observed that there 
> is a lot of noise when trying to do statistics on the number of commits.
> 
> For example:
> 
> Author: Greg K-H <gregkh@suse.de>
> Author: Greg KH <gregkh@suse.de>
> Author: Greg KH <greg@kroah.com>
> Author: Greg KH <greg@press.(none)>
> Author: gregkh@suse.de <gregkh@suse.de>
> Author: Greg Kroah-Hartman <gregkh@suse>
> Author: Greg Kroah-Hartman <gregkh@suse.de>
> Author: Greg Kroah-Hartman <greg@kroah.com>
> 
> I don't see an obvious way to do this with git, but it would be neat
> to have a 'clean' option on git log that would take each email address
> (author, signed-off, acked, etc) and map it through a table which
> would convert old email addresses into the current one and also
> standardize the formatting of the names.

Something like .mailmap?

And to show the mapped author name instead of the committed one, you would 
use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)

Ciao,
Dscho
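
To illustrate the suggestion above, here is a minimal sketch of a .mailmap
in action. The throwaway repository, the demo_mailmap function name, and
the choice of "Greg Kroah-Hartman <gregkh@suse.de>" as the canonical
spelling are assumptions made for illustration, not the kernel's actual
file. It records two commits under different spellings of one author and
prints the name as committed (%an) next to the mapped name (%aN).

```shell
# Sketch only: demo_mailmap is a hypothetical helper, not a git command.
demo_mailmap() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        # Two commits from the same address under two spellings of the name.
        git -c user.name='Greg KH' -c user.email='gregkh@suse.de' \
            commit -q --allow-empty -m 'first change'
        git -c user.name='Greg K-H' -c user.email='gregkh@suse.de' \
            commit -q --allow-empty -m 'second change'
        # One .mailmap line: the canonical name, keyed by the commit email.
        printf '%s\n' 'Greg Kroah-Hartman <gregkh@suse.de>' > .mailmap
        # %an is the name as committed, %aN the name after mapping.
        git log --pretty='format:%an => %aN'
        echo
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}

# Run the demo only where git is installed.
if command -v git >/dev/null 2>&1; then
    demo_mailmap
fi
```

The .mailmap here is read from the top of the work tree, so it does not
have to be committed for the mapping to take effect.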


* Re: Cleaning up log messages
  2008-07-27 18:01 ` Johannes Schindelin
@ 2008-07-27 18:16   ` Jon Smirl
  2008-07-27 18:33     ` Petr Baudis
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Smirl @ 2008-07-27 18:16 UTC
  To: Johannes Schindelin; +Cc: Git Mailing List

On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
>
>  On Sun, 27 Jul 2008, Jon Smirl wrote:
>
>  > I was playing around with git log for the kernel and observed that there
>  > is a lot of noise when trying to do statistics on the number of commits.
>  >
>  > For example:
>  >
>  > Author: Greg K-H <gregkh@suse.de>
>  > Author: Greg KH <gregkh@suse.de>
>  > Author: Greg KH <greg@kroah.com>
>  > Author: Greg KH <greg@press.(none)>
>  > Author: gregkh@suse.de <gregkh@suse.de>
>  > Author: Greg Kroah-Hartman <gregkh@suse>
>  > Author: Greg Kroah-Hartman <gregkh@suse.de>
>  > Author: Greg Kroah-Hartman <greg@kroah.com>
>  >
>  > I don't see an obvious way to do this with git, but it would be neat
>  > to have a 'clean' option on git log that would take each email address
>  > (author, signed-off, acked, etc) and map it through a table which
>  > would convert old email addresses into the current one and also
>  > standardize the formatting of the names.
>
>
> Something like .mailmap?
>
>  And to show the mapped author name instead of the committed one, you would
>  use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)

So we can already do this? Where is a .mailmap for the kernel tree?

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Cleaning up log messages
  2008-07-27 18:16   ` Jon Smirl
@ 2008-07-27 18:33     ` Petr Baudis
  2008-07-27 19:07       ` Jon Smirl
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Baudis @ 2008-07-27 18:33 UTC
  To: Jon Smirl; +Cc: Johannes Schindelin, Git Mailing List

On Sun, Jul 27, 2008 at 02:16:30PM -0400, Jon Smirl wrote:
> On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Something like .mailmap?
> >
> >  And to show the mapped author name instead of the committed one, you would
> >  use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)
> 
> So we can already do this? Where is a .mailmap for the kernel tree?

	http://repo.or.cz/w/linux-2.6.git?a=blob;f=.mailmap

...right there. :-)

				Petr "Pasky" Baudis


* Re: Cleaning up log messages
  2008-07-27 17:50 Cleaning up log messages Jon Smirl
  2008-07-27 18:01 ` Johannes Schindelin
@ 2008-07-27 18:47 ` Junio C Hamano
  2008-07-27 20:52   ` Jon Smirl
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2008-07-27 18:47 UTC
  To: Jon Smirl; +Cc: Git Mailing List

"Jon Smirl" <jonsmirl@gmail.com> writes:

> I was playing around with git log for the kernel and observed that
> there is a lot of noise when trying to do statistics on the number of
> commits.
>
> For example:
>
> Author: Greg K-H <gregkh@suse.de>
> Author: Greg KH <gregkh@suse.de>
> ...
> Author: Greg Kroah-Hartman <greg@kroah.com>

We have had .mailmap since a24e658 (git-shortlog: make the mailmap
configurable., 2005-10-06); maybe the kernel tree wants a maintainer for
the .mailmap file?


* Re: Cleaning up log messages
  2008-07-27 18:33     ` Petr Baudis
@ 2008-07-27 19:07       ` Jon Smirl
  2008-07-27 19:20         ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Smirl @ 2008-07-27 19:07 UTC
  To: Petr Baudis; +Cc: Johannes Schindelin, Git Mailing List

On 7/27/08, Petr Baudis <pasky@suse.cz> wrote:
> On Sun, Jul 27, 2008 at 02:16:30PM -0400, Jon Smirl wrote:
>  > On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> > > Something like .mailmap?
>  > >
>  > >  And to show the mapped author name instead of the committed one, you would
>  > >  use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)
>  >
>  > So we can already do this? Where is a .mailmap for the kernel tree?
>
>
>         http://repo.or.cz/w/linux-2.6.git?a=blob;f=.mailmap
>
>  ...right there. :-)

I updated to 1.6.0-rc0 and this is working. mailmap needs some
cleanup. Errors are still in the list, but this is a lot better than
it was. That made about 800 'contributors' disappear.

Is there a way to do short log and have it map the names? What about
replacing the emails with their current email address?

Some random entries still missing from the map (each pair should be one person):
Greg KH
Greg Kroah-Hartman

Hans J Koch
Hans J. Koch

Jean-Christophe Dubois
Jean-Christophe DUBOIS

Miguel Boton
Miguel Botón

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Cleaning up log messages
  2008-07-27 19:07       ` Jon Smirl
@ 2008-07-27 19:20         ` Johannes Schindelin
  2008-07-27 19:31           ` Jon Smirl
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2008-07-27 19:20 UTC
  To: Jon Smirl; +Cc: Petr Baudis, Git Mailing List

Hi,

On Sun, 27 Jul 2008, Jon Smirl wrote:

> On 7/27/08, Petr Baudis <pasky@suse.cz> wrote:
> > On Sun, Jul 27, 2008 at 02:16:30PM -0400, Jon Smirl wrote:
> >  > On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >
> > > > Something like .mailmap?
> >  > >
> >  > >  And to show the mapped author name instead of the committed one, you would
> >  > >  use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)
> >  >
> >  > So we can already do this? Where is a .mailmap for the kernel tree?
> >
> >         http://repo.or.cz/w/linux-2.6.git?a=blob;f=.mailmap
> >
> >  ...right there. :-)
> 
> I updated to 1.6.0-rc0 and this is working. mailmap needs some
> cleanup. Errors are still in the list, but this is a lot better than
> it was. That made about 800 'contributors' disappear.
> 
> Is there a way to do short log and have it map the names?

Yes, as of v1.6.0-rc0~58 you can pass --pretty=format: to shortlog.

> What about replacing the emails with their current email address?

Nope, that was never meant to be done.

Ciao,
Dscho
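
For plain per-author counts, shortlog's summary mode is often enough,
since shortlog consults .mailmap itself. A minimal sketch, assuming it is
run inside a repository that has a .mailmap; the mapped_counts name is
made up for this example. The explicit HEAD matters in scripts: without a
revision argument, shortlog reads a log from standard input when that is
not a terminal.

```shell
# Sketch only: mapped_counts is a hypothetical wrapper name.
# Prints "<commits><TAB><mapped author name>", busiest author first.
mapped_counts() {
    # -s: counts only; -n: sort by count; HEAD keeps shortlog off stdin.
    git shortlog -s -n HEAD
}

# Run only inside a git work tree, so the block is harmless elsewhere.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    mapped_counts | head -n 10
fi
```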


* Re: Cleaning up log messages
  2008-07-27 19:20         ` Johannes Schindelin
@ 2008-07-27 19:31           ` Jon Smirl
  2008-07-27 20:16             ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Smirl @ 2008-07-27 19:31 UTC
  To: Johannes Schindelin; +Cc: Petr Baudis, Git Mailing List

On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
>
>  On Sun, 27 Jul 2008, Jon Smirl wrote:
>
>  > On 7/27/08, Petr Baudis <pasky@suse.cz> wrote:
>  > > On Sun, Jul 27, 2008 at 02:16:30PM -0400, Jon Smirl wrote:
>  > >  > On 7/27/08, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>  > >
>  > > > > Something like .mailmap?
>  > >  > >
>  > >  > >  And to show the mapped author name instead of the committed one, you would
>  > >  > >  use "--pretty=format:%aN"?  (Needs 1.6.0-rc0 at least, IIRC)
>  > >  >
>  > >  > So we can already do this? Where is a .mailmap for the kernel tree?
>  > >
>  > >         http://repo.or.cz/w/linux-2.6.git?a=blob;f=.mailmap
>  > >
>  > >  ...right there. :-)
>  >
>  > I updated to 1.6.0-rc0 and this is working. mailmap needs some
>  > cleanup. Errors are still in the list, but this is a lot better than
>  > it was. That made about 800 'contributors' disappear.
>  >
>  > Is there a way to do short log and have it map the names?
>
>
> Yes, as of v1.6.0-rc0~58 you can pass --pretty=format: to shortlog.

How do you do it with git log? --pretty overrides the default of medium:

--pretty[=<format>]

    Pretty-print the contents of the commit logs in a given format,
    where <format> can be one of oneline, short, medium, full, fuller,
    email, raw and format:<string>. When omitted, the format defaults
    to medium.


>
>
>  > What about replacing the emails with their current email address?
>
>
> Nope, that was never meant to be done.
>
>  Ciao,
>  Dscho
>
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Cleaning up log messages
  2008-07-27 19:31           ` Jon Smirl
@ 2008-07-27 20:16             ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2008-07-27 20:16 UTC
  To: Jon Smirl; +Cc: Petr Baudis, Git Mailing List

Hi,

On Sun, 27 Jul 2008, Jon Smirl wrote:

> How do you do it with git log? --pretty overrides the default of medium
> 
> --pretty[=<format>]
> 
>     Pretty-print the contents of the commit logs in a given format,
> where <format> can be one of oneline, short, medium, full, fuller,
> email, raw and format:<string>. When omitted, the format defaults to
> medium.

You get it _almost_ with

$ f='commit %H%nAuthor: %aN <%ae>%nDate:   %ad%n%n%s%n%n%b'
$ git log --pretty="format:$f"

The only difference being that the commit message is not indented.  If you 
really need that, it is easy to add.

But I rather doubt that you need it, as you want to make statistics, and 
therefore need to pipe the output into a script anyway.

Ciao,
Dscho
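
Since the goal is statistics, the mapped name alone is usually all the
script needs. A minimal sketch, assuming one author name per line on
standard input; count_by_author is a name invented here, and the
commented git invocation is just one way to feed it.

```shell
# Sketch only: count_by_author is a hypothetical helper.
# Reads one author name per line, prints "<count> <name>", largest first.
count_by_author() {
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn
}

# Possible use inside a repository (not run here):
#   git log --pretty='format:%aN' | count_by_author | head -n 20

# Small self-contained demonstration with made-up input.
printf '%s\n' 'Greg Kroah-Hartman' 'Jon Smirl' 'Greg Kroah-Hartman' |
    count_by_author
```

Comparing the line count of this output for %an and for %aN gives a quick
measure of how many duplicate 'contributors' the map removes.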


* Re: Cleaning up log messages
  2008-07-27 18:47 ` Junio C Hamano
@ 2008-07-27 20:52   ` Jon Smirl
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Smirl @ 2008-07-27 20:52 UTC
  To: Junio C Hamano; +Cc: Git Mailing List

On 7/27/08, Junio C Hamano <gitster@pobox.com> wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
>  > I was playing around with git log for the kernel and observed that
>  > there is a lot of noise when trying to do statistics on the number of
>  > commits.
>  >
>  > For example:
>  >
>  > Author: Greg K-H <gregkh@suse.de>
>  > Author: Greg KH <gregkh@suse.de>
>  > ...
>
> > Author: Greg Kroah-Hartman <greg@kroah.com>
>
>
> We have had .mailmap since a24e658 (git-shortlog: make the mailmap
>  configurable., 2005-10-06); maybe the kernel tree wants a maintainer for
>  the .mailmap file?

This seems to be the main problem. There are so many missing entries
from the .mailmap file that I didn't think this feature was
implemented. I'd guesstimate that 300-400 needed entries are missing.

I've made a few attempts at writing a script to fix the easy ones but
I don't have a good solution yet.

-- 
Jon Smirl
jonsmirl@gmail.com
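
One way to approach the "easy ones" is to look for addresses that appear
with more than one spelling of the name, since a name-by-email .mailmap
line fixes exactly that case. A rough sketch under that assumption;
names_per_email and the tab-separated input layout are choices made for
this example, not an existing tool.

```shell
# Sketch only: names_per_email is a hypothetical helper.
# Input:  one "email<TAB>name" pair per line.
# Output: "email: name | name ..." for each email seen with several names.
names_per_email() {
    LC_ALL=C sort -u | awk -F '\t' '
        ($1 in names) { names[$1] = names[$1] " | " $2; seen[$1]++; next }
                      { names[$1] = $2; seen[$1] = 1 }
        END {
            for (email in seen)
                if (seen[email] > 1)
                    print email ": " names[email]
        }
    ' | LC_ALL=C sort
}

# Possible use inside a repository (not run here):
#   git log --pretty='format:%ae%x09%an' | names_per_email

# Small self-contained demonstration with input taken from the thread.
printf 'gregkh@suse.de\tGreg KH\ngregkh@suse.de\tGreg K-H\njonsmirl@gmail.com\tJon Smirl\n' |
    names_per_email
```

The output is only a list of candidates; someone still has to pick the
canonical spelling for each address before adding it to .mailmap.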

