git@vger.kernel.org mailing list mirror (one of many)
* How to generate feature branch statistics?
@ 2016-07-20  8:05 Ernesto Maserati
  2016-07-20 13:14 ` Jeff King
  2016-07-20 13:56 ` Jakub Narębski
  0 siblings, 2 replies; 7+ messages in thread
From: Ernesto Maserati @ 2016-07-20  8:05 UTC (permalink / raw)
  To: git

I assume that feature branches are not frequently enough merged into
master. Because of that we discover bugs later than we could with a more
continuous code integration. I don't want to discuss here whether feature
branches are good or bad.

I just want to ask whether there is a way to generate a statistic for the
average duration of feature branches until they are merged to master. I
would like to know if it is 1 day, 2 days or, let's say, 8 or 17 days. It
would also be interesting to see the statistical outliers.

I hope my motivation is clear, as well as the kind of git repository data
I would like to produce.

Any ideas?


* Re: How to generate feature branch statistics?
  2016-07-20  8:05 How to generate feature branch statistics? Ernesto Maserati
@ 2016-07-20 13:14 ` Jeff King
  2016-07-20 18:49   ` Junio C Hamano
  2016-07-20 13:56 ` Jakub Narębski
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2016-07-20 13:14 UTC (permalink / raw)
  To: Ernesto Maserati; +Cc: git

On Wed, Jul 20, 2016 at 10:05:09AM +0200, Ernesto Maserati wrote:

> I assume that feature branches are not frequently enough merged into
> master. Because of that we discover bugs later than we could with a more
> continuous code integration. I don't want to discuss here whether feature
> branches are good or bad.
> 
> I want just to ask is there a way how to generate a statistic for the
> average duration of feature branches until they are merged to the master? I
> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
> would be interesting to see the statistical outliers.

In a workflow that merges feature branches to master, you can generally
recognize them by looking for merges along the first-parent chain of
commits:

  git log --first-parent --merges master

(Depending on your workflow, some feature branches may be fast-forwards
with no merge commit, so this is just a sampling. Some workflows use
"git merge --no-ff" to merge in feature branches, so this would see all
of them).

And then for each merge, you can get the set of commits that were merged
in (it is the commits in the second parent that are not in the first).
The bottom-most one is the "start" of the branch (or close to it; of
course the author started writing code before they made a commit), and
the "end" is the merge itself.

So something like:

  git rev-list --first-parent --merges master |
  while read merge; do
	start=$(git log --format=%at $merge^1..$merge^2 | tail -1)
	end=$(git log -1 --format=%at $merge)
	subject=$(git log -1 --format=%s $merge)
	echo "$((end - start)) $subject"
  done

That should output a sequence of topic branch merges prefixed by the
number of seconds they were active. Two exercises for the reader:

  1. Converting seconds into some more useful time scale. :)

  2. This can probably be done with fewer invocations of git,
     which would be more efficient.
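
Both exercises can be sketched roughly as follows. This is only a sketch
under the same assumptions as the loop above (topic branches are merged as
the second parent of merges on the first-parent chain); the function names
branch_durations and seconds_to_days are hypothetical, chosen here for
illustration.

```shell
# Hypothetical helper names; the loop follows the one shown above.

# Exercise 2: a single "git log" lists every merge together with its author
# timestamp and subject, so only one more git call per merge is needed.
branch_durations() {
	git log --first-parent --merges --format='%H %at %s' "${1:-master}" |
	while read -r merge end subject; do
		# Oldest listed commit that the merge brought in.
		start=$(git log --format=%at "$merge^1..$merge^2" | tail -1)
		# A merge that brought in no commits has no start; skip it.
		[ -n "$start" ] || continue
		echo "$((end - start)) $subject"
	done
}

# Exercise 1: turn the leading seconds field into days with one decimal.
seconds_to_days() {
	awk '{ secs = $1; sub(/^[^ ]+ /, ""); printf "%.1f %s\n", secs / 86400, $0 }'
}
```

Something like "branch_durations master | seconds_to_days | sort -n" would
then list the merged topics from shortest-lived to longest-lived, which
also puts the outliers at either end of the output.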

-Peff


* Re: How to generate feature branch statistics?
  2016-07-20  8:05 How to generate feature branch statistics? Ernesto Maserati
  2016-07-20 13:14 ` Jeff King
@ 2016-07-20 13:56 ` Jakub Narębski
  2016-07-20 18:10   ` Jakub Narębski
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Narębski @ 2016-07-20 13:56 UTC (permalink / raw)
  To: Ernesto Maserati, git

On 2016-07-20 at 10:05, Ernesto Maserati wrote:

> I assume that feature branches are not frequently enough merged into
> master. Because of that we discover bugs later than we could with a more
> continuous code integration. I don't want to discuss here whether feature
> branches are good or bad.
> 
> I want just to ask is there a way how to generate a statistic for the
> average duration of feature branches until they are merged to the master? I
> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
> would be interesting to see the statistical outliers.
> 
> I hope my motivation became clear and what kind of git repository data I
> would like to produce.
> 
> Any ideas?

There are at least two tools to generate statistics about a git repository,
namely Gitstat (https://sourceforge.net/projects/gitstat) and GitStats
(https://github.com/hoxu/gitstats), both generating repo statistics as
a web page. You can probably find more... but I don't know if any includes
the statistics you need.

I assume that you have some way of determining whether a merge in the
'master' branch is a merge of a topic branch or of a long-lived graduation
branch (e.g. 'maint' or equivalent). To simplify the situation, I assume that
the only merges in master are merges of topic branches:

  git rev-list --min-parents=2 master | 
  while read merge_rev; do 

You might want to add "--grep=maint --invert-grep" or something like
that to exclude merges of 'maint' branch.
	
We can get the date of the merge (author date with %ad/%at, or committer
date with %cd/%ct) as an epoch timestamp (seconds since 1970, which is
convenient for comparing datetimes and computing the interval between two
events):

     MERGE_DATE=$(git show -s --pretty=%at $merge_rev)

Assuming that topic branches are always merged with a two-head merge, as
the second parent (so that the --first-parent ancestry of master stays on
the master branch only), we can get the first revision on a merged topic
branch with

     FIRST_REV=$(git rev-list $merge_rev^2 ^$merge_rev^1 | tail -1)

We can extract the date from this revision in the same way

     FIRST_DATE=$(git show -s --pretty=%at $FIRST_REV)

Print the difference (here to standard output; you might want to write
to a file)

     echo $((MERGE_DATE - FIRST_DATE))

And finish the loop.

  done

Then pass the output to some histogramming or statistics tool... or use
a spreadsheet. Note the results are in seconds.
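
If no dedicated statistics tool is at hand, a rough summary can be had
with sort and awk alone. The sketch below assumes one duration in seconds
at the start of each input line (anything after the first field is
ignored) and reports the values in days; duration_stats is a hypothetical
helper name.

```shell
# Hypothetical helper: summarize durations given in seconds (first field of
# each input line) as count, mean, median, minimum and maximum in days.
duration_stats() {
	sort -n |
	awk '{ v[NR] = $1 / 86400; sum += v[NR] }
	END {
		if (NR == 0) exit
		# Input is sorted, so the median is the middle value (or the
		# mean of the two middle values for an even count).
		median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
		printf "count=%d mean=%.1f median=%.1f min=%.1f max=%.1f\n", NR, sum / NR, median, v[1], v[NR]
	}'
}
```

Comparing the median with the mean and the maximum gives a first hint
about outliers; for anything more detailed a spreadsheet or histogram is
still the better choice.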

HTH (not checked much)
-- 
Jakub Narębski


* Re: How to generate feature branch statistics?
  2016-07-20 13:56 ` Jakub Narębski
@ 2016-07-20 18:10   ` Jakub Narębski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Narębski @ 2016-07-20 18:10 UTC (permalink / raw)
  To: Ernesto Maserati, git

On 2016-07-20 at 15:56, Jakub Narębski wrote:
> On 2016-07-20 at 10:05, Ernesto Maserati wrote:
> 
>> I assume that feature branches are not frequently enough merged into
>> master. Because of that we discover bugs later than we could with a more
>> continuous code integration. I don't want to discuss here whether feature
>> branches are good or bad.
>>
>> I want just to ask is there a way how to generate a statistic for the
>> average duration of feature branches until they are merged to the master? I
>> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
>> would be interesting to see the statistical outliers.
>>
>> I hope my motivation became clear and what kind of git repository data I
>> would like to produce.
>>
>> Any ideas?
> 
> There are at least two tools to generate statistics about git repository,
> namely Gitstat (https://sourceforge.net/projects/gitstat) and GitStats
> (https://github.com/hoxu/gitstats), both generating repo statistics as
> a web page. You can probably find more... but I don't know if any includes
> the statistics you need.
> 
> I assume that you have some way of determining if the merge in 'master'
> branch is a merge of a topic branch, or of long-lived graduation branch
> (e.g. 'maint' or equivalent). To simplify the situation, I assume that
> the only merges in master are merges of topic branches:
> 
>   git rev-list --min-parents=2 master | 

Self-correction: here you need to use --first-parent, as in Peff's answer
(which also uses fewer git invocations, and less git porcelain).

I wonder if this is something libgit2 would be helpful for...
-- 
Jakub Narębski



* Re: How to generate feature branch statistics?
  2016-07-20 13:14 ` Jeff King
@ 2016-07-20 18:49   ` Junio C Hamano
  2016-07-20 23:10     ` Jakub Narębski
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2016-07-20 18:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Ernesto Maserati, git

Jeff King <peff@peff.net> writes:

> In a workflow that merges feature branches to master, you can generally
> recognize them by looking for merges along the first-parent chain of
> commits:
>
>   git log --first-parent --merges master
>
> (Depending on your workflow, some feature branches may be fast-forwards
> with no merge commit, so this is just a sampling. Some workflows use
> "git merge --no-ff" to merge in feature branches, so this would see all
> of them).
> And then for each merge, you can get the set of commits that were merged
> in (it is the commits in the second parent that are not in the first).
> The bottom-most one is the "start" of the branch (or close to it; of
> course the author started writing code before they made a commit), and
> the "end" is the merge itself.

A few things to keep in mind are

 * A feature branch may be merged to the master multiple times,
   when the feature branch is properly managed.

   E.g. it may once have been thought complete with 3 commits and
   merged to 'master'; then a bug is discovered, the branch gains a
   fourth commit to fix it, and it is merged to 'master' again,
   resulting in a topology like this:

         A---B---C-----------D (feature)
        /         \           \
    ---o---o---o---1---o---o---2---o (master)

   "git log --first-parent --merges master" will first find commit
   '2' that merged the feature for the second time, bringing in
   commit 'D', and then it will find commit '1' that merged the
   feature previously, bringing in commits 'A', 'B' and 'C'.

 * A feature branch that depends on other features may have merges of
   its own.  You may start a feature X that depends on two other
   features Y and Z that are not yet in 'master', in addition to
   depending on things in 'master' that have been added since Y and
   Z forked from it.  In such a case, your feature X may look like
   this:

                 .-------------------1----------2--------x---x (feature X)
                /                   /          /
       y---y---y (feature Y)       /          /
      /                           /          /
  ---o---o---o---o---o---o---o---0 (master) /
          \                                /
           z---z (feature Z)              /
                \                        /
                 .----------------------.

   where '1' and '2' are merges of feature Y and then Z into the tip
   of 'master' when you start working on feature X.

   And then feature Y and feature Z may graduate to 'master' before
   your feature X is ready to do so, resulting in something like:

                 .-------------------1----------2--------x---x (feature X)
                /                   /          /
       y---y---y (feature Y) ----  / -------  /  --.
      /                           /          /      \
  ---o---o---o---o---o---o---o---o---o---o---o---o---Y---Z (master)
          \                                /            /
           z---z (feature Z) ----------   /  ----------.
                \                        /
                 .----------------------.

   where 'Y' and 'Z' are merges of features Y and Z to 'master'.
   After that, feature X may become ready to be merged, resulting in:

                 .-------------------1----------2--------x---x (feature X)
                /                   /          /              \    
       y---y---y (feature Y) ----  / -------  /  --.           \
      /                           /          /      \           \
  ---o---o---o---o---o---o---o---o---o---o---o---o---Y---Z---o---X (master)
          \                                /            /
           z---z (feature Z) ----------   /  ----------.
                \                        /
                 .----------------------.

  When "git log --first-parent --merges master" finds X, it would
  notice that it pulled in commits '1', '2' and two 'x'.  The "tool"
  to inspect the history needs to be careful deciding if '1' and '2'
  are part of feature X.  There are variants that make it tricky
  (e.g. 'Y' may not have yet been merged to 'master' when 'X' is
  merged, in which case you may end up pulling both 'x' and 'y' into
  'master' with a single merge), which should be avoided if feature
  branches are managed carefully, but not everybody is careful when
  managing their history.
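
  One possible way to leave merges such as '1' and '2' out when looking
  for the start of a topic is to walk only the topic's own first-parent
  line and to skip merge commits.  The sketch below assumes that the
  topic was merged as the second parent and that merges made on the
  topic kept the topic as their first parent; topic_start is a
  hypothetical helper name, and this does not help with the trickier
  variants mentioned above.

```shell
# Hypothetical helper: author timestamp of the oldest non-merge commit on the
# first-parent line of the topic merged by "$1" (the topic is assumed to be
# the second parent of that merge).
topic_start() {
	git log --first-parent --no-merges --format=%at "$1^1..$1^2" | tail -1
}
```

  For the final picture above, "topic_start X" would report the older of
  the two 'x' commits rather than merge '1'.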

Coming back to the introduction of the original message:

>> I assume that feature branches are not frequently enough merged into
>> master. Because of that we discover bugs later than we could with a more
>> continuous code integration. I don't want to discuss here whether feature
>> branches are good or bad.

For our own history and workflow, the duration between the inception
of a topic branch and the time it gets merged to 'master' is not all
that interesting.  More interesting numbers are:

 * The duration between the time a topic hits 'next' and the time it
   gets merged to 'master'.  This is the time the developers and
   testers are using the new feature in their own work to make sure
   it does not have any ill effect.

 * The percentage of topics that are merged to 'master' with some
   follow-up changes after they hit 'next'.  This approximates the
   number of bugs that are caught by developers and testers
   before a new feature goes to the general public.
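
A rough sketch of the first number, under several assumptions that may not
hold for a given repository: topics are merged with plain two-parent
merges as the second parent, the history of 'next' is still reachable
(i.e. it has not been rewound), and a topic is matched only when the very
same tip commit was merged to both branches.  For a topic that gained
follow-up changes, this therefore measures from the merge of its final
tip, not from the first time it hit 'next'.  The name next_to_master is
hypothetical.

```shell
# Hypothetical helper: for every topic tip merged to both branches, print
# "<seconds between the two merges> <subject of the merge into $2>".
next_to_master() {
	{
		git log --first-parent --merges --format='N %P %ct' "${1:-next}"
		git log --first-parent --merges --format='M %P %ct %s' "${2:-master}"
	} |
	awk '
		# Lines from the integration branch: topic tip -> merge time.
		$1 == "N" { hit[$3] = $4; next }
		# Lines from the main branch: report tips that were seen before.
		$1 == "M" && ($3 in hit) {
			secs = $4 - hit[$3]
			sub(/^M [^ ]+ [^ ]+ [^ ]+ /, "")   # keep only the subject
			print secs, $0
		}'
}
```

Topics merged to 'master' that do not show up in the output either never
went through 'next' or do not fit the assumptions above.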


* Re: How to generate feature branch statistics?
  2016-07-20 18:49   ` Junio C Hamano
@ 2016-07-20 23:10     ` Jakub Narębski
  2016-07-20 23:31       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Narębski @ 2016-07-20 23:10 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Ernesto Maserati, git

On 2016-07-20 at 20:49, Junio C Hamano wrote:

> For our own history and workflow, the duration between the inception
> of a topic branch and the time it gets merged to 'master' is not all
> that interesting.

Nb. if I haven't messed something up (the git history is not simple
merging of topic branches into mainline), the shortest time from
creating a branch to merging it in git.git is 7 seconds (probably
it was a bugfix-type of a topic branch), the longest if I did it
correctly is slightly less than 4 years (???): 641830c.

-- 
Jakub Narębski



* Re: How to generate feature branch statistics?
  2016-07-20 23:10     ` Jakub Narębski
@ 2016-07-20 23:31       ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2016-07-20 23:31 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Jeff King, Ernesto Maserati, Git Mailing List

On Wed, Jul 20, 2016 at 4:10 PM, Jakub Narębski <jnareb@gmail.com> wrote:
> On 2016-07-20 at 20:49, Junio C Hamano wrote:
>
>> For our own history and workflow, the duration between the inception
>> of a topic branch and the time it gets merged to 'master' is not all
>> that interesting.
>
> Nb. if I haven't messed something up (the git history is not simple
> merging of topic branches into mainline), the shortest time from
> creating a branch to merging it in git.git is 7 seconds (probably
> it was a bugfix-type of a topic branch), the longest if I did it
> correctly is slightly less than 4 years (???): 641830c.

The former is quite understandable. The point of having such a topic
is so that it can be merged down to older maintenance releases.

