git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
@ 2005-09-10 18:27 Blaisorblade
  2005-09-11  8:24 ` Catalin Marinas
  0 siblings, 1 reply; 14+ messages in thread
From: Blaisorblade @ 2005-09-10 18:27 UTC (permalink / raw
  To: Catalin Marinas, Junio C Hamano, git

(Please CC me on replies as I'm not subscribed).

I experienced a (performance) problem with StGit, but I don't know if it's the 
culprit or if git-merge-cache is.

Today I was pushing my patch stack (which was against Linux 2.6.13) on top of 
the latest snapshot I have (i.e. upstream will likely have some mega of 
patches). And it was *really* slow (say it pushed 8 patches in 5 minutes).

After the analisys, it's more or less logical - it'd take more or less the 
same time to merge upstream changes against local ones. The problem is that 
it's not how things should go.

I looked with "top" at what was happening, and I saw a lot of time spent by 
either gitmergeonefile.py or git-update-cache --remove*. And looking at the 
merge algorithm, I realized it merges both HEAD and patch top into the patch 
bottom (or something like that).

I.e. if upstream rewrote everything, it will merge this rewrite into the patch 
bottom, together with the patch itself. So, the merge time will depend on the 
biggest of the two changed trees, not on the least one.

A more reasonable algorithm would be along "read-tree HEAD"; merge "stg diff 
-r /" (i.e. the current patch) in it.

I don't know if there's any real difference between cg-Xmergefile, 
gitmergeonefile.py and git-merge-one-file-script, and if that would matter in 
this case.

* About --remove, it was called on files which were present in stage1, removed 
in stage2 (i.e. upstream) and unchanged in stage3 (i.e. patch).

In the git-read-tree manpage (which is probably outdated), this is documented 
as follows:

       o  stage 1 and stage 3 are the same and stage 2 is different: 
           take stage 2 (some work has been done on stage 2)

But it didn't happen, which is strange.

Additional note: in StGIT, git-read-tree is called with stage1 which is not 
the merge base between stage2 and stage3.

It is passed the patch bottom, the current HEAD and the patch top.

For the first patch, the patch bottom is the merge base, but not otherwise; so 
differences between either stage1 or stage3, and stage2, include files 
changed in previous patches.

So, would stgit delete a file created in patch #1 when merge-applying patch 
#2?
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	

	
		
___________________________________ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB 
http://mail.yahoo.it

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-10 18:27 git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base Blaisorblade
@ 2005-09-11  8:24 ` Catalin Marinas
  2005-09-12 12:59   ` Chuck Lever
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Catalin Marinas @ 2005-09-11  8:24 UTC (permalink / raw
  To: Blaisorblade; +Cc: Junio C Hamano, git

On Sat, 2005-09-10 at 20:27 +0200, Blaisorblade wrote:
> I experienced a (performance) problem with StGit, but I don't know if it's the 
> culprit or if git-merge-cache is.

('s/cache/index/g' below since I still use the old names)

Maybe both. I did some profiling in StGIT but only for trivial patches
which did not involved calling gitmergeonefile.py. It was taking around
2.5s to push a patch with a changed base (involving merge) on an
NFS-mounted directory but most of the time, 1.8s, was spent in
git-read-tree to do the actual merge.

Someone else (Bcc'ed, not sure he wants his address to be public) did
some profiling on GIT and his conclusion was that most of the time was
spend in memmove() called from add_cache_entry() in read-cache.c called
from read-tree.c.

> Today I was pushing my patch stack (which was against Linux 2.6.13) on top of 
> the latest snapshot I have (i.e. upstream will likely have some mega of 
> patches). And it was *really* slow (say it pushed 8 patches in 5 minutes).

That's indeed very slow. How may files are modified in each patch? Do
you run it over NFS? Also for profiling, it is useful to run a 'stg
status' just to warm up the cache a little bit.

> I looked with "top" at what was happening, and I saw a lot of time spent by 
> either gitmergeonefile.py or git-update-cache --remove*. And looking at the 
> merge algorithm, I realized it merges both HEAD and patch top into the patch 
> bottom (or something like that).

It does a three-way merge between 'HEAD' and 'top' with 'bottom' as the
common ancestor. The new 'bottom' will become 'HEAD' and the new 'top'
will become the result of the merge.

> I.e. if upstream rewrote everything, it will merge this rewrite into the patch 
> bottom, together with the patch itself. So, the merge time will depend on the 
> biggest of the two changed trees, not on the least one.

Actually, it will three-way merge the diff between top and bottom with
the diff between HEAD and bottom. The time will depend on the size of
both diffs, not just the maximum. If the merge is not trivial, StGIT
will call git-merge-cache which calls gitmergeonefile.py. The Python
script is pretty simple and it simply calls diff3 or updates the cache
but if this happens for many files, the cumulated time might be big.
It's also worth profiling git-update-cache to check how it re-shuffles
the index file. Again, I haven't looked at the algorithm and I can't
comment more on this.

> A more reasonable algorithm would be along "read-tree HEAD"; merge "stg diff 
> -r /" (i.e. the current patch) in it.

That's not a three-way diff algorithm. I could generate the diff of the
patch and apply it with git-apply but this might fail and you wouldn't
even get reject files (not sure whether git-apply supports this).
Another problem is that it won't detect upstream merges of your patch,
you would need to check it by hand. If all you need is a way to apply a
patch using the 'patch' tool, have a look at Quilt. It has similar
commands (since StGIT is based on its ideas) but uses the diff/patch
tools and is much faster. It also generates rejects if a patch doesn't
apply cleanly but I find the diff3 information, together with the
original files left in place, much more useful.

> I don't know if there's any real difference between cg-Xmergefile, 
> gitmergeonefile.py and git-merge-one-file-script, and if that would matter in 
> this case.

There is not big difference since these are only scripts that simply
call other tools.

> In the git-read-tree manpage (which is probably outdated), this is documented 
> as follows:
> 
>        o  stage 1 and stage 3 are the same and stage 2 is different: 
>            take stage 2 (some work has been done on stage 2)
> 
> But it didn't happen, which is strange.

This means that the file is not modified by an StGIT patch but it is
modified upstream. This case works fine for me (it's a bug if it
doesn't). Did you get a conflict? Are the original files left in place?

> Additional note: in StGIT, git-read-tree is called with stage1 which is not 
> the merge base between stage2 and stage3.

stage1 is the bottom of the patch, stage2 is the HEAD (the upstream
hash) and stage3 is the top of the patch.

> For the first patch, the patch bottom is the merge base, but not otherwise; so 
> differences between either stage1 or stage3, and stage2, include files 
> changed in previous patches.

For the first patch, the bottom is the *old* base (== stage1), stage2 is
the new base and stage3 is the top of the patch.

> So, would stgit delete a file created in patch #1 when merge-applying patch 
> #2?

Definitely not.

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-11  8:24 ` Catalin Marinas
@ 2005-09-12 12:59   ` Chuck Lever
  2005-09-14 17:46     ` Blaisorblade
  2005-09-14 18:16   ` Blaisorblade
  2005-09-14 18:19   ` Blaisorblade
  2 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2005-09-12 12:59 UTC (permalink / raw
  To: Catalin Marinas; +Cc: Blaisorblade, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 1599 bytes --]

Catalin Marinas wrote:
> On Sat, 2005-09-10 at 20:27 +0200, Blaisorblade wrote:
> 
>>I experienced a (performance) problem with StGit, but I don't know if it's the 
>>culprit or if git-merge-cache is.
> 
> 
> ('s/cache/index/g' below since I still use the old names)
> 
> Maybe both. I did some profiling in StGIT but only for trivial patches
> which did not involved calling gitmergeonefile.py. It was taking around
> 2.5s to push a patch with a changed base (involving merge) on an
> NFS-mounted directory but most of the time, 1.8s, was spent in
> git-read-tree to do the actual merge.
> 
> Someone else (Bcc'ed, not sure he wants his address to be public) did

that's me!  thanks catalin.

> some profiling on GIT and his conclusion was that most of the time was
> spend in memmove() called from add_cache_entry() in read-cache.c called
> from read-tree.c.
> 
> 
>>Today I was pushing my patch stack (which was against Linux 2.6.13) on top of 
>>the latest snapshot I have (i.e. upstream will likely have some mega of 
>>patches). And it was *really* slow (say it pushed 8 patches in 5 minutes).
> 
> 
> That's indeed very slow. How may files are modified in each patch? Do
> you run it over NFS? Also for profiling, it is useful to run a 'stg
> status' just to warm up the cache a little bit.

i've probably seen similar behavior with git-update-index, but i never 
bothered to measure it.  blaisorblade, how are you profiling git?

junio asked me to port my cache abstraction helpers over daniel's 
read-tree changes, which i have done.  i want to post those changes here 
today for comments.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-12 12:59   ` Chuck Lever
@ 2005-09-14 17:46     ` Blaisorblade
  0 siblings, 0 replies; 14+ messages in thread
From: Blaisorblade @ 2005-09-14 17:46 UTC (permalink / raw
  To: cel; +Cc: Catalin Marinas, Junio C Hamano, git

On Monday 12 September 2005 14:59, Chuck Lever wrote:
> Catalin Marinas wrote:
> > On Sat, 2005-09-10 at 20:27 +0200, Blaisorblade wrote:

> >>Today I was pushing my patch stack (which was against Linux 2.6.13) on
> >> top of the latest snapshot I have (i.e. upstream will likely have some
> >> mega of patches). And it was *really* slow (say it pushed 8 patches in 5
> >> minutes).

> > That's indeed very slow. How may files are modified in each patch? Do
> > you run it over NFS? Also for profiling, it is useful to run a 'stg
> > status' just to warm up the cache a little bit.

> i've probably seen similar behavior with git-update-index, but i never
> bothered to measure it.  blaisorblade, how are you profiling git?

I'm not, I just did the "testing" (actually I was simply using StGIT) I 
noticed and reported this behaviour (I even had time to run top to watch what 
was happening).

As said in the other answer, yes, I verified that it wasn't only for time 
spent getting cache-hot, and I'm not running on NFS, nor running big things 
in background.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	

	
		
___________________________________ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB 
http://mail.yahoo.it

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-11  8:24 ` Catalin Marinas
  2005-09-12 12:59   ` Chuck Lever
@ 2005-09-14 18:16   ` Blaisorblade
  2005-09-14 18:19   ` Blaisorblade
  2 siblings, 0 replies; 14+ messages in thread
From: Blaisorblade @ 2005-09-14 18:16 UTC (permalink / raw
  To: Catalin Marinas; +Cc: Junio C Hamano, git

On Sunday 11 September 2005 10:24, Catalin Marinas wrote:
> On Sat, 2005-09-10 at 20:27 +0200, Blaisorblade wrote:
> > I experienced a (performance) problem with StGit, but I don't know if
> > it's the culprit or if git-merge-cache is.

> ('s/cache/index/g' below since I still use the old names)

> Maybe both. I did some profiling in StGIT but only for trivial patches
> which did not involved calling gitmergeonefile.py.

> It was taking around 
> 2.5s to push a patch with a changed base (involving merge) on an
> NFS-mounted directory but most of the time, 1.8s, was spent in
> git-read-tree to do the actual merge.
Yes, I even remember you mentioned that.
> > Today I was pushing my patch stack (which was against Linux 2.6.13) on
> > top of the latest snapshot I have (i.e. upstream will likely have some
> > mega of patches). And it was *really* slow (say it pushed 8 patches in 5
> > minutes).

> That's indeed very slow. How may files are modified in each patch?
1 files, in most cases - biggest one had 5 files changed (and is big 4,8K).

> Do 
> you run it over NFS?
No, local reiserfs.
> Also for profiling, it is useful to run a 'stg 
> status' just to warm up the cache a little bit.
Yes, I realized that -  but processing kept the same speed all the time (it's 
not like it spent 4:30 minutes for warming the cache).

> > I looked with "top" at what was happening, and I saw a lot of time spent
> > by either gitmergeonefile.py or git-update-cache --remove*. And looking
> > at the merge algorithm, I realized it merges both HEAD and patch top into
> > the patch bottom (or something like that).

> It does a three-way merge between 'HEAD' and 'top' with 'bottom' as the
> common ancestor. The new 'bottom' will become 'HEAD' and the new 'top'
> will become the result of the merge.

> > I.e. if upstream rewrote everything, it will merge this rewrite into the
> > patch bottom, together with the patch itself. So, the merge time will
> > depend on the biggest of the two changed trees, not on the least one.

> Actually, it will three-way merge the diff between top and bottom with
> the diff between HEAD and bottom. The time will depend on the size of
> both diffs, not just the maximum.
Yes, true, but the problem in this case was that one diff (the upstream one) 
was extremely huge. And it even involved some files removal - I almost never 
caught, with top, anything else than gitmergeonefile.py or git-update-cache 
--remove.

I guess that moves between stages don't have the big memmove() problem, while 
calling --remove shows it.

And, with master being the SHA1 I pushed onto 
(8920e8f94c44e31a73bdf923b04721e26e88cadd), 

git-diff-tree -r v2.6.13 master |grep ' D'|wc -l
gives 189.

I guess that 2.5 seconds (more or less) per each removal are enough to justify 
the time which was needed above. So fixing this should fix the behaviour I 
met.
> If the merge is not trivial, StGIT 
> will call git-merge-cache which calls gitmergeonefile.py. The Python
> script is pretty simple and it simply calls diff3 or updates the cache
> but if this happens for many files, the cumulated time might be big.
> It's also worth profiling git-update-cache to check how it re-shuffles
> the index file. Again, I haven't looked at the algorithm and I can't
> comment more on this.

> > A more reasonable algorithm would be along "read-tree HEAD"; merge "stg
> > diff -r /" (i.e. the current patch) in it.

> That's not a three-way diff algorithm. I could generate the diff of the
> patch and apply it with git-apply but this might fail and you wouldn't
> even get reject files (not sure whether git-apply supports this).
Well, true, but either diff3 or merge would work fairly well. Also, trying to 
apply the patch and resorting to something more complex wouldn't create so 
many problems.
> Another problem is that it won't detect upstream merges of your patch,
> you would need to check it by hand. If all you need is a way to apply a
> patch using the 'patch' tool, have a look at Quilt. It has similar
> commands (since StGIT is based on its ideas) but uses the diff/patch
> tools and is much faster. It also generates rejects if a patch doesn't
> apply cleanly but I find the diff3 information, together with the
> original files left in place, much more useful.

Yes, in most cases. I've had something to complain with diff3, however.

> > I don't know if there's any real difference between cg-Xmergefile,
> > gitmergeonefile.py and git-merge-one-file-script, and if that would
> > matter in this case.

> There is not big difference since these are only scripts that simply
> call other tools.

> > In the git-read-tree manpage (which is probably outdated), this is
> > documented as follows:
> >
> >        o  stage 1 and stage 3 are the same and stage 2 is different:
> >            take stage 2 (some work has been done on stage 2)
> >
> > But it didn't happen, which is strange.

> This means that the file is not modified by an StGIT patch but it is
> modified upstream. This case works fine for me (it's a bug if it
> doesn't). Did you get a conflict? Are the original files left in place?
No, it's not a correctness bug for me - just a performance bug. Why did 
gitupdateonefile.py need to call git-update-cache --remove? If git-read-tree 
had done his duty, this wouldn't be needed.

> > Additional note: in StGIT, git-read-tree is called with stage1 which is
> > not the merge base between stage2 and stage3.

> stage1 is the bottom of the patch, stage2 is the HEAD (the upstream
> hash) and stage3 is the top of the patch.
Yes, saw that in the code.
> > For the first patch, the patch bottom is the merge base, but not
> > otherwise; so differences between either stage1 or stage3, and stage2,
> > include files changed in previous patches.

> For the first patch, the bottom is the *old* base (== stage1), stage2 is
> the new base and stage3 is the top of the patch.
Yes, but the *old* base happens to be, in this case (i.e. upgrading the stack 
base from upstream), also the merge base.
> > So, would stgit delete a file created in patch #1 when merge-applying
> > patch #2?

> Definitely not.
Gonna test - but if you look at the manpage of read-tree, this would probably 
be the expected behaviour.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


		
___________________________________ 
Yahoo! Messenger: chiamate gratuite in tutto il mondo 
http://it.messenger.yahoo.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-11  8:24 ` Catalin Marinas
  2005-09-12 12:59   ` Chuck Lever
  2005-09-14 18:16   ` Blaisorblade
@ 2005-09-14 18:19   ` Blaisorblade
  2005-09-15 10:06     ` Catalin Marinas
  2 siblings, 1 reply; 14+ messages in thread
From: Blaisorblade @ 2005-09-14 18:19 UTC (permalink / raw
  To: Catalin Marinas; +Cc: Junio C Hamano, git

On Sunday 11 September 2005 10:24, Catalin Marinas wrote:
> On Sat, 2005-09-10 at 20:27 +0200, Blaisorblade wrote:
> > I experienced a (performance) problem with StGit, but I don't know if
> > it's the culprit or if git-merge-cache is.

> ('s/cache/index/g' below since I still use the old names)

> Maybe both. I did some profiling in StGIT but only for trivial patches
> which did not involved calling gitmergeonefile.py.

> It was taking around 
> 2.5s to push a patch with a changed base (involving merge) on an
> NFS-mounted directory but most of the time, 1.8s, was spent in
> git-read-tree to do the actual merge.
Yes, I even remember you mentioned that.
> > Today I was pushing my patch stack (which was against Linux 2.6.13) on
> > top of the latest snapshot I have (i.e. upstream will likely have some
> > mega of patches). And it was *really* slow (say it pushed 8 patches in 5
> > minutes).

> That's indeed very slow. How may files are modified in each patch?
1 files, in most cases - biggest one had 5 files changed (and is big 4,8K).

> Do 
> you run it over NFS?
No, local reiserfs.
> Also for profiling, it is useful to run a 'stg 
> status' just to warm up the cache a little bit.
Yes, I realized that -  but processing kept the same speed all the time (it's 
not like it spent 4:30 minutes for warming the cache).

> > I looked with "top" at what was happening, and I saw a lot of time spent
> > by either gitmergeonefile.py or git-update-cache --remove*. And looking
> > at the merge algorithm, I realized it merges both HEAD and patch top into
> > the patch bottom (or something like that).

> It does a three-way merge between 'HEAD' and 'top' with 'bottom' as the
> common ancestor. The new 'bottom' will become 'HEAD' and the new 'top'
> will become the result of the merge.

> > I.e. if upstream rewrote everything, it will merge this rewrite into the
> > patch bottom, together with the patch itself. So, the merge time will
> > depend on the biggest of the two changed trees, not on the least one.

> Actually, it will three-way merge the diff between top and bottom with
> the diff between HEAD and bottom. The time will depend on the size of
> both diffs, not just the maximum.
Yes, true, but the problem in this case was that one diff (the upstream one) 
was extremely huge. And it even involved some files removal - I almost never 
caught, with top, anything else than gitmergeonefile.py or git-update-cache 
--remove.

I guess that moves between stages don't have the big memmove() problem, while 
calling --remove shows it.

And, with master being the SHA1 I pushed onto 
(8920e8f94c44e31a73bdf923b04721e26e88cadd), 

git-diff-tree -r v2.6.13 master |grep ' D'|wc -l

gives 189. I guess that 2.5 seconds per each removal are enough to justify the 
time which was needed above. Actually, it even had to be less than a second 
per removal, otherwise it'd have been even slower.
> If the merge is not trivial, StGIT 
> will call git-merge-cache which calls gitmergeonefile.py. The Python
> script is pretty simple and it simply calls diff3 or updates the cache
> but if this happens for many files, the cumulated time might be big.
> It's also worth profiling git-update-cache to check how it re-shuffles
> the index file. Again, I haven't looked at the algorithm and I can't
> comment more on this.

> > A more reasonable algorithm would be along "read-tree HEAD"; merge "stg
> > diff -r /" (i.e. the current patch) in it.

> That's not a three-way diff algorithm. I could generate the diff of the
> patch and apply it with git-apply but this might fail and you wouldn't
> even get reject files (not sure whether git-apply supports this).
Well, true, but either diff3 or merge would work fairly well. Also, trying to 
apply the patch and resorting to something more complex wouldn't create so 
many problems.
> Another problem is that it won't detect upstream merges of your patch,
> you would need to check it by hand. If all you need is a way to apply a
> patch using the 'patch' tool, have a look at Quilt. It has similar
> commands (since StGIT is based on its ideas) but uses the diff/patch
> tools and is much faster. It also generates rejects if a patch doesn't
> apply cleanly but I find the diff3 information, together with the
> original files left in place, much more useful.

Yes, in most cases. I've had something to complain with diff3, however (that's 
a separate story). Maybe merge would be even better?

> > I don't know if there's any real difference between cg-Xmergefile,
> > gitmergeonefile.py and git-merge-one-file-script, and if that would
> > matter in this case.

> There is not big difference since these are only scripts that simply
> call other tools.

> > In the git-read-tree manpage (which is probably outdated), this is
> > documented as follows:
> >
> >        o  stage 1 and stage 3 are the same and stage 2 is different:
> >            take stage 2 (some work has been done on stage 2)
> >
> > But it didn't happen, which is strange.

> This means that the file is not modified by an StGIT patch but it is
> modified upstream. This case works fine for me (it's a bug if it
> doesn't). Did you get a conflict? Are the original files left in place?
No, it's not a correctness bug for me - just a performance bug. Why did 
gitupdateonefile.py need to call git-update-cache --remove? If git-read-tree 
had done his duty, this wouldn't be needed.

> > Additional note: in StGIT, git-read-tree is called with stage1 which is
> > not the merge base between stage2 and stage3.

> stage1 is the bottom of the patch, stage2 is the HEAD (the upstream
> hash) and stage3 is the top of the patch.
Yes, saw that in the code.
> > For the first patch, the patch bottom is the merge base, but not
> > otherwise; so differences between either stage1 or stage3, and stage2,
> > include files changed in previous patches.

> For the first patch, the bottom is the *old* base (== stage1), stage2 is
> the new base and stage3 is the top of the patch.
Yes, but the *old* base happens to be, in this case (i.e. upgrading the stack 
base from upstream), also the merge base.
> > So, would stgit delete a file created in patch #1 when merge-applying
> > patch #2?

> Definitely not.
Gonna test - but if you look at the manpage of read-tree, this would probably 
be the expected behaviour.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	

	
		
___________________________________ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB 
http://mail.yahoo.it

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-14 18:19   ` Blaisorblade
@ 2005-09-15 10:06     ` Catalin Marinas
  2005-09-16 18:45       ` Blaisorblade
  2005-09-16 19:59       ` Junio C Hamano
  0 siblings, 2 replies; 14+ messages in thread
From: Catalin Marinas @ 2005-09-15 10:06 UTC (permalink / raw
  To: Blaisorblade; +Cc: Junio C Hamano, git

Blaisorblade <blaisorblade@yahoo.it> wrote:
> On Sunday 11 September 2005 10:24, Catalin Marinas wrote:
>> That's indeed very slow. How may files are modified in each patch?
>
> 1 files, in most cases - biggest one had 5 files changed (and is big
> 4,8K).

OK, this doesn't really matters since the diff between the new HEAD
and the old bottom of the patch is big.

>> Actually, it will three-way merge the diff between top and bottom with
>> the diff between HEAD and bottom. The time will depend on the size of
>> both diffs, not just the maximum.
>
> Yes, true, but the problem in this case was that one diff (the
> upstream one) was extremely huge. And it even involved some files
> removal - I almost never caught, with top, anything else than
> gitmergeonefile.py or git-update-cache --remove.

git-read-tree -m doesn't handle the case when a file is removed from
one branch and unmodified in the other, which is what happens in your
test. For each of these removed files, git-merge-cache will call
gitmergeonefile.py which calls 'git-update-cache --remove'.

An improvement would be to make git-read-tree smarter and only leave
out the three-way merges and the conflicts. Otherwise, implementing my
own git-merge-cache which includes gitmergeonefile.py or implementing
gitmergeonefile in C would be an improvement but I'm not sure it is
that significant.

> I guess that moves between stages don't have the big memmove()
> problem, while calling --remove shows it.
>
> And, with master being the SHA1 I pushed onto 
> (8920e8f94c44e31a73bdf923b04721e26e88cadd), 
>
> git-diff-tree -r v2.6.13 master |grep ' D'|wc -l
>
> gives 189. I guess that 2.5 seconds per each removal are enough to
> justify the time which was needed above. Actually, it even had to be
> less than a second per removal, otherwise it'd have been even
> slower.

The 2.5 figure was for trivial merges and most of the time was spent
in git-read-tree.

>> That's not a three-way diff algorithm. I could generate the diff of the
>> patch and apply it with git-apply but this might fail and you wouldn't
>> even get reject files (not sure whether git-apply supports this).
>
> Well, true, but either diff3 or merge would work fairly well. Also,
> trying to apply the patch and resorting to something more complex
> wouldn't create so many problems.

But this would slow things down for people pulling changes on a daily
(or even more often) basis. It could be added as an option to 'stg
push', i.e. when you expect something like this, just pass a
'--nothreeway' option.

>> Another problem is that it won't detect upstream merges of your patch,
>> you would need to check it by hand. If all you need is a way to apply a
>> patch using the 'patch' tool, have a look at Quilt. It has similar
>> commands (since StGIT is based on its ideas) but uses the diff/patch
>> tools and is much faster. It also generates rejects if a patch doesn't
>> apply cleanly but I find the diff3 information, together with the
>> original files left in place, much more useful.
>
> Yes, in most cases. I've had something to complain with diff3,
> however (that's a separate story). Maybe merge would be even better?

I couldn't find the difference between merge and diff3 -E. Is there
any?

>> > In the git-read-tree manpage (which is probably outdated), this is
>> > documented as follows:
>> >
>> >        o  stage 1 and stage 3 are the same and stage 2 is different:
>> >            take stage 2 (some work has been done on stage 2)
>> >
>> > But it didn't happen, which is strange.
>
>> This means that the file is not modified by an StGIT patch but it is
>> modified upstream. This case works fine for me (it's a bug if it
>> doesn't). Did you get a conflict? Are the original files left in place?
>
> No, it's not a correctness bug for me - just a performance bug. Why did 
> gitupdateonefile.py need to call git-update-cache --remove? If git-read-tree 
> had done his duty, this wouldn't be needed.

Do you mean it wasn't modified by the StGIT patch (and hence stage 3
is the same as stage 1) but it was removed in stage 2. This case is
*not* handled by git-read-tree, though I think it should deal with
this trivial case as well, it's not that different from the case
below.

If stage 2 is a simple SHA1 modification (not file removal) the case
was already handled by git-read-tree and there should only be a stage
0 entry for this file. git-merge-cache should *not* call
gitmergeonefile.py.

You can run git-read-tree manually and check the stages for this case.

>> > So, would stgit delete a file created in patch #1 when merge-applying
>> > patch #2?
>
>> Definitely not.
>
> Gonna test - but if you look at the manpage of read-tree, this would
> probably be the expected behaviour.

If patch #2 didn't explicitely removed this file and if patch #1 is
applied, it won't delete it (unless there is a bug).

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-15 10:06     ` Catalin Marinas
@ 2005-09-16 18:45       ` Blaisorblade
  2005-09-16 19:59       ` Junio C Hamano
  1 sibling, 0 replies; 14+ messages in thread
From: Blaisorblade @ 2005-09-16 18:45 UTC (permalink / raw
  To: Catalin Marinas; +Cc: git

On Thursday 15 September 2005 12:06, Catalin Marinas wrote:
> Blaisorblade <blaisorblade@yahoo.it> wrote:
> > On Sunday 11 September 2005 10:24, Catalin Marinas wrote:
> >> That's indeed very slow. How may files are modified in each patch?
> >
> > 1 files, in most cases - biggest one had 5 files changed (and is big
> > 4,8K).

> OK, this doesn't really matters since the diff between the new HEAD
> and the old bottom of the patch is big.
In fact.
> >> Actually, it will three-way merge the diff between top and bottom with
> >> the diff between HEAD and bottom. The time will depend on the size of
> >> both diffs, not just the maximum.

> git-read-tree -m doesn't handle the case when a file is removed from
> one branch and unmodified in the other, which is what happens in your
> test. For each of these removed files, git-merge-cache will call
> gitmergeonefile.py which calls 'git-update-cache --remove'.

> An improvement would be to make git-read-tree smarter and only leave
> out the three-way merges and the conflicts. Otherwise, implementing my
> own git-merge-cache which includes gitmergeonefile.py or implementing
> gitmergeonefile in C would be an improvement but I'm not sure it is
> that significant.
gitmergeonefile in C wouldn't matter that much (I hope) - you'd rather move 
the code to a module to have it byte-compiled would be enough, if this 
matters.

What could make more sense is to make sure that the file removals are done in 
one step, which can be accomplished only if the Porcelain merge script is 
called at once with a file list on stdin.

But implementing the behaviour we discuss below (the one I quote from man page 
but which is unimplemented) would fix this particular problem. So yes, 
implementing this would speed things up.
> > I guess that moves between stages don't have the big memmove()
> > problem, while calling --remove shows it.
> >
> > And, with master being the SHA1 I pushed onto
> > (8920e8f94c44e31a73bdf923b04721e26e88cadd),
> >
> > git-diff-tree -r v2.6.13 master |grep ' D'|wc -l

> > gives 189. I guess that 2.5 seconds per each removal are enough to
> > justify the time which was needed above. Actually, it even had to be
> > less than a second per removal, otherwise it'd have been even
> > slower.

> The 2.5 figure was for trivial merges and most of the time was spent
> in git-read-tree.
Yes, but I think it's a reasonable figure here too (we'll have to do memmove() 
anyway, if things are reasonable - never looked at the code, though).
> >> That's not a three-way diff algorithm. I could generate the diff of the
> >> patch and apply it with git-apply but this might fail and you wouldn't
> >> even get reject files (not sure whether git-apply supports this).

> > Well, true, but either diff3 or merge would work fairly well. Also,
> > trying to apply the patch and resorting to something more complex
> > wouldn't create so many problems.

> But this would slow things down for people pulling changes on a daily
> (or even more often) basis. It could be added as an option to 'stg
> push', i.e. when you expect something like this, just pass a
> '--nothreeway' option.
Using "merge" by hand *is* a three-way merge, indeed (done with older tools, I 
agree).
> >> Another problem is that it won't detect upstream merges of your patch,
> >> you would need to check it by hand. If all you need is a way to apply a
> >> patch using the 'patch' tool, have a look at Quilt. It has similar
> >> commands (since StGIT is based on its ideas) but uses the diff/patch
> >> tools and is much faster. It also generates rejects if a patch doesn't
> >> apply cleanly but I find the diff3 information, together with the
> >> original files left in place, much more useful.

> > Yes, in most cases. I've had something to complain with diff3,
> > however (that's a separate story). Maybe merge would be even better?

> I couldn't find the difference between merge and diff3 -E. Is there
> any?
I didn't see any difference in my (limited) use. So I switched to wiggle, 
which is smart enough for my needs.

> >> > In the git-read-tree manpage (which is probably outdated), this is
> >> > documented as follows:
> >> >
> >> >        o  stage 1 and stage 3 are the same and stage 2 is different:
> >> >            take stage 2 (some work has been done on stage 2)
> >> >
> >> > But it didn't happen, which is strange.
> >>
> >> This means that the file is not modified by an StGIT patch but it is
> >> modified upstream. This case works fine for me (it's a bug if it
> >> doesn't). Did you get a conflict? Are the original files left in place?
> >
> > No, it's not a correctness bug for me - just a performance bug. Why did
> > gitupdateonefile.py need to call git-update-cache --remove? If
> > git-read-tree had done his duty, this wouldn't be needed.

> Do you mean it wasn't modified by the StGIT patch (and hence stage 3
> is the same as stage 1) but it was removed in stage 2. This case is
> *not* handled by git-read-tree, though I think it should deal with
> this trivial case as well, it's not that different from the case
> below.
Ok, manpage is out-of-date, as I guessed. The above excerpt says yes, you say 
no, I verified no, please fix man page.
> If stage 2 is a simple SHA1 modification (not file removal) the case
> was already handled by git-read-tree and there should only be a stage
> 0 entry for this file. git-merge-cache should *not* call
> gitmergeonefile.py.

> >> > So, would stgit delete a file created in patch #1 when merge-applying
> >> > patch #2?
> >>
> >> Definitely not.

> > Gonna test - but if you look at the manpage of read-tree, this would
> > probably be the expected behaviour.

> If patch #2 didn't explicitely removed this file and if patch #1 is
> applied, it won't delete it (unless there is a bug).
Yes, sorry, here you're right - I wasn't careful enough. It's true that the 
file is only in stage 2 (current HEAD) but not in stage1 (bottom) or stage3 
(top), but that is assumed as "work done in the HEAD" so it's kept.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade


	

	
		
___________________________________ 
Yahoo! Mail: gratis 1GB per i messaggi e allegati da 10MB 
http://mail.yahoo.it

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-15 10:06     ` Catalin Marinas
  2005-09-16 18:45       ` Blaisorblade
@ 2005-09-16 19:59       ` Junio C Hamano
  2005-09-17  9:31         ` Catalin Marinas
  1 sibling, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2005-09-16 19:59 UTC (permalink / raw
  To: Catalin Marinas; +Cc: git

Catalin Marinas <catalin.marinas@gmail.com> writes:

> git-read-tree -m doesn't handle the case when a file is removed from
> one branch and unmodified in the other, which is what happens in your
> test. For each of these removed files, git-merge-cache will call
> gitmergeonefile.py which calls 'git-update-cache --remove'.
>
> An improvement would be to make git-read-tree smarter...

I think this was once discussed but the primary reason for the
behaviour is that Linus wanted to leave as much merge policy
decision to be scriptable without hardcoding it in read-tree.

For example, you would see branch A removing a path while branch
B keeping the path if A renamed it to somewhere else while B
kept it -- a cleverer merge policy than one-path-at-a-time we
currently have _could_ take a hint from the unmerged cache, go
look for the other half of the rename in branch A, and ask the
user if the rename should be honored in the merge result for
example.  If you always do "added or removed in only one head
means take that head", most likely in the above example B would
not have a new path (rename target of A) so you would end up
always taking the file rename.

>> git-diff-tree -r v2.6.13 master |grep ' D'|wc -l
>>
>> gives 189.

A completely different topic:  would it please people if I did this?

    $ git-diff-tree -r --names-and-changes A B
    M	frotz
    R93	rezrov	nitfol
    D	yomin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base
  2005-09-16 19:59       ` Junio C Hamano
@ 2005-09-17  9:31         ` Catalin Marinas
  2005-09-19 15:50           ` omitted test scripts? Chuck Lever
  0 siblings, 1 reply; 14+ messages in thread
From: Catalin Marinas @ 2005-09-17  9:31 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

On Fri, 2005-09-16 at 12:59 -0700, Junio C Hamano wrote:
> Catalin Marinas <catalin.marinas@gmail.com> writes:
> 
> > git-read-tree -m doesn't handle the case when a file is removed from
> > one branch and unmodified in the other, which is what happens in your
> > test. For each of these removed files, git-merge-cache will call
> > gitmergeonefile.py which calls 'git-update-cache --remove'.
> >
> > An improvement would be to make git-read-tree smarter...
> 
> I think this was once discussed but the primary reason for the
> behaviour is that Linus wanted to leave as much merge policy
> decision to be scriptable without hardcoding it in read-tree.

This could be OK for git-read-tree but maybe git-merge-index could have
a --smart option to deal with trivial cases like below (or a separate
tool). Whoever wants to write a more interactive script would not pass
this option.

> If you always do "added or removed in only one head
> means take that head", most likely in the above example B would
> not have a new path (rename target of A) so you would end up
> always taking the file rename.

That's what happens with the current merge scripts.

Another option for me would be to try to interface Python with libgit.a
and emulate the git-merge-index behaviour but I've never tried mixing
Python and C before.

> A completely different topic:  would it please people if I did this?
> 
>     $ git-diff-tree -r --names-and-changes A B
>     M	frotz
>     R93	rezrov	nitfol
>     D	yomin

It might help for people using git directly (me as well). StGIT parses
the git-diff-tree output anyway.

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* omitted test scripts?
  2005-09-17  9:31         ` Catalin Marinas
@ 2005-09-19 15:50           ` Chuck Lever
  2005-09-19 16:19             ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2005-09-19 15:50 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]

Catalin Marinas wrote:
> On Fri, 2005-09-16 at 12:59 -0700, Junio C Hamano wrote:
> 
>>Catalin Marinas <catalin.marinas@gmail.com> writes:
>>
>>
>>>git-read-tree -m doesn't handle the case when a file is removed from
>>>one branch and unmodified in the other, which is what happens in your
>>>test. For each of these removed files, git-merge-cache will call
>>>gitmergeonefile.py which calls 'git-update-cache --remove'.
>>>
>>>An improvement would be to make git-read-tree smarter...
>>
>>I think this was once discussed but the primary reason for the
>>behaviour is that Linus wanted to leave as much merge policy
>>decision to be scriptable without hardcoding it in read-tree.
> 
> 
> This could be OK for git-read-tree but maybe git-merge-index could have
> a --smart option to deal with trivial cases like below (or a separate
> tool). Whoever wants to write a more interactive script would not pass
> this option.

i noticed while testing the new cache API that there are no tests under 
t/ for git-merge-index.

there is also no test script that covers the prune_cache() function in 
ls-files.c.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: omitted test scripts?
  2005-09-19 15:50           ` omitted test scripts? Chuck Lever
@ 2005-09-19 16:19             ` Junio C Hamano
  2005-09-19 17:01               ` Daniel Barkalow
  2005-09-19 18:54               ` Matthias Urlichs
  0 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2005-09-19 16:19 UTC (permalink / raw
  To: cel; +Cc: git

Chuck Lever <cel@citi.umich.edu> writes:

> i noticed while testing the new cache API that there are no tests under 
> t/ for git-merge-index.

In the past, when I did any major butchering of existing code or
made nontrivial additions, I wrote test scripts to cover what
should happen (and what to be preserved) first to make sure the
changes or additions would not introduce regression.  I think we
have been reasonably successful (Daniel could fill us in with
experiences with read-tree tests).

I have been expecting people to follow suit, without explicitly
asking them to do, so when I took over the project.  Further I
got sloppy when accepting any sizable patches.  Sorry about
that.

> there is also no test script that covers the prune_cache() function in 
> ls-files.c.

That's one of the item on the TODO list and patches to extend
test coverage are always welcomed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: omitted test scripts?
  2005-09-19 16:19             ` Junio C Hamano
@ 2005-09-19 17:01               ` Daniel Barkalow
  2005-09-19 18:54               ` Matthias Urlichs
  1 sibling, 0 replies; 14+ messages in thread
From: Daniel Barkalow @ 2005-09-19 17:01 UTC (permalink / raw
  To: Junio C Hamano; +Cc: cel, git

On Mon, 19 Sep 2005, Junio C Hamano wrote:

> Chuck Lever <cel@citi.umich.edu> writes:
> 
> > i noticed while testing the new cache API that there are no tests under 
> > t/ for git-merge-index.
> 
> In the past, when I did any major butchering of existing code or
> made nontrivial additions, I wrote test scripts to cover what
> should happen (and what to be preserved) first to make sure the
> changes or additions would not introduce regression.  I think we
> have been reasonably successful (Daniel could fill us in with
> experiences with read-tree tests).

I think the only thing missing from the read-tree tests was for refusing 
to create directory/file conflicts which would otherwise be permitted when 
not using the --emu23 flag. Other than that, everything I neglected to 
handle caused something to fail. That reminds me that I still owe you a 
replacement test for this case.

> I have been expecting people to follow suit, without explicitly
> asking them to do, so when I took over the project.  Further I
> got sloppy when accepting any sizable patches.  Sorry about
> that.

I think I also owe you some tests for fetch. Any suggestions on writing 
unit tests? (I'd like to test that fetch.c actually calls functions in the 
right order and such, even in complicated cases)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: omitted test scripts?
  2005-09-19 16:19             ` Junio C Hamano
  2005-09-19 17:01               ` Daniel Barkalow
@ 2005-09-19 18:54               ` Matthias Urlichs
  1 sibling, 0 replies; 14+ messages in thread
From: Matthias Urlichs @ 2005-09-19 18:54 UTC (permalink / raw
  To: git

Hi, Junio C Hamano wrote:

> In the past, when I did any major butchering of existing code or
> made nontrivial additions, I wrote test scripts to cover what
> should happen (and what to be preserved) first to make sure the
> changes or additions would not introduce regression.

... which is very good programming practice regardless.

One of the better tenets of Extreme Programming says that any feature
for which you don't have a test basically doesn't exist. 

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Buddha most emphatically insists that what he teaches is nothing
unusual, being simply the recognition of a plain fact which can be
attested to by every mortal...

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2005-09-19 18:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-10 18:27 git-merge-cache / StGIT - gitmergeonefile.py: merging one tree into another rather than two trees into merge base Blaisorblade
2005-09-11  8:24 ` Catalin Marinas
2005-09-12 12:59   ` Chuck Lever
2005-09-14 17:46     ` Blaisorblade
2005-09-14 18:16   ` Blaisorblade
2005-09-14 18:19   ` Blaisorblade
2005-09-15 10:06     ` Catalin Marinas
2005-09-16 18:45       ` Blaisorblade
2005-09-16 19:59       ` Junio C Hamano
2005-09-17  9:31         ` Catalin Marinas
2005-09-19 15:50           ` omitted test scripts? Chuck Lever
2005-09-19 16:19             ` Junio C Hamano
2005-09-19 17:01               ` Daniel Barkalow
2005-09-19 18:54               ` Matthias Urlichs

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).