* Adding a new file as if it had existed
@ 2006-12-12 10:05 Bahadir Balban
2006-12-12 10:13 ` Junio C Hamano
2006-12-12 12:36 ` Jakub Narebski
0 siblings, 2 replies; 11+ messages in thread
From: Bahadir Balban @ 2006-12-12 10:05 UTC (permalink / raw)
To: git
Hi,
When I initialise a git repository, I use a subset of files in the
project and leave out irrelevant files for performance reasons. Then
when I need to make changes to a file not yet in the repository, the
file is treated as new, and if I reset the change or change branches
the file is gone.
Is there a good way of adding new files to git as if they had existed
from the initial commit (or even better, since a particular commit)?
This way I would only track the new changes I made to an existing
file.
Thanks,
* Re: Adding a new file as if it had existed
2006-12-12 10:05 Adding a new file as if it had existed Bahadir Balban
@ 2006-12-12 10:13 ` Junio C Hamano
2006-12-12 11:32 ` Bahadir Balban
2006-12-12 12:36 ` Jakub Narebski
1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-12-12 10:13 UTC (permalink / raw)
To: Bahadir Balban; +Cc: git
"Bahadir Balban" <bahadir.balban@gmail.com> writes:
> Is there a good way of adding new files to git as if they had existed
> from the initial commit (or even better, since a particular commit)?
> This way I would only track the new changes I made to an existing
> file.
No.
I do not understand why not adding all the files you care about
eventually anyway in the initial commit is needed for
"performance reasons", if you do not touch majority of them for
a long time. Care to explain?
* Re: Adding a new file as if it had existed
2006-12-12 10:13 ` Junio C Hamano
@ 2006-12-12 11:32 ` Bahadir Balban
2006-12-12 12:07 ` Johannes Schindelin
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Bahadir Balban @ 2006-12-12 11:32 UTC (permalink / raw)
To: git
On 12/12/06, Junio C Hamano <junkio@cox.net> wrote:
> No.
>
> I do not understand why not adding all the files you care about
> eventually anyway in the initial commit is needed for
> "performance reasons", if you do not touch majority of them for
> a long time. Care to explain?
If I don't know which files I may be touching in the future for
implementing some feature, then I am obliged to add all the files even
if they are irrelevant. I said "performance reasons" assuming all the
file hashes need to be checked for every commit -a to see if they're
changed, but I just tried on a PIII and it seems not so slow.
* Re: Adding a new file as if it had existed
2006-12-12 11:32 ` Bahadir Balban
@ 2006-12-12 12:07 ` Johannes Schindelin
2006-12-12 12:26 ` Andy Parkins
2006-12-12 18:31 ` Junio C Hamano
2 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2006-12-12 12:07 UTC (permalink / raw)
To: Bahadir Balban; +Cc: git
Hi,
On Tue, 12 Dec 2006, Bahadir Balban wrote:
> On 12/12/06, Junio C Hamano <junkio@cox.net> wrote:
> > No.
> >
> > I do not understand why not adding all the files you care about
> > eventually anyway in the initial commit is needed for
> > "performance reasons", if you do not touch majority of them for
> > a long time. Care to explain?
>
> If I don't know which files I may be touching in the future for
> implementing some feature,
When I use an SCM, it is to track the revisions of a project. It seems you
are content to have only parts of a revision? That does not make sense to
me.
> I said "performance reasons" assuming all the file hashes need to be checked
> for every commit -a to see if they're changed, but I just tried on a
> PIII and it seems not so slow.
Bingo!
You just felt the consequences of the "index".
Ciao,
Dscho
* Re: Adding a new file as if it had existed
2006-12-12 11:32 ` Bahadir Balban
2006-12-12 12:07 ` Johannes Schindelin
@ 2006-12-12 12:26 ` Andy Parkins
2006-12-12 13:20 ` Andreas Ericsson
2006-12-12 18:31 ` Junio C Hamano
2 siblings, 1 reply; 11+ messages in thread
From: Andy Parkins @ 2006-12-12 12:26 UTC (permalink / raw)
To: git
On Tuesday 2006 December 12 11:32, Bahadir Balban wrote:
> If I don't know which files I may be touching in the future for
> implementing some feature, then I am obliged to add all the files even
> if they are irrelevant. I said "performance reasons" assuming all the
> file hashes need to be checked for every commit -a to see if they're
> changed, but I just tried on a PIII and it seems not so slow.
Here's a handy rule of thumb I've learned in my use of git:
"git is fast. Really fast."
That'll hold you in good stead. In my experience there is no operation in git
that is slow. I've got some trees that are for embedded work and hold the
whole linux kernel, often more than once. Subversion, which I used
previously, took literally hours to import the whole tree. Git takes
minutes.
As to your direct concern: git doesn't hash every file at every commit. There
is no need. git has an "index" that is used to prepare a commit; at the time
you do the actual commit, git already knows which files are being checked in.
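To make the point about the index concrete, here is a minimal sketch in C of the kind of check that lets a tool skip rehashing. It is not git's actual code: the struct layout and every toy_* name are invented for illustration, and a real index records more stat fields than the two used here. The idea is only that each entry remembers what stat() reported when the file was staged, so an unchanged file can be recognised without reading its content.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified index entry; not git's real struct cache_entry. */
struct toy_index_entry {
    const char *path;
    int64_t mtime;  /* modification time recorded when the file was staged */
    int64_t size;   /* file size recorded at the same moment */
};

/* What stat() would report for the same file right now. */
struct toy_stat {
    int64_t mtime;
    int64_t size;
};

/*
 * True when the recorded stat data still matches, meaning the file
 * content does not have to be read and hashed again.
 */
bool toy_entry_is_unchanged(const struct toy_index_entry *e,
                            const struct toy_stat *now)
{
    return e->mtime == now->mtime && e->size == now->size;
}

/* Count the entries whose content would actually need rehashing. */
int toy_count_rehash(const struct toy_index_entry *entries,
                     const struct toy_stat *now, int n)
{
    int need = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (!toy_entry_is_unchanged(&entries[i], &now[i]))
            need++;
    return need;
}
```

With 100k tracked files and three edited ones, a check like this only ever hashes the three; the rest cost one stat() call each.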
Linus uses git to manage the Linux kernel, and he has said before that
he wanted a version control system that can do multiple commits /per
second/. git can do that.
In short - don't worry about making life easy for git - it's a workhorse and
does a grand job.
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
* Re: Adding a new file as if it had existed
2006-12-12 10:05 Adding a new file as if it had existed Bahadir Balban
2006-12-12 10:13 ` Junio C Hamano
@ 2006-12-12 12:36 ` Jakub Narebski
1 sibling, 0 replies; 11+ messages in thread
From: Jakub Narebski @ 2006-12-12 12:36 UTC (permalink / raw)
To: git
Bahadir Balban wrote:
> When I initialise a git repository, I use a subset of files in the
> project and leave out irrelevant files for performance reasons. Then
> when I need to make changes to a file not yet in the repository, the
> file is treated as new, and if I reset the change or change branches
> the file is gone.
>
> Is there a good way of adding new files to git as if they had existed
> from the initial commit (or even better, since a particular commit)?
> This way I would only track the new changes I made to an existing
> file.
Generally, it is not possible without rewriting history. In git (in any
sane SCM) commits are atomic; there is no CVS-like bunch of per-file
histories. You can use cg-admin-rewritehist from Cogito (alternate UI
for git)... but, as was said elsewhere in this thread, git is fast. And
the rule of thumb: check first, then optimize.
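A rough illustration of why this means rewriting history, as a toy model in C. Everything here is an assumption made for the example: the toy_* names are invented, FNV-1a stands in for SHA-1, and a commit is reduced to just a tree id and a parent id. The point it shows is that a commit's name depends on its whole snapshot and on its parent's name, so changing the first snapshot changes the name of every commit after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for SHA-1: FNV-1a, continued from the state h. */
uint64_t toy_hash_bytes(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/*
 * A commit id covers the whole snapshot (tree_id) and the parent commit.
 * The parent id is mixed into the starting state before hashing the tree.
 */
uint64_t toy_commit_id(uint64_t tree_id, uint64_t parent_id)
{
    uint64_t h = 14695981039346656037ULL ^ parent_id;

    return toy_hash_bytes(h, &tree_id, sizeof tree_id);
}

/* Id of the last commit in a linear history of n snapshots. */
uint64_t toy_tip_id(const uint64_t *tree_ids, size_t n)
{
    uint64_t parent = 0;  /* 0 stands for "no parent" */

    assert(tree_ids != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        parent = toy_commit_id(tree_ids[i], parent);
    return parent;
}
```

Two histories that differ only in their first snapshot end up with different tip ids, which is what "adding a file as if it had always existed" would amount to.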
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Adding a new file as if it had existed
2006-12-12 12:26 ` Andy Parkins
@ 2006-12-12 13:20 ` Andreas Ericsson
0 siblings, 0 replies; 11+ messages in thread
From: Andreas Ericsson @ 2006-12-12 13:20 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
Andy Parkins wrote:
> On Tuesday 2006 December 12 11:32, Bahadir Balban wrote:
>
>> If I don't know which files I may be touching in the future for
>> implementing some feature, then I am obliged to add all the files even
>> if they are irrelevant. I said "performance reasons" assuming all the
>> file hashes need to be checked for every commit -a to see if they're
>> changed, but I just tried on a PIII and it seems not so slow.
>
> Here's a handy rule of thumb I've learned in my use of git:
>
> "git is fast. Really fast."
>
Almost alarmingly so. When I started using git (back in May/June last
year, when git was 2 - 3 months old), I was worried at first because it
didn't seem to actually *do* anything, but just returned me to the
prompt immediately.
>
> As to your direct concern: git doesn't hash every file at every commit. There
> is no need. git has an "index" that is used to prepare a commit; at the time
> you do the actual commit, git already knows which files are being checked in.
>
> In short - don't worry about making life easy for git - it's a workhorse and
> does a grand job.
>
Yup. Now I've gone the other way around and think other scm's are broken
when they chew disk for 10 seconds whenever I try to do anything with
them. I usually end up importing the other repo into git and do my work
there.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
* Re: Adding a new file as if it had existed
2006-12-12 11:32 ` Bahadir Balban
2006-12-12 12:07 ` Johannes Schindelin
2006-12-12 12:26 ` Andy Parkins
@ 2006-12-12 18:31 ` Junio C Hamano
2006-12-13 9:40 ` Andreas Ericsson
2 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-12-12 18:31 UTC (permalink / raw)
To: Bahadir Balban; +Cc: git, Johannes Schindelin, Andy Parkins, Andreas Ericsson
"Bahadir Balban" <bahadir.balban@gmail.com> writes:
> ... I said "performance reasons" assuming all the
> file hashes need to be checked for every commit -a to see if they're
> changed, but I just tried on a PIII and it seems not so slow.
Ok.
Other people have already cleared the fear for 'commit' case, so
I hope you are happier.
There is one thing we could further optimize, though.
Switching branches with 100k blobs in a commit, even when only a
handful of paths differ between the branches, would still need to
populate the index by reading two trees and collapsing them into a
single stage. In theory, we should be able to do a lot better if
the two-tree case of read-tree took advantage of cache-tree
information. If ce_match_stat() says Ok for all paths in a
subdirectory and the cached tree object name for that subdirectory
in the index matches what we are reading from the new tree, we
should be able to skip reading that subdirectory (and its
subdirectories) from the new tree object altogether.
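The skip rule described above could be sketched roughly as follows. This is a toy model, not read-tree: the structs and toy_* names are hypothetical, object names are reduced to integers, the per-path ce_match_stat() results are assumed to be pre-summarised in one flag per directory, and cached and incoming subdirectories are assumed to line up by position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Incoming tree from the branch being switched to (subdirectories only). */
struct toy_tree {
    uint64_t id;  /* stand-in for the tree object name */
    size_t nr_children;
    const struct toy_tree **children;
};

/* What the index remembers about one directory. */
struct toy_cached_dir {
    uint64_t cached_id;  /* tree object name cached for this directory */
    bool all_stat_ok;    /* every path below it passed the stat check */
    size_t nr_children;
    const struct toy_cached_dir **children;
};

/* Number of tree objects that must be read from the incoming tree. */
size_t toy_trees_to_read(const struct toy_cached_dir *cached,
                         const struct toy_tree *incoming)
{
    size_t reads;

    assert(incoming != NULL);
    if (cached && cached->all_stat_ok && cached->cached_id == incoming->id)
        return 0;  /* same tree, nothing touched: skip the whole subtree */

    reads = 1;  /* this tree object has to be read */
    for (size_t i = 0; i < incoming->nr_children; i++) {
        const struct toy_cached_dir *c =
            (cached && i < cached->nr_children) ? cached->children[i] : NULL;
        reads += toy_trees_to_read(c, incoming->children[i]);
    }
    return reads;
}
```

In this model the cost of a switch scales with the number of directories that actually differ, not with the size of the project.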
Anybody interested in giving it a try?
* Re: Adding a new file as if it had existed
2006-12-12 18:31 ` Junio C Hamano
@ 2006-12-13 9:40 ` Andreas Ericsson
2006-12-13 15:46 ` Johannes Schindelin
0 siblings, 1 reply; 11+ messages in thread
From: Andreas Ericsson @ 2006-12-13 9:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Bahadir Balban, git, Johannes Schindelin, Andy Parkins
Junio C Hamano wrote:
> "Bahadir Balban" <bahadir.balban@gmail.com> writes:
>
> There is one thing we could further optimize, though.
>
> Switching branches with 100k blobs in a commit even when there
> are a handful paths different between the branches would still
> need to populate the index by reading two trees and collapsing
> them into a single stage. In theory, we should be able to do a
> lot better if two-tree case of read-tree took advantage of
> cache-tree information. If ce_match_stat() says Ok for all
> paths in a subdirectory and the cached tree object name for that
> subdirectory in the index match what we are reading from the new
> tree, we should be able to skip reading that subdirectory (and
> its subdirectories) from the new tree object at all.
>
> Anybody interested to give it a try?
>
I'm not well-versed enough in git internals to have my hopes high of
making something useful of it, but if you give me a pointer of where to
start I'd be happy to try, and perhaps learn something in the process.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
* Re: Adding a new file as if it had existed
2006-12-13 9:40 ` Andreas Ericsson
@ 2006-12-13 15:46 ` Johannes Schindelin
2006-12-13 15:52 ` Andreas Ericsson
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Schindelin @ 2006-12-13 15:46 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Junio C Hamano, Bahadir Balban, git, Andy Parkins
Hi,
On Wed, 13 Dec 2006, Andreas Ericsson wrote:
> Junio C Hamano wrote:
> > "Bahadir Balban" <bahadir.balban@gmail.com> writes:
> >
> > There is one thing we could further optimize, though.
> >
> > Switching branches with 100k blobs in a commit even when there
> > are a handful paths different between the branches would still
> > need to populate the index by reading two trees and collapsing
> > them into a single stage. In theory, we should be able to do a
> > lot better if two-tree case of read-tree took advantage of
> > cache-tree information. If ce_match_stat() says Ok for all
> > paths in a subdirectory and the cached tree object name for that
> > subdirectory in the index match what we are reading from the new
> > tree, we should be able to skip reading that subdirectory (and
> > its subdirectories) from the new tree object at all.
> >
> > Anybody interested to give it a try?
> >
>
> I'm not well-versed enough in git internals to have my hopes high of
> making something useful of it, but if you give me a pointer of where to
> start I'd be happy to try, and perhaps learn something in the process.
Okay, I'll have a stab at explaining it.
For huge working directories, you usually have a huge number of trees. The
idea of cache_tree is to remember not only the stat information of the
blobs in the index, but to cache the hashes of the trees also (until they
are invalidated, e.g. by an update-index). This avoids recalculation of
the hashes when committing.
This cache is accessible through the global variable active_cache_tree. It is
best accessed by the function cache_tree_find(), which you call like this:
	struct cache_tree *ct = cache_tree_find(active_cache_tree, path);
where the variable "path" may contain slashes. The SHA1 of the
corresponding tree is in ct->sha1, and you can check if the hash is still
valid by asking
	if (cache_tree_fully_valid(ct))
		/* still valid */
AFAIU Junio would like to take the shortcut of doing nothing at all when
(twoway) reading a tree whose hash is identical to the hash stored in the
corresponding cache_tree _and_ when the cache is still fully valid.
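For readers who want to play with the idea outside git, here is a small self-contained toy model of such a cache. It is only a sketch under assumptions: the toy_* names and the struct are invented, a bool replaces the cached hash, and none of this is the real cache-tree code. It shows the three behaviours described above: lookup by a slash-separated path, invalidation of every directory on the way to a changed file, and a "fully valid" check that covers all subdirectories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_cache_tree {
    const char *name;  /* directory name, "" for the root */
    bool valid;        /* is the cached tree hash still usable? */
    size_t nr_subtrees;
    struct toy_cache_tree **subtrees;
};

/* Find the direct child whose name is the first len bytes of name. */
static struct toy_cache_tree *toy_child(struct toy_cache_tree *it,
                                        const char *name, size_t len)
{
    for (size_t i = 0; i < it->nr_subtrees; i++) {
        const char *n = it->subtrees[i]->name;
        if (strlen(n) == len && !strncmp(n, name, len))
            return it->subtrees[i];
    }
    return NULL;
}

/* Look up a directory by a path that may contain slashes; NULL if absent. */
struct toy_cache_tree *toy_find(struct toy_cache_tree *it, const char *path)
{
    assert(path != NULL);
    while (it && *path) {
        const char *slash = strchr(path, '/');
        size_t len = slash ? (size_t)(slash - path) : strlen(path);

        it = toy_child(it, path, len);
        path += len;
        while (*path == '/')
            path++;
    }
    return it;
}

/* A file changed: every directory leading to it loses its cached hash. */
void toy_invalidate_path(struct toy_cache_tree *it, const char *file)
{
    while (it) {
        const char *slash = strchr(file, '/');

        it->valid = false;
        if (!slash)
            return;  /* what remains is the file name itself */
        it = toy_child(it, file, (size_t)(slash - file));
        file = slash + 1;
    }
}

/* Valid only if this directory and everything below it is valid. */
bool toy_fully_valid(const struct toy_cache_tree *it)
{
    if (!it || !it->valid)
        return false;
    for (size_t i = 0; i < it->nr_subtrees; i++)
        if (!toy_fully_valid(it->subtrees[i]))
            return false;
    return true;
}
```

After touching one file, only the directories on its path stop being valid; sibling directories keep their cached state and remain candidates for the shortcut.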
Ciao,
Dscho
* Re: Adding a new file as if it had existed
2006-12-13 15:46 ` Johannes Schindelin
@ 2006-12-13 15:52 ` Andreas Ericsson
0 siblings, 0 replies; 11+ messages in thread
From: Andreas Ericsson @ 2006-12-13 15:52 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Bahadir Balban, git, Andy Parkins
Johannes Schindelin wrote:
> Hi,
>
> On Wed, 13 Dec 2006, Andreas Ericsson wrote:
>
>> Junio C Hamano wrote:
>>> "Bahadir Balban" <bahadir.balban@gmail.com> writes:
>>>
>>> There is one thing we could further optimize, though.
>>>
>>> Switching branches with 100k blobs in a commit even when there
>>> are a handful paths different between the branches would still
>>> need to populate the index by reading two trees and collapsing
>>> them into a single stage. In theory, we should be able to do a
>>> lot better if two-tree case of read-tree took advantage of
>>> cache-tree information. If ce_match_stat() says Ok for all
>>> paths in a subdirectory and the cached tree object name for that
>>> subdirectory in the index match what we are reading from the new
>>> tree, we should be able to skip reading that subdirectory (and
>>> its subdirectories) from the new tree object at all.
>>>
>>> Anybody interested to give it a try?
>>>
>> I'm not well-versed enough in git internals to have my hopes high of
>> making something useful of it, but if you give me a pointer of where to
>> start I'd be happy to try, and perhaps learn something in the process.
>
> Okay, I'll have a stab at explaining it.
>
> For huge working directories, you usually have a huge number of trees. The
> idea of cache_tree is to remember not only the stat information of the
> blobs in the index, but to cache the hashes of the trees also (until they
> are invalidated, e.g. by an update-index). This avoids recalculation of
> the hashes when committing.
>
> This cache is accessible by the global variable active_cache_tree. It is
> best accessed by the function cache_tree_find(), which you call like that:
>
> struct cache_tree *ct = cache_tree_find(active_cache_tree, path);
>
> where the variable "path" may contain slashes. The SHA1 of the
> corresponding tree is in ct->sha1, and you can check if the hash is still
> valid by asking
>
> if (cache_tree_fully_valid(ct))
> /* still valid */
>
> AFAIU Junio would like to take the shortcut of doing nothing at all when
> (twoway) reading a tree whose hash is identical to the hash stored in the
> corresponding cache_tree _and_ when the cache is still fully valid.
>
Seems you wrote half the code for me already. :)
Thanks for the excellent explanation. I'll see if I can grok it further
tonight.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se