git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>,
	Kristian H?gsberg <krh@redhat.com>
Subject: Re: performance problem: "git commit filename"
Date: Mon, 14 Jan 2008 10:38:01 -0800	[thread overview]
Message-ID: <7v63xwl0c6.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801140902140.2806@woody.linux-foundation.org> (Linus Torvalds's message of "Mon, 14 Jan 2008 09:07:27 -0800 (PST)")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> ... I think it would be even better if we 
> just bit the bullet and started looking at having a different in-memory 
> representation from the on-disk one. Possibly not *that* much different: 
> perhaps just keeping a pointer to the on-disk one along with a flags 
> value.

We have two things we currently do that are not about on-disk
index file ('-'), and this patch adds another ('+'):

 - Update the work tree file that corresponds to this entry
   (CE_UPDATE);

 - This entry is to be removed (ce_mode == 0);

 + The work tree file that corresponds to this entry is known to
   be unchanged (CE_UPTODATE).

We could introduce "struct in_core_cache_entry" that has these
information, indexed and sorted by name, and has a pointer to
what we read from the on-disk index.

	struct in_core_cache_entry {
		struct cache_entry *e;
                unsigned is_up_to_date : 1,
                	 to_be_updated : 1,
                         to_be_removed : 1;
	};

The code that iterate over active_cache[] will instead iterate
over this.  The number of the entries in this array will be the
new active_nr.

In the existing code, we reference "ce->name" and "ce->sha1"
everywhere. When we check and update flags we do bitops between
"ce->ce_flags" and "htons(CE_BLAH)" in many places.  Converting
them would adds another indirection and be quite painful.  But
the compiler can reliably spot the places we fail to find, so it
at least is not so risky.  It will just be a lot of work.

  reply	other threads:[~2008-01-14 18:38 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-12 22:46 performance problem: "git commit filename" Linus Torvalds
2008-01-13  1:46 ` Linus Torvalds
2008-01-13  4:04   ` Linus Torvalds
2008-01-13  5:38     ` Daniel Barkalow
2008-01-13  8:14       ` Junio C Hamano
2008-01-13 16:57       ` Linus Torvalds
2008-01-13 19:31         ` Daniel Barkalow
2008-01-13  8:12     ` Junio C Hamano
2008-01-13 10:33       ` Junio C Hamano
2008-01-13 10:54         ` [PATCH] builtin-commit.c: do not lstat(2) partially committed paths twice Junio C Hamano
2008-01-13 11:09         ` performance problem: "git commit filename" Junio C Hamano
2008-01-13 17:24         ` Linus Torvalds
2008-01-13 19:39           ` Junio C Hamano
2008-01-13 22:36             ` [PATCH] index: be careful when handling long names Junio C Hamano
2008-01-13 22:53               ` Alex Riesen
2008-01-13 23:08                 ` Junio C Hamano
2008-01-13 23:33                   ` Alex Riesen
2008-01-14 21:03                     ` Junio C Hamano
2008-01-14  1:00           ` performance problem: "git commit filename" Junio C Hamano
2008-01-14 17:07             ` Linus Torvalds
2008-01-14 18:38               ` Junio C Hamano [this message]
2008-01-14 19:39               ` Linus Torvalds
2008-01-14 20:08                 ` Junio C Hamano
2008-01-14 21:00                   ` Linus Torvalds
2008-01-15  0:18             ` Linus Torvalds
2008-01-15  1:13               ` Junio C Hamano
2008-01-13 10:38       ` [PATCH] builtin-commit.c: remove useless check added by faulty cut and paste Junio C Hamano
2008-01-14 21:23         ` しらいしななこ
2008-01-14 21:54           ` Junio C Hamano
2008-01-14 23:46     ` performance problem: "git commit filename" Kristian Høgsberg
2008-01-14 23:15   ` Kristian Høgsberg
2008-01-14 23:48     ` Junio C Hamano
2008-01-14 23:53       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v63xwl0c6.fsf@gitster.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=krh@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).