git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Wincent Colaiuta <win@wincent.com>, git@vger.kernel.org
Subject: Re: "git add -i" with path gives "Argument list too long"
Date: Sun, 21 Mar 2010 18:23:38 -0700	[thread overview]
Message-ID: <7vwrx56tet.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <20100322003915.GA3212@coredump.intra.peff.net> (Jeff King's message of "Sun\, 21 Mar 2010 20\:39\:15 -0400")

Jeff King <peff@peff.net> writes:

> their behavior. Junio could probably say more, or you will have to read
> the code.

Or read what I already said here a few times ;-) I generally do not want
to repeat myself.

There are two semantics of pathspecs:

  (1) exact match, or leading path.  e.g.

  	git ls-files Makefile Documentation/

  (2) exact match, leading path, or fnmatch(3).

	git ls-files Makefile Documentation/ '*.txt'

The former is used by the diff family, and the latter by pretty much
everything else.  In very old days, the former was the only kind we
supported, and originally, ls-files didn't even take any pathspecs.

The 5be4efb ([PATCH] Make "git-ls-files" work in subdirectories,
2005-08-21) taught ls-files to take pathspec that can glob, but diff
family never got updated to match that.

In order to operate on set of paths, you would need to (a) enumerate your
paths, and (b) filter that enumeration efficiently with pathspecs.  If you
are iterating over the index (e.g. "ls-files", "diff-files", "grep"),
there is nothing tricky in the enumeration step.  We have a flat array of
names in the index and you just walk active_cache[] from 0 to active_nr.
If on the other hand you are walking in an inherently hierarchical
namespace (e.g. "ls-tree", "diff-tree", "grep --no-index") with non empty
set of pathspecs, you need to take advantage of the filtering behaviour
while enumerating the paths---otherwise your performance will suck.

"Leading path" semantics is easier to understand; if a tree entry you are
looking at is "contrib", and the pathspecs you have are "Makefile" and
"Documentation/", then there is no way for anything underneath it
(e.g. "contrib/README") would survive the filtering process, so you can
skip the entry without even opening the sub-tree object.  Linus's argument
was "teaching globs to pathspec code would suck in performance" and he is
right in general.  Because diff-tree is inherently about walking the two
tree objects in parallel, it does not extend its pathspec semantics to
globbing (i.e. if the user asked for '*.txt', you have to open _all_ the
tree objects down to the leaf level to see if they contain any file whose
name ends with .txt), and other family members of diff (namely, diff-files
and "diff" without --cached nor any tree-ish argument) match this
behaviour for consistency, even though theoretically "diff-files" could
easily do globbing, as it walks the flat index namespace.

But I think it is Ok to sacrifice the optimization and descend into any
and all subtrees/directories to see if a path that might match the pattern
exists when the user asks for '*.txt', as long as (and this is a _very_
important point) an update to pathspec logic on the diff side does not
break the optimization unnecessarily.  E.g.

	git diff v1.0.0 v1.2.0 -- Makefile 'Documentation/*.txt'

should still skip opening tree object for 'contrib/' (because anything
underneath contrib/ would never match either pathspecs given), but can
and should descend into Documentation.  And it should _not_ skip 'howto'
subdirectory in Documentation/ directory, as it could find a match with
'*.txt' in that subdirectory.

To prepare for this, later reimplementations of pathspec matching logic
(the one used by "git grep") can compute hints meant to be used by the
path enumeration step, as I explained earlier, enumeration needs to take
advantage of what filtering would do to paths that it will find.

By the way I threw this "pathspec unification" to the list of possible
GSoC ideas, but I suspect it might be a bit too much work to do this
properly for a summer student (and also we might not want to trust this
important part of the system to a summer student).  Other things that
probably needs to be thought through (and I haven't) that may be related
to this codepath is how to handle case-insensitive filesystems.  I think
we currently do not match paths that we obtain from the filesystem case
insensitively with the given pathspecs (we probably shouldn't go case
insensitive when we are walking the index or the tree objects, on the
other hand).

  reply	other threads:[~2010-03-22  1:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-21 20:52 "git add -i" with path gives "Argument list too long" Wincent Colaiuta
2010-03-22  0:39 ` Jeff King
2010-03-22  1:23   ` Junio C Hamano [this message]
2010-03-22  1:41     ` Jeff King
2010-03-22  1:55       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2010-01-04 18:43 Wincent Colaiuta
2010-01-05  4:14 ` Jeff King
2010-01-05  5:31   ` Junio C Hamano
2010-01-05 12:34   ` Wincent Colaiuta
2010-01-06 12:19     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vwrx56tet.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=win@wincent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).