git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* What's cooking in git.git (Oct 2012, #01; Tue, 2)
@ 2012-10-02 23:20 Junio C Hamano
  2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-02 23:20 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

The tip of 'master' is a bit past v1.8.0-rc0; we are more or less
feature complete, except for a few topics marked as "Will merge to
'master' in the list below.

I do not expect the topics in the stalled category will be ready for
the upcoming release; they may appear in the release after v1.8.0 if
they are rerolled.  Also some of the topics that have been cooking
in 'next' have to be deferred to the next cycle.

I'm planning to keep this cycle reasonably short and aim for tagging
the result as 1.8.0 at the end of 9th week, on October 21st, after
which I'd disappear for a few weeks.  http://tinyurl.com/gitCal is
where you can always find my rough tagging schedule at.

You can find the changes described here in the integration branches of the
repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--------------------------------------------------
[New Topics]

* bw/config-lift-variable-name-length-limit (2012-10-01) 1 commit
 - Remove the hard coded length limit on variable names in config files

 The configuration parser had an unnecessary hardcoded limit on
 variable names that was not checked consistently. Lift the limit.

 Will merge to 'next'.


* jc/maint-t1450-fsck-order-fix (2012-10-02) 1 commit
 - t1450: the order the objects are checked is undefined

 The fsck test assumed too much on what kind of error it will
 detect. The only important thing is the inconsistency is detected
 as an error.

 Will merge to 'next'.

--------------------------------------------------
[Graduated to "master"]

* da/mergetool-custom (2012-09-25) 1 commit
  (merged to 'next' on 2012-09-27 at f910851)
 + mergetool--lib: Allow custom commands to override built-ins

 The actual external command to run for mergetool backend can be
 specified with difftool/mergetool.$name.cmd configuration
 variables, but this mechanism was ignored for the backends we
 natively support.


* ep/malloc-check-perturb (2012-09-26) 1 commit
  (merged to 'next' on 2012-09-27 at f115b8b)
 + MALLOC_CHECK: enable it, unless disabled explicitly

 Fixes a brown-paper bag bug.


* jc/blame-follows-renames (2012-09-21) 1 commit
  (merged to 'next' on 2012-09-27 at 12634c1)
 + git blame: document that it always follows origin across whole-file renames
 (this branch is used by jc/blame-no-follow.)

 Clarify the "blame" documentation to tell the users that there is
 no need to ask for "--follow".


* jk/completion-tests (2012-09-27) 2 commits
  (merged to 'next' on 2012-09-27 at 5cd6968)
 + t9902: add completion tests for "odd" filenames
 + t9902: add a few basic completion tests


* jk/receive-pack-unpack-error-to-pusher (2012-09-21) 3 commits
  (merged to 'next' on 2012-09-27 at 90f0c6f)
 + receive-pack: drop "n/a" on unpacker errors
 + receive-pack: send pack-processing stderr over sideband
 + receive-pack: redirect unpack-objects stdout to /dev/null

 Send errors from "unpack-objects" and "index-pack" back to the "git
 push" over the git and smart-http protocols, just like it is done
 for a push over the ssh protocol.


* os/commit-submodule-ignore (2012-09-24) 1 commit
  (merged to 'next' on 2012-09-27 at 9cd2cfd)
 + commit: pay attention to submodule.$name.ignore in .gitmodules

 "git status" honored the ignore=dirty settings in .gitmodules but
 "git commit" didn't.


* rt/maint-clone-single (2012-09-20) 1 commit
  (merged to 'next' on 2012-09-27 at a47d54d)
 + clone --single: limit the fetch refspec to fetched branch

 Running "git fetch" in a repository made with "git clone --single"
 slurps all the branches, defeating the point of "--single".

--------------------------------------------------
[Stalled]

* rc/maint-complete-git-p4 (2012-09-24) 1 commit
  (merged to 'next' on 2012-09-25 at 116e58f)
 + Teach git-completion about git p4

 Comment from Pete will need to be addressed in a follow-up patch.


* as/check-ignore (2012-09-27) 17 commits
 - [SQUASH-FIX] 283d072 (Add git-check-ignore sub-command, 2012-09-20)
 - [SQUASH-FIX] 283d072 (Add git-check-ignore sub-command, 2012-09-20)
 - [REROLL NEEDED] minimum compilation fix
 - Add git-check-ignore sub-command
 - dir.c: provide free_directory() for reclaiming dir_struct memory
 - pathspec.c: move reusable code from builtin/add.c
 - dir.c: refactor treat_gitlinks()
 - dir.c: keep track of where patterns came from
 - dir.c: refactor is_path_excluded()
 - dir.c: refactor is_excluded()
 - dir.c: refactor is_excluded_from_list()
 - dir.c: rename excluded() to is_excluded()
 - dir.c: rename excluded_from_list() to is_excluded_from_list()
 - dir.c: rename path_excluded() to is_path_excluded()
 - dir.c: rename cryptic 'which' variable to more consistent name
 - Improve documentation and comments regarding directory traversal API
 - Update directory listing API doc to match code

 Expecting a reroll.


* as/test-tweaks (2012-09-20) 7 commits
 - tests: paint unexpectedly fixed known breakages in bold red
 - tests: test the test framework more thoroughly
 - [SQUASH] t/t0000-basic.sh: quoting of TEST_DIRECTORY is screwed up
 - tests: refactor mechanics of testing in a sub test-lib
 - tests: paint skipped tests in bold blue
 - tests: test number comes first in 'not ok $count - $message'
 - tests: paint known breakages in bold yellow

 Various minor tweaks to the test framework to paint its output
 lines in colors that match what they mean better.

 Has the "is this really blue?" issue Peff raised resolved???


* fa/remote-svn (2012-09-19) 16 commits
 - Add a test script for remote-svn
 - remote-svn: add marks-file regeneration
 - Add a svnrdump-simulator replaying a dump file for testing
 - remote-svn: add incremental import
 - remote-svn: Activate import/export-marks for fast-import
 - Create a note for every imported commit containing svn metadata
 - vcs-svn: add fast_export_note to create notes
 - Allow reading svn dumps from files via file:// urls
 - remote-svn, vcs-svn: Enable fetching to private refs
 - When debug==1, start fast-import with "--stats" instead of "--quiet"
 - Add documentation for the 'bidi-import' capability of remote-helpers
 - Connect fast-import to the remote-helper via pipe, adding 'bidi-import' capability
 - Add argv_array_detach and argv_array_free_detached
 - Add svndump_init_fd to allow reading dumps from arbitrary FDs
 - Add git-remote-testsvn to Makefile
 - Implement a remote helper for svn in C
 (this branch is used by fa/vcs-svn.)

 A GSoC project.
 Waiting for comments from mentors and stakeholders.


* fa/vcs-svn (2012-09-19) 4 commits
 - vcs-svn: remove repo_tree
 - vcs-svn/svndump: rewrite handle_node(), begin|end_revision()
 - vcs-svn/svndump: restructure node_ctx, rev_ctx handling
 - svndump: move struct definitions to .h
 (this branch uses fa/remote-svn.)

 A GSoC project.
 Waiting for comments from mentors and stakeholders.


* jc/maint-name-rev (2012-09-17) 7 commits
 - describe --contains: use "name-rev --algorithm=weight"
 - name-rev --algorithm=weight: tests and documentation
 - name-rev --algorithm=weight: cache the computed weight in notes
 - name-rev --algorithm=weight: trivial optimization
 - name-rev: --algorithm option
 - name_rev: clarify the logic to assign a new tip-name to a commit
 - name-rev: lose unnecessary typedef

 "git name-rev" names the given revision based on a ref that can be
 reached in the smallest number of steps from the rev, but that is
 not useful when the caller wants to know which tag is the oldest one
 that contains the rev.  This teaches a new mode to the command that
 uses the oldest ref among those which contain the rev.

 I am not sure if this is worth it; for one thing, even with the help
 from notes-cache, it seems to make the "describe --contains" even
 slower. Also the command will be unusably slow for a user who does
 not have a write access (hence unable to create or update the
 notes-cache).

 Stalled mostly due to lack of responses.


* jc/xprm-generation (2012-09-14) 1 commit
 - test-generation: compute generation numbers and clock skews

 A toy to analyze how bad the clock skews are in histories of real
 world projects.

 Stalled mostly due to lack of responses.


* jc/blame-no-follow (2012-09-21) 2 commits
 - blame: pay attention to --no-follow
 - diff: accept --no-follow option

 Teaches "--no-follow" option to "git blame" to disable its
 whole-file rename detection.

 Stalled mostly due to lack of responses.


* mk/maint-graph-infinity-loop (2012-09-25) 1 commit
 - graph.c: infinite loop in git whatchanged --graph -m

 The --graph code fell into infinite loop when asked to do what the
 code did not expect ;-)

 Anybody who worked on "--graph" wants to comment?
 Stalled mostly due to lack of responses.


* ph/credential-refactor (2012-09-02) 5 commits
 - wincred: port to generic credential helper
 - Merge branch 'ef/win32-cred-helper' into ph/credential-refactor
 - osxkeychain: port to generic credential helper implementation
 - gnome-keyring: port to generic helper implementation
 - contrib: add generic credential helper

 Attempts to refactor to share code among OSX keychain, Gnome keyring
 and Win32 credential helpers.


* ms/contrib-thunderbird-updates (2012-08-31) 2 commits
 - [SQUASH] minimum fixup
 - Thunderbird: fix appp.sh format problems

 Update helper to send out format-patch output using Thunderbird.
 Seems to have design regression for silent users.


* jx/test-real-path (2012-08-27) 1 commit
 - test: set the realpath of CWD as TRASH_DIRECTORY

 Running tests with the "trash" directory elsewhere with the "--root"
 option did not work well if the directory was specified by a symbolic
 link pointing at it.

 Seems broken as it makes $(pwd) and TRASH_DIRECTORY inconsistent.
 Will discard.


* jc/maint-push-refs-all (2012-08-27) 2 commits
 - get_fetch_map(): tighten checks on dest refs
 - [BROKEN] fetch/push: allow refs/*:refs/*

 Allows pushing and fetching everything including refs/stash.
 This is broken (see the log message there).

 Not ready.


* jc/add-delete-default (2012-08-13) 1 commit
 - git add: notice removal of tracked paths by default

 "git add dir/" updated modified files and added new files, but does
 not notice removed files, which may be "Huh?" to some users.  They
 can of course use "git add -A dir/", but why should they?

 Resurrected from graveyard, as I thought it was a worthwhile thing
 to do in the longer term.

 Waiting for comments.


* tx/relative-in-the-future (2012-08-16) 2 commits
 - date: show relative dates in the future
 - date: refactor the relative date logic from presentation

 Not my itch; rewritten an earlier submission by Tom Xue into
 somewhat more maintainable form, though it breaks existing i18n.

 Waiting for a voluteer to fix it up.
 Otherwise may discard.


* mb/remote-default-nn-origin (2012-07-11) 6 commits
 - Teach get_default_remote to respect remote.default.
 - Test that plain "git fetch" uses remote.default when on a detached HEAD.
 - Teach clone to set remote.default.
 - Teach "git remote" about remote.default.
 - Teach remote.c about the remote.default configuration setting.
 - Rename remote.c's default_remote_name static variables.

 When the user does not specify what remote to interact with, we
 often attempt to use 'origin'.  This can now be customized via a
 configuration variable.

 Expecting a reroll.

 "The first remote becomes the default" bit is better done as a
 separate step.


* jc/split-blob (2012-04-03) 6 commits
 - chunked-object: streaming checkout
 - chunked-object: fallback checkout codepaths
 - bulk-checkin: support chunked-object encoding
 - bulk-checkin: allow the same data to be multiply hashed
 - new representation types in the packstream
 - packfile: use varint functions

 Not ready.

 I finished the streaming checkout codepath, but as explained in
 127b177 (bulk-checkin: support chunked-object encoding, 2011-11-30),
 these are still early steps of a long and painful journey. At least
 pack-objects and fsck need to learn the new encoding for the series
 to be usable locally, and then index-pack/unpack-objects needs to
 learn it to be used remotely.

 Given that I heard a lot of noise that people want large files, and
 that I was asked by somebody at GitTogether'11 privately for an
 advice on how to pay developers (not me) to help adding necessary
 support, I am somewhat dissapointed that the original patch series
 that was sent long time ago still remains here without much comments
 and updates from the developer community. I even made the interface
 to the logic that decides where to split chunks easily replaceable,
 and I deliberately made the logic in the original patch extremely
 stupid to entice others, especially the "bup" fanbois, to come up
 with a better logic, thinking that giving people an easy target to
 shoot for, they may be encouraged to help out. The plan is not
 working :-<.

--------------------------------------------------
[Cooking]

* jl/submodule-add-by-name (2012-09-30) 2 commits
 - submodule add: Fail when .git/modules/<name> already exists unless forced
 - Teach "git submodule add" the --name option

 If you remove a submodule, in order to keep the repository so that
 "git checkout" to an older commit in the superproject history can
 resurrect the submodule, the real repository will stay in $GIT_DIR
 of the superproject.  A later "git submodule add $path" to add a
 different submodule at the same path will fail.  Diagnose this case
 a bit better, and if the user really wants to add an unrelated
 submodule at the same path, give the "--name" option to give it a
 place in $GIT_DIR of the superproject that does not conflict with
 the original submodule.

 Will merge to 'next'.


* nd/grep-reflog (2012-09-29) 4 commits
  (merged to 'next' on 2012-10-01 at 57773a6)
 + revision: make --grep search in notes too if shown
 + log --grep-reflog: reject the option without -g
 + revision: add --grep-reflog to filter commits by reflog messages
 + grep: prepare for new header field filter

 Teach the commands from the "log" family the "--grep-reflog" option
 to limit output by string that appears in the reflog entry when the
 "--walk-reflogs" option is in effect.

 Will merge to 'master'.


* lt/mailinfo-handle-attachment-more-sanely (2012-09-30) 1 commit
  (merged to 'next' on 2012-10-01 at 2a1cecc)
 + mailinfo: don't require "text" mime type for attachments

 A patch attached as application/octet-stream (e.g. not text/*) were
 mishandled, not correctly honoring Content-Transfer-Encoding
 (e.g. base64).

 Will merge to 'master'.


* tu/gc-auto-quiet (2012-09-27) 1 commit
  (merged to 'next' on 2012-10-01 at ad8b91b)
 + silence git gc --auto --quiet output

 "gc --auto" notified the user that auto-packing has triggered even
 under the "--quiet" option.

 Will merge to 'master'.


* jm/diff-context-config (2012-10-02) 2 commits
  (merged to 'next' on 2012-10-02 at e57700a)
 + t4055: avoid use of sed 'a' command
  (merged to 'next' on 2012-10-01 at 509a558)
 + diff: diff.context configuration gives default to -U

 Teaches a new configuration variable to "git diff" Porcelain and
 its friends.

 Will defer to the next cycle.


* mh/ceiling (2012-09-29) 9 commits
 - t1504: stop resolving symlinks in GIT_CEILING_DIRECTORIES
 - longest_ancestor_length(): resolve symlinks before comparing paths
 - longest_ancestor_length(): use string_list_longest_prefix()
 - longest_ancestor_length(): always add a slash to the end of prefixes
 - longest_ancestor_length(): explicitly filter list before loop
 - longest_ancestor_length(): use string_list_split()
 - Introduce new function real_path_if_valid()
 - real_path_internal(): add comment explaining use of cwd
 - Introduce new static function real_path_internal()

 Elements of GIT_CEILING_DIRECTORIES list may not match the real
 pathname we obtain from getcwd(), leading the GIT_DIR discovery
 logic to escape the ceilings the user thought to have specified.

 The solution felt a bit unnecessarily convoluted to me.
 Expecting a reroll.


* jl/submodule-rm (2012-09-29) 1 commit
  (merged to 'next' on 2012-10-01 at 4e5c4fc)
 + submodule: teach rm to remove submodules unless they contain a git directory

 "git rm submodule" cannot blindly remove a submodule directory as
 its working tree may have local changes, and worse yet, it may even
 have its repository embedded in it.  Teach it some special cases
 where it is safe to remove a submodule, specifically, when there is
 no local changes in the submodule working tree, and its repository
 is not embedded in its working tree but is elsewhere and uses the
 gitfile mechanism to point at it.

 Will defer to the next cycle.


* nd/wildmatch (2012-09-27) 5 commits
 - Support "**" in .gitignore and .gitattributes patterns using wildmatch()
 - Integrate wildmatch to git
 - compat/wildmatch: fix case-insensitive matching
 - compat/wildmatch: remove static variable force_lower_case
 - Import wildmatch from rsync

 Allows pathname patterns in .gitignore and .gitattributes files
 with double-asterisks "foo/**/bar" to match any number of directory
 hiearchies.

 It was pointed out that some symbols that do not have to be global
 are left global. I think this reroll fixed most of them.

 Will merge to 'next'.


* nd/pretty-placeholder-with-color-option (2012-09-30) 9 commits
 - pretty: support %>> that steal trailing spaces
 - pretty: support truncating in %>, %< and %><
 - pretty: support padding placeholders, %< %> and %><
 - pretty: two phase conversion for non utf-8 commits
 - utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
 - utf8.c: move display_mode_esc_sequence_len() for use by other functions
 - pretty: support %C(auto[,N]) to turn on coloring on next placeholder(s)
 - pretty: split parsing %C into a separate function
 - pretty: share code between format_decoration and show_decorations


* jk/no-more-pre-exec-callback (2012-06-05) 1 commit
 - pager: drop "wait for output to run less" hack

 (Originally merged to 'next' on 2012-07-23)

 Will defer to the next cycle.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-02 23:20 What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
@ 2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
  2012-10-03 18:17   ` Junio C Hamano
  2012-10-04  8:17 ` David Michael Barr
  2012-10-30 12:15 ` Florian Achleitner
  2 siblings, 1 reply; 31+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-03 15:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Oct 3, 2012 at 6:20 AM, Junio C Hamano <gitster@pobox.com> wrote:
> * nd/wildmatch (2012-09-27) 5 commits
>  - Support "**" in .gitignore and .gitattributes patterns using wildmatch()
>  - Integrate wildmatch to git
>  - compat/wildmatch: fix case-insensitive matching
>  - compat/wildmatch: remove static variable force_lower_case
>  - Import wildmatch from rsync
>
>  Allows pathname patterns in .gitignore and .gitattributes files
>  with double-asterisks "foo/**/bar" to match any number of directory
>  hiearchies.
>
>  It was pointed out that some symbols that do not have to be global
>  are left global. I think this reroll fixed most of them.
>
>  Will merge to 'next'.

Just a bit of finding lately, in case you want to postpone the merge.

There's an interesting case: "**foo". According to our rules, that
pattern does not contain slashes therefore is basename match. But some
might find that confusing because "**" can match slashes, as opposed
to ordinary wildcards which cannot. So we could either go with our
rules and consider "**" just like "*" in this case (do we need
document clarification?), or redefine it that the presence of "**"
implies FNM_PATHNAME.

I think the latter makes more sense. When users put "**" they expect
to match some slashes. But that may call for a refactoring in
path_matches() in attr.c. Putting strstr(pattern, "**") in that
matching function may increase overhead unnecessarily.

The third option is just die() and let users decide either "*foo",
"**/foo" or "/**foo", never "**foo".
-- 
Duy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
@ 2012-10-03 18:17   ` Junio C Hamano
  2012-10-04  1:56     ` Nguyen Thai Ngoc Duy
  2012-10-04  9:34     ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Michael Haggerty
  0 siblings, 2 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-03 18:17 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> There's an interesting case: "**foo". According to our rules, that
> pattern does not contain slashes therefore is basename match. But some
> might find that confusing because "**" can match slashes,...

By "our rules", if you mean "if a pattern has slash, it is anchored",
that obviously need to be updated with this series, if "**" is meant
to match multiple hierarchies.
> I think the latter makes more sense. When users put "**" they expect
> to match some slashes. But that may call for a refactoring in
> path_matches() in attr.c. Putting strstr(pattern, "**") in that
> matching function may increase overhead unnecessarily.
>
> The third option is just die() and let users decide either "*foo",
> "**/foo" or "/**foo", never "**foo".

For the double-star at the beginning, you should just turn it into "**/"
if it is not followed by a slash internally, I think.

What is the semantics of ** in the first place?  Is it described to
a reasonable level of detail in the documentation updates?  For
example does "**foo" match "afoo", "a/b/foo", "a/bfoo", "a/foo/b",
"a/bfoo/c"?  Does "x**y" match "xy", "xay", "xa/by", "x/a/y"?

I am guessing that the only sensible definition is that "**"
requires anything that comes before it (if exists) is at a proper
hierarchy boundary, and anything matches it is also at a proper
hierarchy boundary, so "x**y" matches "x/a/y" and not "xy", "xay",
nor "xa/by" in the above example.  If "x**y" can match "xy" or "xay"
(or "**foo" can match "afoo"), it would be unreasonable to say it
implies the pattern is anchored at any level, no?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-03 18:17   ` Junio C Hamano
@ 2012-10-04  1:56     ` Nguyen Thai Ngoc Duy
  2012-10-04  6:01       ` Junio C Hamano
  2012-10-04  9:34     ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Michael Haggerty
  1 sibling, 1 reply; 31+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-04  1:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Oct 4, 2012 at 1:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
> For the double-star at the beginning, you should just turn it into "**/"
> if it is not followed by a slash internally, I think.
>
> What is the semantics of ** in the first place?  Is it described to
> a reasonable level of detail in the documentation updates?  For
> example does "**foo" match "afoo", "a/b/foo", "a/bfoo", "a/foo/b",
> "a/bfoo/c"?  Does "x**y" match "xy", "xay", "xa/by", "x/a/y"?

It's basically what rsync describes: use ’**’ to match anything,
including slashes.

Reading rsync's man page again, I notice I missed two other rules related to **:

 - If the pattern contains a / (not counting a trailing /) or a "**",
then it is matched against the full pathname, including any leading
directories.  If  the  pattern  doesn't contain  a / or a "**", then
it is matched only against the final component of the filename.
(Remember that the algorithm is applied recursively so "full filename"
can actually be any portion of a path from the starting directory on
down.)

 - A trailing "dir_name/***" will match both the directory (as if
"dir_name/" had been specified) and everything in the directory (as if
"dir_name/**" had been specified).  This behavior was added in version
2.6.7.

From what you wrote, I think we'll go with the first rule. The second
rule looks irrelevant to what git's doing.

> I am guessing that the only sensible definition is that "**"
> requires anything that comes before it (if exists) is at a proper
> hierarchy boundary, and anything matches it is also at a proper
> hierarchy boundary, so "x**y" matches "x/a/y"

and "x/y" too? (As opposed to "x/**/y" which does not)

> and not "xy", "xay",
> nor "xa/by" in the above example.  If "x**y" can match "xy" or "xay"
> (or "**foo" can match "afoo"), it would be unreasonable to say it
> implies the pattern is anchored at any level, no?

Yeah. That makes things easier to reason, though not exactly what we're having.
-- 
Duy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04  1:56     ` Nguyen Thai Ngoc Duy
@ 2012-10-04  6:01       ` Junio C Hamano
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04  6:01 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

>> I am guessing that the only sensible definition is that "**"
>> requires anything that comes before it (if exists) is at a proper
>> hierarchy boundary, and anything matches it is also at a proper
>> hierarchy boundary, so "x**y" matches "x/a/y"
>
> and "x/y" too? (As opposed to "x/**/y" which does not)

Yeah, x**y would match x/y under that "sensible" semantics.

>> and not "xy", "xay",
>> nor "xa/by" in the above example.  If "x**y" can match "xy" or "xay"
>> (or "**foo" can match "afoo"), it would be unreasonable to say it
>> implies the pattern is anchored at any level, no?
>
> Yeah. That makes things easier to reason, though not exactly what we're having.

It sounds like that "x**y" with the code you imported would match
"xy" and "xa/b/cy", and I do not think of a concise and good way to
describe what it does to the end users.

"matches anything including '/'" is not a useful description for the
purpose of allowing the user to intuitively understand why "x**y" is
anchored at the level (or is not anchored and can appear anywhere).

Perhaps the wildmatch code may not be what we want X-<.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 0/6] wildmatch part 2
  2012-10-04  6:01       ` Junio C Hamano
@ 2012-10-04  7:39         ` Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 1/6] attr: remove the union in struct match_attr Nguyễn Thái Ngọc Duy
                             ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

On Thu, Oct 4, 2012 at 1:01 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Perhaps the wildmatch code may not be what we want X-<.

When I imported wildmatch I was hoping to make minimum changes to it.
But wildmatch is probably the only practical way to support "**" even
if we later need to change it the way we want. Other options are base
our work on top of compat/fnmatch.c, which is an #ifdef spaghetti
mess, or write a new fnmatch()-compatible function. Both unattractive
to me.

Anyway, this is on top of nd/wildmatch, which makes "ab**cd" match
full pathname.

attr patches port .gitignore optimizations over. In long term, we
should probably have a shared matching implementation instead. I tried
that road once and failed so I won't attempt again any time soon. If
we drop wildmatch, I can split these attr patches out as a separate
series. It's a good thing to do anyway.

The last patch just reflects that current "**" is not exactly what we
want. I'm not sure if I could look into wildmatch.c and change it.
Anybody is welcome to step up, of course.

Nguyễn Thái Ngọc Duy (6):
  attr: remove the union in struct match_attr
  attr: avoid strlen() on every match
  attr: avoid searching for basename on every match
  attr: more matching optimizations from .gitignore
  gitignore: do not do basename match with patterns that have '**'
  t3001: note about expected "**" behavior

 Documentation/gitignore.txt        |  10 ++--
 attr.c                             | 101 +++++++++++++++++++++++++++----------
 dir.c                              |   6 +--
 dir.h                              |   2 +
 t/t0003-attributes.sh              |  16 ++++++
 t/t3001-ls-files-others-exclude.sh |  18 +++++++
 6 files changed, 118 insertions(+), 35 deletions(-)

-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/6] attr: remove the union in struct match_attr
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 2/6] attr: avoid strlen() on every match Nguyễn Thái Ngọc Duy
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

We're going to add more attributes to u.pattern so it'll become bigger
in size than a pointer. There's no point in sharing the same room with
u.attr.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 attr.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/attr.c b/attr.c
index 15ebaa1..48df800 100644
--- a/attr.c
+++ b/attr.c
@@ -119,10 +119,10 @@ struct attr_state {
 /*
  * One rule, as from a .gitattributes file.
  *
- * If is_macro is true, then u.attr is a pointer to the git_attr being
+ * If is_macro is true, then attr is a pointer to the git_attr being
  * defined.
  *
- * If is_macro is false, then u.pattern points at the filename pattern
+ * If is_macro is false, then pattern points at the filename pattern
  * to which the rule applies.  (The memory pointed to is part of the
  * memory block allocated for the match_attr instance.)
  *
@@ -131,10 +131,8 @@ struct attr_state {
  * listed as they appear in the file (macros unexpanded).
  */
 struct match_attr {
-	union {
-		char *pattern;
-		struct git_attr *attr;
-	} u;
+	const char *pattern;
+	struct git_attr *attr;
 	char is_macro;
 	unsigned num_attr;
 	struct attr_state state[FLEX_ARRAY];
@@ -240,11 +238,12 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 		      sizeof(struct attr_state) * num_attr +
 		      (is_macro ? 0 : namelen + 1));
 	if (is_macro)
-		res->u.attr = git_attr_internal(name, namelen);
+		res->attr = git_attr_internal(name, namelen);
 	else {
-		res->u.pattern = (char *)&(res->state[num_attr]);
-		memcpy(res->u.pattern, name, namelen);
-		res->u.pattern[namelen] = 0;
+		char *p = (char *)&(res->state[num_attr]);
+		memcpy(p, name, namelen);
+		p[namelen] = 0;
+		res->pattern = p;
 	}
 	res->is_macro = is_macro;
 	res->num_attr = num_attr;
@@ -682,7 +681,7 @@ static int fill_one(const char *what, struct match_attr *a, int rem)
 
 		if (*n == ATTR__UNKNOWN) {
 			debug_set(what,
-				  a->is_macro ? a->u.attr->name : a->u.pattern,
+				  a->is_macro ? a->attr->name : a->pattern,
 				  attr, v);
 			*n = v;
 			rem--;
@@ -702,7 +701,7 @@ static int fill(const char *path, int pathlen, struct attr_stack *stk, int rem)
 		if (a->is_macro)
 			continue;
 		if (path_matches(path, pathlen,
-				 a->u.pattern, base, strlen(base)))
+				 a->pattern, base, strlen(base)))
 			rem = fill_one("fill", a, rem);
 	}
 	return rem;
@@ -722,7 +721,7 @@ static int macroexpand_one(int attr_nr, int rem)
 			struct match_attr *ma = stk->attrs[i];
 			if (!ma->is_macro)
 				continue;
-			if (ma->u.attr->attr_nr == attr_nr)
+			if (ma->attr->attr_nr == attr_nr)
 				a = ma;
 		}
 
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 2/6] attr: avoid strlen() on every match
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 1/6] attr: remove the union in struct match_attr Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 3/6] attr: avoid searching for basename " Nguyễn Thái Ngọc Duy
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 attr.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/attr.c b/attr.c
index 48df800..66b96d9 100644
--- a/attr.c
+++ b/attr.c
@@ -277,6 +277,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 static struct attr_stack {
 	struct attr_stack *prev;
 	char *origin;
+	size_t originlen;
 	unsigned num_matches;
 	unsigned alloc;
 	struct match_attr **attrs;
@@ -532,6 +533,7 @@ static void bootstrap_attr_stack(void)
 	if (!is_bare_repository() || direction == GIT_ATTR_INDEX) {
 		elem = read_attr(GITATTRIBUTES_FILE, 1);
 		elem->origin = xstrdup("");
+		elem->originlen = 0;
 		elem->prev = attr_stack;
 		attr_stack = elem;
 		debug_push(elem);
@@ -625,7 +627,7 @@ static void prepare_attr_stack(const char *path)
 			strbuf_addstr(&pathbuf, GITATTRIBUTES_FILE);
 			elem = read_attr(pathbuf.buf, 0);
 			strbuf_setlen(&pathbuf, cp - path);
-			elem->origin = strbuf_detach(&pathbuf, NULL);
+			elem->origin = strbuf_detach(&pathbuf, &elem->originlen);
 			elem->prev = attr_stack;
 			attr_stack = elem;
 			debug_push(elem);
@@ -701,7 +703,7 @@ static int fill(const char *path, int pathlen, struct attr_stack *stk, int rem)
 		if (a->is_macro)
 			continue;
 		if (path_matches(path, pathlen,
-				 a->pattern, base, strlen(base)))
+				 a->pattern, base, stk->originlen))
 			rem = fill_one("fill", a, rem);
 	}
 	return rem;
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 3/6] attr: avoid searching for basename on every match
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 1/6] attr: remove the union in struct match_attr Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 2/6] attr: avoid strlen() on every match Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 4/6] attr: more matching optimizations from .gitignore Nguyễn Thái Ngọc Duy
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 attr.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/attr.c b/attr.c
index 66b96d9..eb576ac 100644
--- a/attr.c
+++ b/attr.c
@@ -644,13 +644,11 @@ static void prepare_attr_stack(const char *path)
 }
 
 static int path_matches(const char *pathname, int pathlen,
+			const char *basename,
 			const char *pattern,
 			const char *base, int baselen)
 {
 	if (!strchr(pattern, '/')) {
-		/* match basename */
-		const char *basename = strrchr(pathname, '/');
-		basename = basename ? basename + 1 : pathname;
 		return (fnmatch_icase(pattern, basename, 0) == 0);
 	}
 	/*
@@ -693,7 +691,8 @@ static int fill_one(const char *what, struct match_attr *a, int rem)
 	return rem;
 }
 
-static int fill(const char *path, int pathlen, struct attr_stack *stk, int rem)
+static int fill(const char *path, int pathlen, const char *basename,
+		struct attr_stack *stk, int rem)
 {
 	int i;
 	const char *base = stk->origin ? stk->origin : "";
@@ -702,7 +701,7 @@ static int fill(const char *path, int pathlen, struct attr_stack *stk, int rem)
 		struct match_attr *a = stk->attrs[i];
 		if (a->is_macro)
 			continue;
-		if (path_matches(path, pathlen,
+		if (path_matches(path, pathlen, basename,
 				 a->pattern, base, stk->originlen))
 			rem = fill_one("fill", a, rem);
 	}
@@ -741,15 +740,19 @@ static void collect_all_attrs(const char *path)
 {
 	struct attr_stack *stk;
 	int i, pathlen, rem;
+	const char *basename;
 
 	prepare_attr_stack(path);
 	for (i = 0; i < attr_nr; i++)
 		check_all_attr[i].value = ATTR__UNKNOWN;
 
+	basename = strrchr(path, '/');
+	basename = basename ? basename + 1 : path;
+
 	pathlen = strlen(path);
 	rem = attr_nr;
 	for (stk = attr_stack; 0 < rem && stk; stk = stk->prev)
-		rem = fill(path, pathlen, stk, rem);
+		rem = fill(path, pathlen, basename, stk, rem);
 }
 
 int git_check_attr(const char *path, int num, struct git_attr_check *check)
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 4/6] attr: more matching optimizations from .gitignore
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
                             ` (2 preceding siblings ...)
  2012-10-04  7:39           ` [PATCH 3/6] attr: avoid searching for basename " Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04  7:39           ` [PATCH 5/6] gitignore: do not do basename match with patterns that have '**' Nguyễn Thái Ngọc Duy
                             ` (2 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.

This patch adds those optimizations to .gitattributes. Basically it
tries to avoid fnmatch/wildmatch in favor of strncmp as much as
possible.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 attr.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 dir.c  |  4 ++--
 dir.h  |  2 ++
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/attr.c b/attr.c
index eb576ac..3fde9fa 100644
--- a/attr.c
+++ b/attr.c
@@ -116,6 +116,13 @@ struct attr_state {
 	const char *setto;
 };
 
+struct pattern {
+	const char *pattern;
+	int patternlen;
+	int nowildcardlen;
+	int flags;		/* EXC_FLAG_* */
+};
+
 /*
  * One rule, as from a .gitattributes file.
  *
@@ -131,7 +138,7 @@ struct attr_state {
  * listed as they appear in the file (macros unexpanded).
  */
 struct match_attr {
-	const char *pattern;
+	struct pattern pat;
 	struct git_attr *attr;
 	char is_macro;
 	unsigned num_attr;
@@ -243,7 +250,13 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 		char *p = (char *)&(res->state[num_attr]);
 		memcpy(p, name, namelen);
 		p[namelen] = 0;
-		res->pattern = p;
+		res->pat.pattern = p;
+		res->pat.patternlen = strlen(p);
+		res->pat.nowildcardlen = simple_length(p);
+		if (!strchr(p, '/'))
+			res->pat.flags |= EXC_FLAG_NODIR;
+		if (*p == '*' && no_wildcard(p+1))
+			res->pat.flags |= EXC_FLAG_ENDSWITH;
 	}
 	res->is_macro = is_macro;
 	res->num_attr = num_attr;
@@ -645,26 +658,56 @@ static void prepare_attr_stack(const char *path)
 
 static int path_matches(const char *pathname, int pathlen,
 			const char *basename,
-			const char *pattern,
+			const struct pattern *pat,
 			const char *base, int baselen)
 {
-	if (!strchr(pattern, '/')) {
+	const char *pattern = pat->pattern;
+	int prefix = pat->nowildcardlen;
+	const char *name;
+	int namelen;
+
+	if (pat->flags & EXC_FLAG_NODIR) {
+		if (prefix == pat->patternlen &&
+		    !strcmp_icase(pattern, basename))
+			return 1;
+
+		if (pat->flags & EXC_FLAG_ENDSWITH &&
+		    pat->patternlen - 1 <= pathlen &&
+		    !strcmp_icase(pattern + 1, pathname +
+				  pathlen - pat->patternlen + 1))
+			return 1;
+
 		return (fnmatch_icase(pattern, basename, 0) == 0);
 	}
 	/*
 	 * match with FNM_PATHNAME; the pattern has base implicitly
 	 * in front of it.
 	 */
-	if (*pattern == '/')
+	if (*pattern == '/') {
 		pattern++;
+		prefix--;
+	}
+
+	/*
+	 * note: unlike excluded_from_list, baselen here does not
+	 * contain the trailing slash
+	 */
+
 	if (pathlen < baselen ||
 	    (baselen && pathname[baselen] != '/') ||
 	    strncmp(pathname, base, baselen))
 		return 0;
-	if (baselen != 0)
-		baselen++;
-	return (ignore_case && iwildmatch(pattern, pathname + baselen)) ||
-		(!ignore_case && wildmatch(pattern, pathname + baselen));
+
+	namelen = baselen ? pathlen - baselen - 1 : pathlen;
+	name = pathname + pathlen - namelen;
+
+	/* if the non-wildcard part is longer than the remaining
+	   pathname, surely it cannot match */
+	if (!namelen || prefix > namelen)
+		return 0;
+
+	return (ignore_case && iwildmatch(pattern, name)) ||
+		(!ignore_case && wildmatch(pattern, name));
 }
 
 static int macroexpand_one(int attr_nr, int rem);
@@ -702,7 +745,7 @@ static int fill(const char *path, int pathlen, const char *basename,
 		if (a->is_macro)
 			continue;
 		if (path_matches(path, pathlen, basename,
-				 a->pattern, base, stk->originlen))
+				 &a->pat, base, stk->originlen))
 			rem = fill_one("fill", a, rem);
 	}
 	return rem;
diff --git a/dir.c b/dir.c
index 92cda82..fd49336 100644
--- a/dir.c
+++ b/dir.c
@@ -292,7 +292,7 @@ int match_pathspec_depth(const struct pathspec *ps,
 /*
  * Return the length of the "simple" part of a path match limiter.
  */
-static int simple_length(const char *match)
+int simple_length(const char *match)
 {
 	int len = -1;
 
@@ -304,7 +304,7 @@ static int simple_length(const char *match)
 	}
 }
 
-static int no_wildcard(const char *string)
+int no_wildcard(const char *string)
 {
 	return string[simple_length(string)] == '\0';
 }
diff --git a/dir.h b/dir.h
index 893465a..7ea8678 100644
--- a/dir.h
+++ b/dir.h
@@ -101,6 +101,8 @@ extern void add_exclude(const char *string, const char *base,
 			int baselen, struct exclude_list *which);
 extern void free_excludes(struct exclude_list *el);
 extern int file_exists(const char *);
+extern int simple_length(const char *match);
+extern int no_wildcard(const char *string);
 
 extern int is_inside_dir(const char *dir);
 extern int dir_inside_of(const char *subdir, const char *dir);
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 5/6] gitignore: do not do basename match with patterns that have '**'
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
                             ` (3 preceding siblings ...)
  2012-10-04  7:39           ` [PATCH 4/6] attr: more matching optimizations from .gitignore Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04 17:59             ` Junio C Hamano
  2012-10-05  7:01             ` Johannes Sixt
  2012-10-04  7:39           ` [PATCH 6/6] t3001: note about expected "**" behavior Nguyễn Thái Ngọc Duy
  2012-10-04 17:43           ` [PATCH 0/6] wildmatch part 2 Junio C Hamano
  6 siblings, 2 replies; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

"**" can match slashes, not like "*". "ab**ef" should be able to match
"ab/cd/ef", or "ab/c/d/ef" and so on. Turn off the EXC_FLAG_NODIR in
this case otherwise the pattern is only checked against the base
name. This behavior is in sync with rsync.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/gitignore.txt        | 10 +++++-----
 attr.c                             |  2 +-
 dir.c                              |  2 +-
 t/t0003-attributes.sh              | 16 ++++++++++++++++
 t/t3001-ls-files-others-exclude.sh | 10 ++++++++++
 5 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index eb81d31..4dfe8bd 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -81,11 +81,11 @@ PATTERN FORMAT
    regular file or a symbolic link `foo` (this is consistent
    with the way how pathspec works in general in git).
 
- - If the pattern does not contain a slash '/', git treats it as
-   a shell glob pattern and checks for a match against the
-   pathname relative to the location of the `.gitignore` file
-   (relative to the toplevel of the work tree if not from a
-   `.gitignore` file).
+ - If the pattern does not contain a slash '/' nor '**', git
+   treats it as a shell glob pattern and checks for a match
+   against the pathname relative to the location of the
+   `.gitignore` file (relative to the toplevel of the work tree
+   if not from a `.gitignore` file).
 
  - Otherwise, git treats the pattern as a shell glob suitable
    for consumption by fnmatch(3) with the FNM_PATHNAME flag:
diff --git a/attr.c b/attr.c
index 3fde9fa..634b39c 100644
--- a/attr.c
+++ b/attr.c
@@ -253,7 +253,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 		res->pat.pattern = p;
 		res->pat.patternlen = strlen(p);
 		res->pat.nowildcardlen = simple_length(p);
-		if (!strchr(p, '/'))
+		if (!strchr(p, '/') && !strstr(p, "**"))
 			res->pat.flags |= EXC_FLAG_NODIR;
 		if (*p == '*' && no_wildcard(p+1))
 			res->pat.flags |= EXC_FLAG_ENDSWITH;
diff --git a/dir.c b/dir.c
index fd49336..6a5de98 100644
--- a/dir.c
+++ b/dir.c
@@ -340,7 +340,7 @@ void add_exclude(const char *string, const char *base,
 	x->base = base;
 	x->baselen = baselen;
 	x->flags = flags;
-	if (!strchr(string, '/'))
+	if (!strchr(string, '/') && !strstr(string, "**"))
 		x->flags |= EXC_FLAG_NODIR;
 	x->nowildcardlen = simple_length(string);
 	if (*string == '*' && no_wildcard(string+1))
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 6c3c554..9b534a0 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -249,4 +249,20 @@ EOF
 	test_line_count = 0 err
 '
 
+test_expect_success '"**" with no slashes test' '
+	echo "a**f foo=bar" >.gitattributes &&
+	cat <<\EOF >expect &&
+f: foo: unspecified
+a/f: foo: bar
+a/b/f: foo: bar
+a/b/c/f: foo: bar
+EOF
+	git check-attr foo -- "f" >actual 2>err &&
+	git check-attr foo -- "a/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/c/f" >>actual 2>>err &&
+	test_cmp expect actual &&
+	test_line_count = 0 err
+'
+
 test_done
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 67c8bcf..6a5a4ab 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -225,4 +225,14 @@ EOF
 	test_cmp expect actual
 '
 
+
+test_expect_success 'ls-files with "**" patterns and no slashes' '
+	cat <<\EOF >expect &&
+one/a.1
+one/two/a.1
+EOF
+	git ls-files -o -i --exclude "one**a.1" >actual
+	test_cmp expect actual
+'
+
 test_done
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 6/6] t3001: note about expected "**" behavior
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
                             ` (4 preceding siblings ...)
  2012-10-04  7:39           ` [PATCH 5/6] gitignore: do not do basename match with patterns that have '**' Nguyễn Thái Ngọc Duy
@ 2012-10-04  7:39           ` Nguyễn Thái Ngọc Duy
  2012-10-04 18:04             ` Junio C Hamano
  2012-10-04 17:43           ` [PATCH 0/6] wildmatch part 2 Junio C Hamano
  6 siblings, 1 reply; 31+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-04  7:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

"**" currently matches any characters including slashes. It's probably
too powerful. A more sensible definition may be match any characters
that the but the whole match must be wrapped by slashes. So "**" can
match none, "/", "/aaa/", "/aa/bb/" and so on but not "aa/bb".

Note it in the test suite.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t3001-ls-files-others-exclude.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index 6a5a4ab..99b5f5c 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -235,4 +235,12 @@ EOF
 	test_cmp expect actual
 '
 
+# We might want ** to match at directory boundary, e.g. a**b matches
+# a/b, a/x/b, a/x/x/b... but not ax/xb.
+test_expect_failure 'ls-files with "**" patterns and no slashes' '
+	: >expect &&
+	git ls-files -o -i --exclude "o**a.1" >actual
+	test_cmp expect actual
+'
+
 test_done
-- 
1.7.12.1.405.gb727dc9

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-02 23:20 What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
  2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
@ 2012-10-04  8:17 ` David Michael Barr
  2012-10-04  8:30   ` fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)) Jonathan Nieder
  2012-10-04 16:27   ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
  2012-10-30 12:15 ` Florian Achleitner
  2 siblings, 2 replies; 31+ messages in thread
From: David Michael Barr @ 2012-10-04  8:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Florian Achleitner, Dmitry Ivankov, Jonathan Nieder


On Wednesday, 3 October 2012 at 9:20 AM, Junio C Hamano wrote: 
> 
> * fa/remote-svn (2012-09-19) 16 commits
> - Add a test script for remote-svn
> - remote-svn: add marks-file regeneration
> - Add a svnrdump-simulator replaying a dump file for testing
> - remote-svn: add incremental import
> - remote-svn: Activate import/export-marks for fast-import
> - Create a note for every imported commit containing svn metadata
> - vcs-svn: add fast_export_note to create notes
> - Allow reading svn dumps from files via file:// urls
> - remote-svn, vcs-svn: Enable fetching to private refs
> - When debug==1, start fast-import with "--stats" instead of "--quiet"
> - Add documentation for the 'bidi-import' capability of remote-helpers
> - Connect fast-import to the remote-helper via pipe, adding 'bidi-import' capability
> - Add argv_array_detach and argv_array_free_detached
> - Add svndump_init_fd to allow reading dumps from arbitrary FDs
> - Add git-remote-testsvn to Makefile
> - Implement a remote helper for svn in C
> (this branch is used by fa/vcs-svn.)
> 
> A GSoC project.
> Waiting for comments from mentors and stakeholders.

I have reviewed this topic and am happy with the design and implementation.
I support this topic for inclusion.

Acked-by: David Michael Barr <b@rr-dav.id.au>
> 
> * fa/vcs-svn (2012-09-19) 4 commits
> - vcs-svn: remove repo_tree
> - vcs-svn/svndump: rewrite handle_node(), begin|end_revision()
> - vcs-svn/svndump: restructure node_ctx, rev_ctx handling
> - svndump: move struct definitions to .h
> (this branch uses fa/remote-svn.)
> 
> A GSoC project.
> Waiting for comments from mentors and stakeholders.

This follow-on topic I'm not so sure on, some of the design decisions make me uncomfortable and I need some convincing before I can get behind this topic. 

--
David Michael Barr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2))
  2012-10-04  8:17 ` David Michael Barr
@ 2012-10-04  8:30   ` Jonathan Nieder
  2012-10-04 13:16     ` Stephen Bash
  2012-10-04 16:27   ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
  1 sibling, 1 reply; 31+ messages in thread
From: Jonathan Nieder @ 2012-10-04  8:30 UTC (permalink / raw)
  To: David Michael Barr
  Cc: Junio C Hamano, git, Florian Achleitner, Dmitry Ivankov

David Michael Barr wrote:
> On Wednesday, 3 October 2012 at 9:20 AM, Junio C Hamano wrote: 

>> * fa/remote-svn (2012-09-19) 16 commits
>> - Add a test script for remote-svn
>> - remote-svn: add marks-file regeneration
>> - Add a svnrdump-simulator replaying a dump file for testing
>> - remote-svn: add incremental import
>> - remote-svn: Activate import/export-marks for fast-import
>> - Create a note for every imported commit containing svn metadata
>> - vcs-svn: add fast_export_note to create notes
>> - Allow reading svn dumps from files via file:// urls
>> - remote-svn, vcs-svn: Enable fetching to private refs
>> - When debug==1, start fast-import with "--stats" instead of "--quiet"
>> - Add documentation for the 'bidi-import' capability of remote-helpers
>> - Connect fast-import to the remote-helper via pipe, adding 'bidi-import' capability
>> - Add argv_array_detach and argv_array_free_detached
>> - Add svndump_init_fd to allow reading dumps from arbitrary FDs
>> - Add git-remote-testsvn to Makefile
>> - Implement a remote helper for svn in C
>> (this branch is used by fa/vcs-svn.)
>>
>> A GSoC project.
>> Waiting for comments from mentors and stakeholders.
>
> I have reviewed this topic and am happy with the design and implementation.
> I support this topic for inclusion.

Thanks!  I'll try moving the tests to the first patch and trying it
and hopefully send out a branch to pull tomorrow.

If I don't send anything tomorrow, that's probably a sign that I never
will, so since I like the goal of the series I guess it would be a
kind of implied ack.

Jonathan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-03 18:17   ` Junio C Hamano
  2012-10-04  1:56     ` Nguyen Thai Ngoc Duy
@ 2012-10-04  9:34     ` Michael Haggerty
  2012-10-04 11:46       ` Nguyen Thai Ngoc Duy
  2012-10-04 16:39       ` Junio C Hamano
  1 sibling, 2 replies; 31+ messages in thread
From: Michael Haggerty @ 2012-10-04  9:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git

On 10/03/2012 08:17 PM, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 
>> There's an interesting case: "**foo". According to our rules, that
>> pattern does not contain slashes therefore is basename match. But some
>> might find that confusing because "**" can match slashes,...
> 
> By "our rules", if you mean "if a pattern has slash, it is anchored",
> that obviously need to be updated with this series, if "**" is meant
> to match multiple hierarchies.
>> I think the latter makes more sense. When users put "**" they expect
>> to match some slashes. But that may call for a refactoring in
>> path_matches() in attr.c. Putting strstr(pattern, "**") in that
>> matching function may increase overhead unnecessarily.
>>
>> The third option is just die() and let users decide either "*foo",
>> "**/foo" or "/**foo", never "**foo".
> 
> For the double-star at the beginning, you should just turn it into "**/"
> if it is not followed by a slash internally, I think.
> 
> What is the semantics of ** in the first place?  Is it described to
> a reasonable level of detail in the documentation updates?  For
> example does "**foo" match "afoo", "a/b/foo", "a/bfoo", "a/foo/b",
> "a/bfoo/c"?  Does "x**y" match "xy", "xay", "xa/by", "x/a/y"?
> 
> I am guessing that the only sensible definition is that "**"
> requires anything that comes before it (if exists) is at a proper
> hierarchy boundary, and anything matches it is also at a proper
> hierarchy boundary, so "x**y" matches "x/a/y" and not "xy", "xay",
> nor "xa/by" in the above example.  If "x**y" can match "xy" or "xay"
> (or "**foo" can match "afoo"), it would be unreasonable to say it
> implies the pattern is anchored at any level, no?

Given that there is no obvious interpretation for what a construct like
"x**y" would mean, and many plausible guesses (most of which sound
rather useless), I suggest that we forbid it.  This will make the
feature easier to explain and make .gitignore files that use it easier
to understand.

I think that 98% of the usefulness of "**" would be in constructs where
it replaces a proper part of the pathname, like "**/SOMETHING" or
"SOMETHING/**/SOMETHING"; in other words, where its use matches the
regexp "(^|/)\*\*/".  In these constructs the only ambiguity is whether
"**/" matches regexp

    "([^/]+/)+"

or

    "([^/]+/)*"

(e.g., whether "foo/**/bar" matches "foo/bar").  I personally prefer the
second, because the first behavior can be had using the second
interpretation by using "SOMETHING/*/**/SOMETHING", whereas the second
behavior cannot be implemented in terms of the first in a single line of
the .gitignore file.

Optionally, one might also like to support "SOMETHING/**" or "**" alone
in the obvious ways.

As for the implementation, it is quite easy to textually convert a glob
pattern, including "**" parts, into a regexp.  I happen to have written
some Python code that does this for another project (see below).  An
obvious optimization would be to read any literal parts of the path off
the beginning of the glob pattern and only use regexps for the tail
part.  Would a regexp-based implementation be too slow?

Michael

_filename_char_pattern = r'[^/]'
_glob_patterns = [
    ('?', _filename_char_pattern),
    ('/**', r'(/.+)?'),
    ('**/', r'(.+/)?'),
    ('*', _filename_char_pattern + r'*'),
    ]


def glob_to_regexp(pattern):
    pattern = os.path.normpath(pattern) # remove trivial redundancies

    if pattern == '**':
        # This case has to be handled separately because it doesn't
        # involve a '/' character adjacent to the '**' pattern.  (Such
        # slashes otherwise have to be considered part of the pattern
        # to handle the matching of zero path components.)
        return re.compile(
            r'^' + _filename_char_pattern + r'(.+' +
_filename_char_pattern + r')?$'
            )

    regexp = [r'^']
    i = 0
    while i < len(pattern):
        for (s, r) in _glob_patterns:
            if pattern.startswith(s, i):
                regexp.append(r)
                i += len(s)
                break
        else:
            # AFAIK it's a normal character.  Escape it and add it to
            # pattern.
            regexp.append(re.escape(pattern[i]))
            i += 1

    regexp.append(r'$')

    return re.compile(''.join(regexp))




-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04  9:34     ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Michael Haggerty
@ 2012-10-04 11:46       ` Nguyen Thai Ngoc Duy
  2012-10-04 15:17         ` Michael Haggerty
  2012-10-04 16:39       ` Junio C Hamano
  1 sibling, 1 reply; 31+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-04 11:46 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, git

On Thu, Oct 4, 2012 at 4:34 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
On Thu, Oct 4, 2012 at 4:34 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Given that there is no obvious interpretation for what a construct like
> "x**y" would mean, and many plausible guesses (most of which sound
> rather useless), I suggest that we forbid it.  This will make the
> feature easier to explain and make .gitignore files that use it easier
> to understand.

Yep, sounds like a good short term plan.

> As for the implementation, it is quite easy to textually convert a glob
> pattern, including "**" parts, into a regexp.

Or we could introduce regexp syntax as an alternative and let users
choose (and pay associated price). Patterns starting with // are never
matched (we don't normalize paths in .gitignore). Any patterns started
with "//regex:" is followed by regex. Reject all other // patterns for
future use.

> _filename_char_pattern = r'[^/]'
> _glob_patterns = [
>     ('?', _filename_char_pattern),
>     ('/**', r'(/.+)?'),
>     ('**/', r'(.+/)?'),
>     ('*', _filename_char_pattern + r'*'),
>     ]

I don't fully understand the rest (never been a big fan of python) but
what about bracket expressions like [!abc] and [:alnum:]?
-- 
Duy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2))
  2012-10-04  8:30   ` fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)) Jonathan Nieder
@ 2012-10-04 13:16     ` Stephen Bash
  2012-10-04 16:30       ` Junio C Hamano
  0 siblings, 1 reply; 31+ messages in thread
From: Stephen Bash @ 2012-10-04 13:16 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Junio C Hamano, git, Florian Achleitner, Dmitry Ivankov,
	David Michael Barr

----- Original Message -----
> From: "Jonathan Nieder" <jrnieder@gmail.com>
> Sent: Thursday, October 4, 2012 4:30:01 AM
> Subject: Re: fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2))
> 
> > > * fa/remote-svn (2012-09-19) 16 commits
> > > - Add a test script for remote-svn
> > > - remote-svn: add marks-file regeneration
> > > - Add a svnrdump-simulator replaying a dump file for testing
> > > - remote-svn: add incremental import
> > > - remote-svn: Activate import/export-marks for fast-import
> > > - Create a note for every imported commit containing svn metadata
> > > - vcs-svn: add fast_export_note to create notes
> > > - Allow reading svn dumps from files via file:// urls
> > > - remote-svn, vcs-svn: Enable fetching to private refs
> > > - When debug==1, start fast-import with "--stats" instead of
> > > "--quiet"
> > > - Add documentation for the 'bidi-import' capability of
> > > remote-helpers
> > > - Connect fast-import to the remote-helper via pipe, adding
> > > 'bidi-import' capability
> > > - Add argv_array_detach and argv_array_free_detached
> > > - Add svndump_init_fd to allow reading dumps from arbitrary FDs
> > > - Add git-remote-testsvn to Makefile
> > > - Implement a remote helper for svn in C
> > > (this branch is used by fa/vcs-svn.)
> > >
> > > A GSoC project.
> > > Waiting for comments from mentors and stakeholders.
> >
> > I have reviewed this topic and am happy with the design and
> > implementation.  I support this topic for inclusion.
> 
> Thanks!  I'll try moving the tests to the first patch and trying it
> and hopefully send out a branch to pull tomorrow.
> 
> If I don't send anything tomorrow, that's probably a sign that I never
> will, so since I like the goal of the series I guess it would be a
> kind of implied ack.

I seemed to have missed the GSoC wrap up conversation... (links happily
accepted)  Looking at the big picture (as much as I can remember) it
seems to me the missing pieces now are branch mapping (lots of hard
work), and possibly parts (all?) of the "push to SVN" functionality?

Thoughts?

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04 11:46       ` Nguyen Thai Ngoc Duy
@ 2012-10-04 15:17         ` Michael Haggerty
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Haggerty @ 2012-10-04 15:17 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git

On 10/04/2012 01:46 PM, Nguyen Thai Ngoc Duy wrote:
> On Thu, Oct 4, 2012 at 4:34 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> As for the implementation, it is quite easy to textually convert a glob
>> pattern, including "**" parts, into a regexp.
> 
> Or we could introduce regexp syntax as an alternative and let users
> choose (and pay associated price).

It seems like overkill to me.  For filenames, globs are usually adequate.

>> _filename_char_pattern = r'[^/]'
>> _glob_patterns = [
>>     ('?', _filename_char_pattern),
>>     ('/**', r'(/.+)?'),
>>     ('**/', r'(.+/)?'),
>>     ('*', _filename_char_pattern + r'*'),
>>     ]
> 
> I don't fully understand the rest (never been a big fan of python) but
> what about bracket expressions like [!abc] and [:alnum:]?

You're right; I forgot that the code that I posted doesn't support brackets.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04  8:17 ` David Michael Barr
  2012-10-04  8:30   ` fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)) Jonathan Nieder
@ 2012-10-04 16:27   ` Junio C Hamano
  1 sibling, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 16:27 UTC (permalink / raw)
  To: David Michael Barr
  Cc: git, Florian Achleitner, Dmitry Ivankov, Jonathan Nieder

David Michael Barr <b@rr-dav.id.au> writes:

> On Wednesday, 3 October 2012 at 9:20 AM, Junio C Hamano wrote: 
>> 
>> * fa/remote-svn (2012-09-19) 16 commits
>> ...
>> 
>> A GSoC project.
>> Waiting for comments from mentors and stakeholders.
>
> I have reviewed this topic and am happy with the design and implementation.
> I support this topic for inclusion.
>
> Acked-by: David Michael Barr <b@rr-dav.id.au>
>> 
>> * fa/vcs-svn (2012-09-19) 4 commits
>> ...
>
> This follow-on topic I'm not so sure on, some of the design
> decisions make me uncomfortable and I need some convincing before
> I can get behind this topic.

Thanks for a feedback.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2))
  2012-10-04 13:16     ` Stephen Bash
@ 2012-10-04 16:30       ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 16:30 UTC (permalink / raw)
  To: Stephen Bash
  Cc: Jonathan Nieder, git, Florian Achleitner, Dmitry Ivankov,
	David Michael Barr

Stephen Bash <bash@genarts.com> writes:

> I seemed to have missed the GSoC wrap up conversation... (links happily
> accepted).

I also seem to have missed such conversation, if anything like that
happened.  It certainly would have been nice for the mentors and the
student for each project to give us a two-to-three-paragraphs
summary.

As GSoC is a Google event and not the Git community one, I wouldn't
*demand* a concluding write-up, but it still took considerable
community resource, so...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04  9:34     ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Michael Haggerty
  2012-10-04 11:46       ` Nguyen Thai Ngoc Duy
@ 2012-10-04 16:39       ` Junio C Hamano
  2012-10-05 12:19         ` Andreas Schwab
  2012-10-05 13:21         ` Nguyen Thai Ngoc Duy
  1 sibling, 2 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 16:39 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Nguyen Thai Ngoc Duy, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 10/03/2012 08:17 PM, Junio C Hamano wrote:
>> 
>> What is the semantics of ** in the first place?  Is it described to
>> a reasonable level of detail in the documentation updates?  For
>> example does "**foo" match "afoo", "a/b/foo", "a/bfoo", "a/foo/b",
>> "a/bfoo/c"?  Does "x**y" match "xy", "xay", "xa/by", "x/a/y"?
>> 
>> I am guessing that the only sensible definition is that "**"
>> requires anything that comes before it (if exists) is at a proper
>> hierarchy boundary, and anything matches it is also at a proper
>> hierarchy boundary, so "x**y" matches "x/a/y" and not "xy", "xay",
>> nor "xa/by" in the above example.  If "x**y" can match "xy" or "xay"
>> (or "**foo" can match "afoo"), it would be unreasonable to say it
>> implies the pattern is anchored at any level, no?
>
> Given that there is no obvious interpretation for what a construct like
> "x**y" would mean, and many plausible guesses (most of which sound
> rather useless), I suggest that we forbid it.  This will make the
> feature easier to explain and make .gitignore files that use it easier
> to understand.
>
> I think that 98% of the usefulness of "**" would be in constructs where
> it replaces a proper part of the pathname, like "**/SOMETHING" or
> "SOMETHING/**/SOMETHING"...

I think it is a good way to go in the longer term, if we all agree
that "**" matching anything does not give us a useful semantics
[*1*].

Is it something we can easily get by simple patch into the wildmatch
code?  I'd hate to see us parsing the input and validating it before
passing it to the library, as we will surely botch the quoting or
something while doing so.

When we require "x/**/y", I think we still want it to match "x/y".
Do people agree, or are there good reasons to require at least one
level between x and y for such a pattern?  Assuming that we do want
to match "x/y" with "x/**/y", I suspect that "'**' matches anything
including a slash" would not give us that semantics. Is it something
we can easily fix in the wildmatch code?


[Footnote]

*1* The message you are responding to was written in a somewhat
provocative way on purpose so that people who like the way rsync
matches "**" can vocally object. I would like to see arguments from
the both sides to see if it makes sense.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/6] wildmatch part 2
  2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
                             ` (5 preceding siblings ...)
  2012-10-04  7:39           ` [PATCH 6/6] t3001: note about expected "**" behavior Nguyễn Thái Ngọc Duy
@ 2012-10-04 17:43           ` Junio C Hamano
  6 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 17:43 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> On Thu, Oct 4, 2012 at 1:01 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Perhaps the wildmatch code may not be what we want X-<.
>
> When I imported wildmatch I was hoping to make minimum changes to it.
> But wildmatch is probably the only practical way to support "**" even
> if we later need to change it the way we want. Other options are base
> our work on top of compat/fnmatch.c, which is an #ifdef spaghetti
> mess, or write a new fnmatch()-compatible function. Both unattractive
> to me.
>
> Anyway, this is on top of nd/wildmatch, which makes "ab**cd" match
> full pathname.

I do not think we are in a hurry to push "**" support in before we
know what semantics we want to get out of it.  Pushing half-baked
"this is good enough at least to me for now" topics before they are
ready will cost the users in the longer term.

On the other hand, the three patches (2/3/4) in this series look
like a good improvement regardless of what kind of matching engine
we use.  I would have preferred to see them _before_ nd/wildmatch.

I do not agree with the reasoning behind [1/6] that changes

	union {
		char *pattern;
		strict git_attr *attr;
	} u;
	char is_macro;

to

	char *pattern;
	strict git_attr *attr;
	char is_macro;

by the way.

The union is much less about space saving but is more about the
nature of the usage; we use pattern or attr but not both at the same
time.  Even if the pattern becomes richer with later patches, that
does not change the fundamental premise that depending on the value
of "is_macro", either "attr" is used or "pattern" is used but they
won't be in effect at the same time.  The evolution of this should
go more like this, I think:

	union {
		struct {
			const char *string;
			int pattern_length;
			int prefix_literal_length;
			int flags;
		} pattern;
		struct git_attr *attr;
	} u;
	char is_macro;

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/6] gitignore: do not do basename match with patterns that have '**'
  2012-10-04  7:39           ` [PATCH 5/6] gitignore: do not do basename match with patterns that have '**' Nguyễn Thái Ngọc Duy
@ 2012-10-04 17:59             ` Junio C Hamano
  2012-10-05  7:01             ` Johannes Sixt
  1 sibling, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 17:59 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> -		if (!strchr(p, '/'))
> +		if (!strchr(p, '/') && !strstr(p, "**"))

Doesn't wildmatch allow these to be quoted, similar to the way usual
glob works, e.g.

	$ >ff
        $ >\?f
        $ echo ??
        ?f ff
        $ echo \?f
        ?f

Even If wildmatch out-of-the-box doesn't, I would assume that we
would want to fix it so that it does.  And if that is the case,
we would want to be careful about "two/asterisks\**in path" to
avoid triggering this logic, no?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 6/6] t3001: note about expected "**" behavior
  2012-10-04  7:39           ` [PATCH 6/6] t3001: note about expected "**" behavior Nguyễn Thái Ngọc Duy
@ 2012-10-04 18:04             ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2012-10-04 18:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> "**" currently matches any characters including slashes. It's probably
> too powerful. A more sensible definition may be match any characters
> that the but the whole match must be wrapped by slashes. So "**" can
> match none, "/", "/aaa/", "/aa/bb/" and so on but not "aa/bb".

I do not think this is something we want to retroactively change
after releasing it to the public, especially when we _know_ it is a
problem from the get-go (unlike the case we did not notice it had a
problem, release it to the public and then realize it and have to
scramble to devise a fix to bring more sanity in a backward
compatible way).

We must either declare that "**" that matches any characters is the
sane semantics and promise we will never change it, or have "**"
that matches \(/[^/]*\)*/ (sorry for a line noise^W^W^Wregexp) from
the beginning.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/6] gitignore: do not do basename match with patterns that have '**'
  2012-10-04  7:39           ` [PATCH 5/6] gitignore: do not do basename match with patterns that have '**' Nguyễn Thái Ngọc Duy
  2012-10-04 17:59             ` Junio C Hamano
@ 2012-10-05  7:01             ` Johannes Sixt
  2012-10-05 11:23               ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 31+ messages in thread
From: Johannes Sixt @ 2012-10-05  7:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

Am 10/4/2012 9:39, schrieb Nguyễn Thái Ngọc Duy:
> - - If the pattern does not contain a slash '/', git treats it as
> -   a shell glob pattern and checks for a match against the
> -   pathname relative to the location of the `.gitignore` file
> -   (relative to the toplevel of the work tree if not from a
> -   `.gitignore` file).
> + - If the pattern does not contain a slash '/' nor '**', git
> +   treats it as a shell glob pattern and checks for a match
> +   against the pathname relative to the location of the
> +   `.gitignore` file (relative to the toplevel of the work tree
> +   if not from a `.gitignore` file).

> +test_expect_success '"**" with no slashes test' '
> +	echo "a**f foo=bar" >.gitattributes &&
> +	cat <<\EOF >expect &&
> +f: foo: unspecified
> +a/f: foo: bar
> +a/b/f: foo: bar
> +a/b/c/f: foo: bar
> +EOF

Should the above .gitattributes match nested paths, such as b/a/c/f?

I think it should, because the user can easily say "/a**f" that nested
paths should not be matched.

But if it does not match, as your documentation update implies, which
options does the user have to match nested paths? Only to add more
patterns for each nested directory, such as "b/a**f".

-- Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/6] gitignore: do not do basename match with patterns that have '**'
  2012-10-05  7:01             ` Johannes Sixt
@ 2012-10-05 11:23               ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 31+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-05 11:23 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano

On Fri, Oct 5, 2012 at 2:01 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 10/4/2012 9:39, schrieb Nguyễn Thái Ngọc Duy:
>> - - If the pattern does not contain a slash '/', git treats it as
>> -   a shell glob pattern and checks for a match against the
>> -   pathname relative to the location of the `.gitignore` file
>> -   (relative to the toplevel of the work tree if not from a
>> -   `.gitignore` file).
>> + - If the pattern does not contain a slash '/' nor '**', git
>> +   treats it as a shell glob pattern and checks for a match
>> +   against the pathname relative to the location of the
>> +   `.gitignore` file (relative to the toplevel of the work tree
>> +   if not from a `.gitignore` file).

I think in the latest round, we forbid this case (i.e. a/**, **/b and
a/**/b are ok, but a**b is not), exactly because it's hard to define
how it should do. Thanks for another example.

>> +test_expect_success '"**" with no slashes test' '
>> +     echo "a**f foo=bar" >.gitattributes &&
>> +     cat <<\EOF >expect &&
>> +f: foo: unspecified
>> +a/f: foo: bar
>> +a/b/f: foo: bar
>> +a/b/c/f: foo: bar
>> +EOF
>
> Should the above .gitattributes match nested paths, such as b/a/c/f?
>
> I think it should, because the user can easily say "/a**f" that nested
> paths should not be matched.

The user can also say **/a/**f to match b/a/c/f.
-- 
Duy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04 16:39       ` Junio C Hamano
@ 2012-10-05 12:19         ` Andreas Schwab
  2012-10-05 12:30           ` Matthieu Moy
  2012-10-05 13:21         ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 31+ messages in thread
From: Andreas Schwab @ 2012-10-05 12:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, Nguyen Thai Ngoc Duy, git

Junio C Hamano <gitster@pobox.com> writes:

> When we require "x/**/y", I think we still want it to match "x/y".

FWIW, in bash (+extglob), ksh and zsh it doesn't.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-05 12:19         ` Andreas Schwab
@ 2012-10-05 12:30           ` Matthieu Moy
  2012-10-05 14:15             ` Andreas Schwab
  0 siblings, 1 reply; 31+ messages in thread
From: Matthieu Moy @ 2012-10-05 12:30 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Junio C Hamano, Michael Haggerty, Nguyen Thai Ngoc Duy, git

Andreas Schwab <schwab@linux-m68k.org> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> When we require "x/**/y", I think we still want it to match "x/y".
>
> FWIW, in bash (+extglob), ksh and zsh it doesn't.

You're right about bash, but I see the opposite for zsh and ksh:

zsh$ echo x/**/y
x/y x/z/y

ksh$ echo x/**/y
x/y x/z/y

(didn't check the doc so see whether this was configurable, but I've set
HOME=/ when launching the shell to disable my own configuration)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-04 16:39       ` Junio C Hamano
  2012-10-05 12:19         ` Andreas Schwab
@ 2012-10-05 13:21         ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 31+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-05 13:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, git

On Thu, Oct 04, 2012 at 09:39:02AM -0700, Junio C Hamano wrote:
> Assuming that we do want to match "x/y" with "x/**/y", I suspect
> that "'**' matches anything including a slash" would not give us
> that semantics. Is it something we can easily fix in the wildmatch
> code?

Something like this may suffice. Lightly tested with "git add -n".
Reading the code, I think we can even distinguish "match zero or more
directories" and "match one or more directories" with "/**/" and maybe
"/***/". Right now **, ***, ****... are the same. So are /**/, /***/,
/****/...

-- 8< --
diff --git a/wildmatch.c b/wildmatch.c
index f153f8a..81eadc8 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -98,8 +98,12 @@ static int dowild(const uchar *p, const uchar *text,
 	    continue;
 	  case '*':
 	    if (*++p == '*') {
+		int slashstarstar = p[-2] == '/';
 		while (*++p == '*') {}
 		special = TRUE;
+		if (slashstarstar && *p == '/' &&
+		    dowild(p + 1, text, a, force_lower_case) == TRUE)
+		    return TRUE;
 	    } else
 		special = FALSE;
 	    if (*p == '\0') {
-- 8< --
-- 
Duy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-05 12:30           ` Matthieu Moy
@ 2012-10-05 14:15             ` Andreas Schwab
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Schwab @ 2012-10-05 14:15 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Junio C Hamano, Michael Haggerty, Nguyen Thai Ngoc Duy, git

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> When we require "x/**/y", I think we still want it to match "x/y".
>>
>> FWIW, in bash (+extglob), ksh and zsh it doesn't.
>
> You're right about bash, but I see the opposite for zsh and ksh:
>
> zsh$ echo x/**/y
> x/y x/z/y
>
> ksh$ echo x/**/y
> x/y x/z/y

Looks like this is different between filename expansion and case pattern
matching (I only tested the latter).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
  2012-10-02 23:20 What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
  2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
  2012-10-04  8:17 ` David Michael Barr
@ 2012-10-30 12:15 ` Florian Achleitner
  2 siblings, 0 replies; 31+ messages in thread
From: Florian Achleitner @ 2012-10-30 12:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Michael Barr, Dmitry Ivankov, Jonathan Nieder

Sorry for reacting so late, I didn't read the list carefully in the last weeks 
and my gmail filter somehow didn't trigger on that.

On Tuesday 02 October 2012 16:20:22 Junio C Hamano wrote:
> * fa/remote-svn (2012-09-19) 16 commits
>  - Add a test script for remote-svn
>  - remote-svn: add marks-file regeneration
>  - Add a svnrdump-simulator replaying a dump file for testing
>  - remote-svn: add incremental import
>  - remote-svn: Activate import/export-marks for fast-import
>  - Create a note for every imported commit containing svn metadata
>  - vcs-svn: add fast_export_note to create notes
>  - Allow reading svn dumps from files via file:// urls
>  - remote-svn, vcs-svn: Enable fetching to private refs
>  - When debug==1, start fast-import with "--stats" instead of "--quiet"
>  - Add documentation for the 'bidi-import' capability of remote-helpers
>  - Connect fast-import to the remote-helper via pipe, adding 'bidi-import'
> capability - Add argv_array_detach and argv_array_free_detached
>  - Add svndump_init_fd to allow reading dumps from arbitrary FDs
>  - Add git-remote-testsvn to Makefile
>  - Implement a remote helper for svn in C
>  (this branch is used by fa/vcs-svn.)
> 
>  A GSoC project.
>  Waiting for comments from mentors and stakeholders.

>From my point of view, this is rather complete. It got eight review cycles on 
the list.
Note that the remote helper can only fetch, pushing is not possible at all.

> 
> 
> * fa/vcs-svn (2012-09-19) 4 commits
>  - vcs-svn: remove repo_tree
>  - vcs-svn/svndump: rewrite handle_node(), begin|end_revision()
>  - vcs-svn/svndump: restructure node_ctx, rev_ctx handling
>  - svndump: move struct definitions to .h
>  (this branch uses fa/remote-svn.)
> 
>  A GSoC project.
>  Waiting for comments from mentors and stakeholders.

This is the result of what I did when I wanted to start implementing branch 
detection. I found that the existing code is not suitable and restructured it.

The main goal is to seperate svn revision parsing from git commit creation. 
Because for creating commits, you need to know on which branch to create the 
commit.
While for finding out which branch is the right one, you need to read the 
complete svn revision first to see what dirs are changed and how.

It is rather invasive and it doesn't make sense without using it later on.
So I'm not surprised that you may not like it.
Anyways it passes all existing tests (that doesn't mean it's good of course 
;))

Florian

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2012-10-30 13:17 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-02 23:20 What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
2012-10-03 15:23 ` Nguyen Thai Ngoc Duy
2012-10-03 18:17   ` Junio C Hamano
2012-10-04  1:56     ` Nguyen Thai Ngoc Duy
2012-10-04  6:01       ` Junio C Hamano
2012-10-04  7:39         ` [PATCH 0/6] wildmatch part 2 Nguyễn Thái Ngọc Duy
2012-10-04  7:39           ` [PATCH 1/6] attr: remove the union in struct match_attr Nguyễn Thái Ngọc Duy
2012-10-04  7:39           ` [PATCH 2/6] attr: avoid strlen() on every match Nguyễn Thái Ngọc Duy
2012-10-04  7:39           ` [PATCH 3/6] attr: avoid searching for basename " Nguyễn Thái Ngọc Duy
2012-10-04  7:39           ` [PATCH 4/6] attr: more matching optimizations from .gitignore Nguyễn Thái Ngọc Duy
2012-10-04  7:39           ` [PATCH 5/6] gitignore: do not do basename match with patterns that have '**' Nguyễn Thái Ngọc Duy
2012-10-04 17:59             ` Junio C Hamano
2012-10-05  7:01             ` Johannes Sixt
2012-10-05 11:23               ` Nguyen Thai Ngoc Duy
2012-10-04  7:39           ` [PATCH 6/6] t3001: note about expected "**" behavior Nguyễn Thái Ngọc Duy
2012-10-04 18:04             ` Junio C Hamano
2012-10-04 17:43           ` [PATCH 0/6] wildmatch part 2 Junio C Hamano
2012-10-04  9:34     ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Michael Haggerty
2012-10-04 11:46       ` Nguyen Thai Ngoc Duy
2012-10-04 15:17         ` Michael Haggerty
2012-10-04 16:39       ` Junio C Hamano
2012-10-05 12:19         ` Andreas Schwab
2012-10-05 12:30           ` Matthieu Moy
2012-10-05 14:15             ` Andreas Schwab
2012-10-05 13:21         ` Nguyen Thai Ngoc Duy
2012-10-04  8:17 ` David Michael Barr
2012-10-04  8:30   ` fa/remote-svn (Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)) Jonathan Nieder
2012-10-04 13:16     ` Stephen Bash
2012-10-04 16:30       ` Junio C Hamano
2012-10-04 16:27   ` What's cooking in git.git (Oct 2012, #01; Tue, 2) Junio C Hamano
2012-10-30 12:15 ` Florian Achleitner

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).