From: Erik Cervin Edin <erik@cervined.in>
To: Tao Klerks <tao@klerks.biz>
Cc: "Elijah Newren" <newren@gmail.com>,
"brian m. carlson" <sandals@crustytoothpaste.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
git <git@vger.kernel.org>
Subject: Re: icase pathspec magic support in ls-tree
Date: Fri, 14 Oct 2022 14:00:07 +0200
Message-ID: <CA+JQ7M9nEHeALeHKO465xsNwmP8C3TXXDjuXAN9cFMmC-XEJnA@mail.gmail.com>
In-Reply-To: <CAPMMpog94YUDPZswcGZ0ns10QXhaWOGmE95mgZEpdcx4GKsV3w@mail.gmail.com>
On Fri, Oct 14, 2022 at 10:58 AM Tao Klerks <tao@klerks.biz> wrote:
>
> I don't understand this suggestion; doesn't it only catch duplicates
> where both instances were introduced in the same 100-commit range?
Yes. It was a bit half-baked, but the main idea was to limit the check
to a smaller subset of the tree (rather than the whole tree) and to
incrementally check for newly introduced duplicates instead of doing a
full tree search. I think that's basically Elijah's idea: get all
(added?) files introduced in a certain revision range (the last change,
since yesterday, etc.) and then check only those against the tree for
duplicates, according to however you define duplicates.
On Fri, Oct 14, 2022 at 10:50 AM Tao Klerks <tao@klerks.biz> wrote:
>
> Directories have been the problem, in "my" repo, around one-third of
> the time - typically someone does a directory rename, and someone else
> does a bad merge and reintroduces the old directory.
That adds a bit of complexity :/ but should still be doable.
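One way to make such directory collisions visible, sketched here under
the assumption that listing paths recursively is acceptable, is to
expand every path into its ancestor directories before comparing. The
helper names with_ancestors and dir_collisions are hypothetical. A
reintroduced lib/ next to a renamed Lib/ then collides on the directory
entries themselves, even when no file names inside them match.

```shell
#!/bin/sh
# Hypothetical helper: print each input path plus all of its ancestor
# directories (with a trailing "/"), deduplicated and sorted bytewise.
with_ancestors() {
    awk '{
        n = split($0, part, "/")
        dir = ""
        for (i = 1; i < n; i++) {
            dir = dir part[i] "/"
            print dir
        }
        print $0
    }' | LC_ALL=C sort -u
}
# Hypothetical helper: report directory entries that differ only in case,
# e.g. a reintroduced "lib/" next to a renamed "Lib/".
dir_collisions() {
    with_ancestors | awk '
    /\/$/ {
        k = tolower($0)
        if (k in first)
            print first[k] " <-> " $0
        else
            first[k] = $0
    }'
}
```

For example, git ls-tree -r --name-only HEAD | dir_collisions checks the
whole tree; feeding it a smaller set of paths instead would keep the
check incremental.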
Not perfect but maybe something along these lines? (caveat, possibly GNU only)
#!/bin/sh
# files added between revisions $1 and $2
added_files() {
    git diff --diff-filter=A --name-only --no-renames "$1" "$2"
}
# directories containing those added files, as "./dir/"
# (top-level files map to "./")
added_folders() {
    added_files "$1" "$2" |
    sed -e 's@^@./@' -e 's@/[^/]*$@/@' |
    sort -u
}
# all entries tracked by git directly in *those* directories at HEAD
# (non-recursive; ls-tree resolves paths relative to the current
# directory, so run this from the top of the worktree)
possible_dupes() {
    added_folders "$1" "$2" |
    xargs git ls-tree --name-only HEAD
}
# prefix each path with its lowercased form, separated by \x1 (GNU sed),
# e.g.
#   path\x1PaTh
#   path\x1path
# LC_ALL=C keeps all lines sharing a lowercased form adjacent
case_insensitive() {
    sed -e 's@.*@\L\0\E\x1\0@' |
    LC_ALL=C sort
}
x=$1
y=$2
# Find all duplicate paths (case-insensitive) in the directories
# where files were added between $x and $y
possible_dupes "$x" "$y" |
case_insensitive |
awk -F '\x1' '
# actual "duplicate" paths, column $2,
# as determined by the case-insensitive column $1
$1 in a { print a[$1]; print $2 }
{ a[$1] = $2 }
' | uniq