git@vger.kernel.org mailing list mirror (one of many)
From: Erik Cervin Edin <erik@cervined.in>
To: Tao Klerks <tao@klerks.biz>
Cc: "Elijah Newren" <newren@gmail.com>,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git <git@vger.kernel.org>
Subject: Re: icase pathspec magic support in ls-tree
Date: Fri, 14 Oct 2022 14:00:07 +0200
Message-ID: <CA+JQ7M9nEHeALeHKO465xsNwmP8C3TXXDjuXAN9cFMmC-XEJnA@mail.gmail.com>
In-Reply-To: <CAPMMpog94YUDPZswcGZ0ns10QXhaWOGmE95mgZEpdcx4GKsV3w@mail.gmail.com>

On Fri, Oct 14, 2022 at 10:58 AM Tao Klerks <tao@klerks.biz> wrote:
>
> I don't understand this suggestion; doesn't it only catch duplicates
> where both instances were introduced in the same 100-commit range?

Yes. It was a bit half-baked, but the main idea was to limit the search
to a smaller subset of the tree (not the whole tree) and check
incrementally for newly introduced duplicates instead of doing a full
tree search. I think that's basically Elijah's idea: get all (added?)
files introduced in a certain revision range (last change, since
yesterday, etc.) and then check only those against the tree for
duplicates, however you define duplicates.

On Fri, Oct 14, 2022 at 10:50 AM Tao Klerks <tao@klerks.biz> wrote:
>
> Directories have been the problem, in "my" repo, around one-third of
> the time - typically someone does a directory rename, and someone else
> does a bad merge and reintroduces the old directory.

That adds a bit of complexity :/
but should still be doable.

Not perfect, but maybe something along these lines? (Caveat: possibly GNU-only.)

#!/bin/sh

# files added between revisions x y
added_files() {
    git diff --diff-filter=A --name-only --no-renames "$1" "$2"
}

# folders of those added files, as ./dir/ (./ for top-level files)
added_folders() {
    added_files "$1" "$2" |
        sed -e 's@^@./@' -e 's@/[^/]*$@/@' |
        sort -u
}

# all entries tracked by git directly in *those* folders at HEAD
# (not recursive; paths containing whitespace will confuse xargs)
possible_dupes() {
    added_folders "$1" "$2" |
        xargs git ls-tree --name-only HEAD
}

# prefix each path with its lowercased form, separated by \x1
# (\L...\E and \x1 are GNU sed extensions), e.g.
#path\x1PaTh
#path\x1path
case_insensitive() {
    sed -e 's@.*@\L\0\E\x1\0@' |
        sort ;
}


x=$1
y=$2
# Find all duplicate paths (case insensitive)
# in directories that gained files between $x and $y
possible_dupes "$x" "$y" |
    case_insensitive |
    awk -F '\x1' '
        # actual "duplicate" paths, column $2
        # as determined by case-insensitive column $1
        $1 in a { print a[$1]; print $2 }
        { a[$1]=$2 }
    '    | uniq
