From: michael@platin.gs
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>, "Stefan Beller" <stefanbeller@gmail.com>, "Jeff Smith" <whydoubt@gmail.com>, "Junio C Hamano" <gitster@pobox.com>, "René Scharfe" <l.s.r@web.de>, "Michael Platings" <michael@platin.gs>
Subject: [RFC PATCH 0/1] Fuzzy blame
Date: Sun, 24 Mar 2019 23:50:19 +0000
Message-ID: <20190324235020.49706-1-michael@platin.gs>

From: Michael Platings <michael@platin.gs>

Hi Git devs,

Some of you may be familiar with the git-hyper-blame tool [1]. It's "useful if you have a commit that makes sweeping changes that are unlikely to be what you are looking for in a blame, such as mass reformatting or renaming."

git-hyper-blame is useful but (a) it's not convenient to install; (b) it's missing functionality available in regular git blame; (c) its method of matching lines between chunks is too simplistic for many use cases; and (d) it's not Git, so it doesn't integrate well with tools that expect Git, e.g. vim plugins. Therefore I'm hoping to add similar, and hopefully superior, functionality to Git itself. I have a very rough patch, so I'd like to get your thoughts on the general approach, particularly in terms of its user-visible behaviour.

My initial idea was to lift the design directly from git-hyper-blame. However, the approach of picking single revisions to somehow ignore doesn't sit well with the -w, -M & -C options, which have a similar intent but apply to all revisions.

I'd like to get your thoughts on whether we could allow applying the -M or -w options to specific revisions. For example, imagine it was agreed that all the #includes in a project should be reordered. In that case, it would be useful to be able to specify that the -M option should be used for blames on that revision specifically, so that in future, when someone wants to know why a #include was added, they don't have to run git blame twice to find out.
Options that are specific to a particular revision could be stored in a ".gitrevisions" file or similar.

If the principle of allowing blame options to be applied per-revision is agreeable then I'd like to add a -F/--fuzzy option, to sit alongside -w, -M & -C. I've implemented a prototype "fuzzy" option, patch attached.

The option operates at the level of diff chunks. For each line in the "after" half of the chunk it uses a heuristic to choose which line in the "before" half of the chunk matches best. The heuristic I'm using at the moment is of matching "bigrams" as described in [2].

The initial pass typically gives reasonable results, but can jumble up the lines. Because in the reformatting/renaming use case the content should stay in the same order, it's worth going to extra effort to avoid jumbling lines. Therefore, after the initial pass, the line that can be matched with the most confidence is used to partition the chunk into halves before and after it. The process is then repeated recursively on the halves above and below the partition line.

I feel like a similar algorithm has probably already been invented in a better form - if anyone knows of such a thing then please let me know!

I look forward to hearing your thoughts.

Thanks,
-Michael

[1] https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html
[2] https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient

Michael Platings (1):
  Add git blame --fuzzy option.

 blame.c                | 352 +++++++++++++++++++++++++++++++++++++++++++++++--
 blame.h                |   1 +
 builtin/blame.c        |   3 +
 t/t8020-blame-fuzzy.sh | 264 +++++++++++++++++++++++++++++++++++++
 4 files changed, 609 insertions(+), 11 deletions(-)
 create mode 100755 t/t8020-blame-fuzzy.sh

-- 
2.14.3 (Apple Git-98)
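
The matching heuristic described in the cover letter (bigram similarity for the initial pass, then recursive partitioning around the most confident match) can be sketched roughly as below. This is an illustrative Python sketch, not the C code from the patch. The names `bigrams`, `dice` and `fuzzy_match` are hypothetical, and the whitespace stripping and the rule that each "before" line is matched at most once are assumptions made to keep the example short.

```python
from collections import Counter


def bigrams(line):
    """Multiset of adjacent character pairs, with whitespace removed (an assumption)."""
    s = "".join(line.split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a, b):
    """Sorensen-Dice coefficient over the bigram multisets of two lines, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2.0 * overlap / total


def fuzzy_match(before, after):
    """Map each index in `after` to an index in `before` (or None), keeping line order."""
    result = [None] * len(after)

    def split(b_lo, b_hi, a_lo, a_hi):
        if a_lo >= a_hi or b_lo >= b_hi:
            return
        # Initial pass: best "before" candidate for every "after" line in range.
        best = []
        for i in range(a_lo, a_hi):
            score, j = max((dice(after[i], before[j]), j) for j in range(b_lo, b_hi))
            best.append((score, i, j))
        # The most confident match becomes the partition line.
        score, i, j = max(best)
        if score == 0.0:
            return  # nothing in this range resembles anything else
        result[i] = j
        split(b_lo, j, a_lo, i)            # lines above the partition
        split(j + 1, b_hi, i + 1, a_hi)    # lines below the partition

    split(0, len(before), 0, len(after))
    return result


before = ["alpha = compute(x)", "beta = compute(y)", "print(alpha, beta)"]
after = ["alpha_value = compute(x)", "beta_value = compute(y)",
         "print(alpha_value, beta_value)"]
print(fuzzy_match(before, after))
```

Because each recursive call only looks at the lines strictly above or strictly below the partition, a later match can never cross an earlier, more confident one, which is what keeps the lines from being jumbled. A real implementation would probably also need to let several "after" lines map to one "before" line (for example when a line is wrapped), which this sketch does not attempt.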