git@vger.kernel.org mailing list mirror (one of many)
From: "Jakub Narębski" <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	Garima Singh <garimasigit@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 1/3] revision: complicated pathspecs disable filters
Date: Thu, 16 Apr 2020 00:18:33 +0200
Message-ID: <CANQwDwdcM-hc+Nyc0nBnp79W0BuH2PfdgM6WLqET3sPn1GYc-A@mail.gmail.com>
In-Reply-To: <99e0ae2c-6b65-24e4-3d2b-1dff619a5daa@gmail.com>

On Wed, 15 Apr 2020 at 20:37, Derrick Stolee <stolee@gmail.com> wrote:
[...]
> -->8--
> From 89beb9598daabb19e3c896bbceeb0fc1b9ccc6ca Mon Sep 17 00:00:00 2001
> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Wed, 15 Apr 2020 18:04:25 +0000
> Subject: [PATCH] bloom: compute all Bloom hashes from lowercase
>
> The changed-path Bloom filters currently hash path strings using
> the exact string for the path. This makes it difficult* to use the
> filters when restricting to case-insensitive pathspecs.
>
> * I say "difficult" because it is possible to generate all 2^n
>   options for the case of a path and test them all, but this is
>   a bad idea and should not be done. "Impossible" is an appropriate
>   alternative.
>
> THIS IS A BREAKING CHANGE. Commit-graph files with changed-path
> Bloom filters computed by a previous commit will not be compatible
> with the filters computed in this commit, nor will we get correct
> results when testing across these incompatible versions. Normally,
> this would be a completely unacceptable change, but the filters
> have not been released and hence are still possible to update
> before release.
>
> TODO: If we decide to move in this direction, then the following
> steps should be done (and some of them should be done anyway):
>
> * We need to document the Bloom filter format to specify exactly
>   how we compute the filter data. The details should be careful
>   enough that someone can reproduce the exact file format without
>   looking at the C code.
>
> * That document would include the tolower() transformation that is
>   being done here.

Why not modify the BDAT chunk to record which case-folding
transformation (or other collation algorithm) is applied
before computing the Bloom filter key? Though that might
be unnecessary flexibility...

For example, a value of 0x00 in such a field of the BDAT
chunk header would mean no transformation, while a value
of 0x01 would mean per-character tolower() or its Unicode
equivalent.
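
To make the idea concrete, here is a rough sketch in C of how a
reader might apply such a field before hashing. All names below
(bloom_key_transform, bloom_key_prepare) are hypothetical and do
not exist in the patch or in bloom.c; the field values are just
the ones suggested above, and tolower() here only covers the
single-byte case, not the Unicode equivalent.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical values for a "key transformation" field in the BDAT header. */
enum bloom_key_transform {
	BLOOM_KEY_TRANSFORM_NONE = 0x00,    /* hash the path bytes as-is */
	BLOOM_KEY_TRANSFORM_TOLOWER = 0x01  /* per-character tolower() first */
};

/*
 * Hypothetical helper: copy 'path' into 'out' (capacity 'outlen',
 * including the trailing NUL), applying the transformation named
 * by 'transform'. The result is what would be fed to the hash.
 *
 * Returns 0 on success, or -1 if the transformation is unknown or
 * 'out' is too small.
 */
static int bloom_key_prepare(unsigned char transform, const char *path,
			     char *out, size_t outlen)
{
	size_t i;

	if (transform != BLOOM_KEY_TRANSFORM_NONE &&
	    transform != BLOOM_KEY_TRANSFORM_TOLOWER)
		return -1;

	for (i = 0; path[i]; i++) {
		if (i + 1 >= outlen)
			return -1; /* no room for this byte plus the NUL */
		out[i] = (transform == BLOOM_KEY_TRANSFORM_TOLOWER)
			? (char)tolower((unsigned char)path[i])
			: path[i];
	}
	if (i >= outlen)
		return -1; /* covers outlen == 0 with an empty path */
	out[i] = '\0';
	return 0;
}
```

In this sketch an unrecognized value is reported as an error, so
that a caller could choose to skip the filters instead of hashing
with the wrong transformation.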

Best,
-- 
Jakub Narębski

Thread overview: 48+ messages
2020-04-11  1:02 [PATCH 0/3] Integrate changed-path Bloom filters with 'git blame' Derrick Stolee via GitGitGadget
2020-04-11  1:02 ` [PATCH 1/3] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-04-11 21:40   ` Junio C Hamano
2020-04-13 11:49     ` Derrick Stolee
2020-04-14 18:25       ` Junio C Hamano
2020-04-15 13:27         ` Derrick Stolee
2020-04-15 18:37           ` Derrick Stolee
2020-04-15 19:32             ` Junio C Hamano
2020-04-15 19:39               ` Junio C Hamano
2020-04-15 21:25             ` Junio C Hamano
2020-04-16  0:56               ` Taylor Blau
2020-04-15 22:18             ` Jakub Narębski [this message]
2020-04-16  0:52               ` Taylor Blau
2020-04-16 13:26                 ` Derrick Stolee
2020-04-16 16:33                   ` Taylor Blau
2020-04-16 18:02                     ` Junio C Hamano
2020-04-12 22:22   ` Taylor Blau
2020-04-12 22:30     ` Junio C Hamano
2020-04-13  0:07       ` Taylor Blau
2020-04-13 11:54         ` Derrick Stolee
2020-04-11  1:03 ` [PATCH 2/3] commit: write commit-graph with bloom filters Derrick Stolee via GitGitGadget
2020-04-11 21:57   ` Junio C Hamano
2020-04-12 20:51     ` Taylor Blau
2020-04-13 12:08       ` Derrick Stolee
2020-04-13 22:11         ` Junio C Hamano
2020-04-11  1:03 ` [PATCH 3/3] blame: use changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-04-11 22:03   ` Junio C Hamano
2020-04-12  7:39     ` Eric Sunshine
2020-04-11 21:30 ` [PATCH 0/3] Integrate changed-path Bloom filters with 'git blame' Junio C Hamano
2020-04-13 14:45 ` [PATCH v2 0/4] " Derrick Stolee via GitGitGadget
2020-04-13 14:45   ` [PATCH v2 1/4] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-04-13 16:09     ` Taylor Blau
2020-04-13 22:18       ` Junio C Hamano
2020-04-13 14:45   ` [PATCH v2 2/4] commit: write commit-graph with Bloom filters Derrick Stolee via GitGitGadget
2020-04-13 16:12     ` Taylor Blau
2020-04-13 22:21       ` Junio C Hamano
2020-04-14 15:04         ` Derrick Stolee
2020-04-14 17:26           ` Junio C Hamano
2020-04-14 17:40             ` Derrick Stolee
2020-04-15  0:17               ` Taylor Blau
2020-04-13 14:45   ` [PATCH v2 3/4] commit-graph: write commit-graph in more tests Derrick Stolee via GitGitGadget
2020-04-13 14:45   ` [PATCH v2 4/4] blame: use changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-04-13 16:21   ` [PATCH v2 0/4] Integrate changed-path Bloom filters with 'git blame' Taylor Blau
2020-04-16 20:14   ` [PATCH v3 0/3] " Derrick Stolee via GitGitGadget
2020-04-16 20:14     ` [PATCH v3 1/3] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-06-07 20:33       ` SZEDER Gábor
2020-04-16 20:14     ` [PATCH v3 2/3] tests: write commit-graph with Bloom filters Derrick Stolee via GitGitGadget
2020-04-16 20:14     ` [PATCH v3 3/3] blame: use changed-path " Derrick Stolee via GitGitGadget