git@vger.kernel.org mailing list mirror (one of many)
From: Matthew DeVore <matvore@comcast.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Matthew DeVore <matvore@google.com>,
	jonathantanmy@google.com, jrn@google.com, git@vger.kernel.org,
	dstolee@microsoft.com, jeffhost@microsoft.com,
	jrnieder@gmail.com
Subject: Re: [RFC PATCH 3/3] list-objects-filter: implement composite filters
Date: Mon, 20 May 2019 11:24:47 -0700
Message-ID: <B379F2FB-77F2-4FAC-B39A-BB3CFE685681@comcast.net>
In-Reply-To: <1E174CAA-BD57-400B-A83B-4AABFAFBC04B@comcast.net>



> On 2019/05/17, at 6:17, Matthew DeVore <matvore@comcast.net> wrote:
> 
> 
> 
>> On May 16, 2019, at 8:25 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> 
>>> 	$ git rev-list --filter=tree:2 --filter:blob:limit=32k
>> 
>> Shouldn't the second one say "--filter=blob:limit=32k" (i.e. the
>> first colon should be an equal sign)?
> 
> That's right. Fixed locally.
> 
>> 
>>> Such usage is currently an error, so giving it a meaning is backwards-
>>> compatible.
>> 
>> Two minor comments.  
>> 
>> If combine means "must satisfy all of these", '+' is probably a poor
>> choice (perhaps we want '&' instead).  Also, it seems to me that
> 
> I think I agree. & is more intuitive.

After I tried this in code, I noticed two problems with & which make
me prefer + again:

a. the "&" char must be quoted or escaped in the shell, even if it is
   hugged by alphanumeric characters on either side:

	$ echo a&b
	[1] 17083
	a
	-bash: b: command not found
	[1]+  Done                    echo a
	$

b. visually speaking, "&" doesn't stand out very well unless it's
   surrounded by whitespace, and currently it must *not* be surrounded
   by whitespace:

	--filter=combine:blob:none&tree:3&sparse:../foo

	vs.

	--filter=combine:blob:none+tree:3+sparse:../foo

> 
>> having to worry about url encoding and parsing encoded data
>> correctly and securely would be far more work than simply taking
>> multiple command line parameters, accumulating them in a string
>> list, and then at the end of command line parsing, building a
>> combined filter out of all of them at once (a degenerate case may
>> end up attempting to build a combined filter that combines a single
>> filter), iow just biting the bullet and do the "potentially be
>> improved" step from the beginning.
> 
> My intention actually is to support the repeated flag pretty soon, but I only want to write the code if there's agreement on my current approach.
> 
> My justification for the URL-encoding scheme is:
> 
> 1. The combined filters will eventually have to travel over the wire.
> 
> 2. The Git protocol will either have repeated "filter" lines or it will continue to use a single filter line with an encoding scheme.
> 
> 3. Continuing to use a single filter line seemed the least disruptive considering both this codebase and Git clones like JGit. Other clones will likely fail saying "unknown filter combine:" or something like that until it gets implemented. A paranoid consideration is that clones and proprietary server implementations may currently allow the "filter" line to be silently overridden if it is repeated.
> 
> 4. Assuming we *do* use a single filter line over the wire, it makes sense to allow the user to specify the raw filter line as well as have the more friendly UI of repeating --filter flags.
> 
> 5. If we use repeated "filter" lines over the wire, and later start implementing a more complete DSL for specifying filters (see Mercurial's "revsets") the repeated-filter-line feature in the protocol may end up becoming deprecated and we will end up back-pedaling to allow integration of the "&" operator with whatever new operators we need.
> 
> (I very much doubt I will be the one implementing such a DSL for filters or revsets, but I think it's a possibility)
> 
>> So why are we allowing %3A there that does not even have to be
>> encoded?  Shouldn't it be an error?
> 
> We do have to require that the combine operator (& or +) and % be encoded. For other characters, there are four options:
> 
> 1. Allow anything to be encoded. I chose this because it's how I usually think of URL encoding working. For instance, if I go to https://public-inbox.org/git/?q=cod%65+coverage in Chrome, the browser automatically decodes the %65 to an e in the address bar. Safari does not automatically decode, but the server apparently interprets the %65 as an e. I am not really attached to this choice.
> 
> 2. Do not allow or require anything else to be encoded.
> 
> 3. Require encoding of a couple of "reserved" characters that don't appear in filters now, and don't typically appear in UNIX path names. This would allow for expansion later. For instance, "~&%*+|(){}!\" plus the ASCII range [0, 0x20] and single and double quotes - do not allow encoding of anything else.
> 
> 4. Same requirements as 3, but permit encoding of other arbitrary characters.
> 
> I kind of like 3 now that I've thought it out more.
> 
>> 
>> In any case, I am not quite convinced that we need to complicate the
>> parameters with URL encoding, so I'd skip reviewing the large part of this
>> patch that is about "decoding".
> 
> It's fine if we drop the encoding scheme. I intentionally tried to limit the amount of work I stacked on top of it until I got agreement. Please let me know if anything I've said changes your perspective.
> 
>> 
>> Once the combined filter definition is built in-core, the code that
>> evaluates the intersection of all conditions seems to be written
>> sanely to me.
> 
> Great! I actually did simplify it a bit since I sent the first roll-up.
> 
> Thanks.
> 


