git@vger.kernel.org mailing list mirror (one of many)
* RFD: concatenating textconv filters
From: Michael J Gruber @ 2013-02-21 10:28 UTC
  To: Git Mailing List

During my day-to-day UGFWIINIT I noticed that we don't do textconv
iteratively. E.g.: I have a file

SuperSecretButDumbFormat.pdf.gpg

and textconv filters with attributes set for *.gpg and *.pdf (using
"gpg" and "pdftotext", respectively). For Git, the file has only the
"gpg" attribute, of course. In this case, I would like to pass the gpg
output through pdftotext.

Now, I can set up an extra filter "gpgtopdftotext" for *.pdf.gpg (hoping
I get the ordering in .gitattributes right), of course, but I am wondering
whether we could and should support concatenating filters by either

- making it easy to request it (say by setting
"filter.gpgtopdftotext.textconvpipe" to a list of textconv filter names
which are to be applied in sequence)

or

- doing it automatically (remove the pattern which triggered the filter,
and apply attributes again to the resulting pathspec)
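
As a rough illustration of the first option (not an existing Git
feature), the sketch below runs a list of textconv-style commands in
sequence. The function name run_textconv_pipe, and the convention that
every filter after the first reads its input from stdin, are assumptions
made for this example. Git itself calls a textconv command with a file
path as its last argument and reads the converted text from stdout.

```python
import subprocess


def run_textconv_pipe(path, commands):
    """Apply textconv-style commands in sequence and return the final output.

    `commands` is a list of argv lists. The first command receives `path`
    as its last argument (as Git does for textconv); each later command is
    assumed, for this sketch only, to read the previous output on stdin.
    """
    if not commands:
        raise ValueError("at least one command is required")
    first, *rest = commands
    data = subprocess.run(
        first + [path], stdout=subprocess.PIPE, check=True
    ).stdout
    for cmd in rest:
        data = subprocess.run(
            cmd, input=data, stdout=subprocess.PIPE, check=True
        ).stdout
    return data
```

For the file above, the list might look like
[["gpg", "--quiet", "--decrypt"], ["pdftotext", "-", "-"]], assuming both
tools are installed and that the local pdftotext accepts "-" for stdin
and stdout.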

Maybe it's just not worth the effort. Or a nice GSoC project ;)

Michael


* Re: RFD: concatenating textconv filters
From: Junio C Hamano @ 2013-02-21 17:39 UTC
  To: Michael J Gruber; +Cc: Git Mailing List

Michael J Gruber <git@drmicha.warpmail.net> writes:

> ... but wondering
> whether we could and should support concatenating filters by either
>
> - making it easy to request it (say by setting
> "filter.gpgtopdftotext.textconvpipe" to a list of textconv filter names
> which are to be applied in sequence)
>
> or
>
> - doing it automatically (remove the pattern which triggered the filter,
> and apply attributes again to the resulting pathspec)

I think what you are getting at is to start from something like this:

	= .gitattributes =
	*.gpg	diff=gpg
	*.pdf	diff=pdf

and have "git cat-file --textconv frotz.pdf.gpg" (and other textconv
users) notice that:

 (1) The path matches "*.gpg" pattern, and calls for
     diff.gpg.textconv conversion.  This already happens in the
     current system.

 (2) After stripping the "*.gpg" pattern (i.e. look at the part of
     the path that matched the wildcard part * in the attribute
     selector), notice that the remainder, "frotz.pdf", could match
     the "*.pdf" pattern.  The output from the previous filter could
     be treated as if it were a blob that is stored in that path.
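
Steps (1) and (2) can be sketched as a loop over virtual paths. This is
only an illustration under simplifying assumptions: the attribute table
is a hypothetical in-memory list, matching uses Python's fnmatch rather
than Git's real attribute matching, and only patterns of the shape
"*<literal suffix>" are handled. The names lookup_driver and
textconv_chain are made up for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for the .gitattributes lines above:
# (pattern, diff driver) pairs, in file order.
ATTRIBUTES = [("*.gpg", "gpg"), ("*.pdf", "pdf")]


def lookup_driver(path, attributes):
    """Return (pattern, driver) for the last pattern matching `path`, or None."""
    hit = None
    for pattern, driver in attributes:
        if fnmatchcase(path, pattern):
            hit = (pattern, driver)  # later lines override earlier ones
    return hit


def textconv_chain(path, attributes):
    """List (driver, virtual_path) steps by repeatedly stripping the suffix."""
    chain = []
    while True:
        hit = lookup_driver(path, attributes)
        if hit is None:
            return chain
        pattern, driver = hit
        chain.append((driver, path))
        suffix = pattern[1:]
        # Only the simple "*<literal suffix>" pattern shape is handled here.
        if not pattern.startswith("*") or not suffix or "*" in suffix:
            return chain
        path = path[: -len(suffix)]  # keep what the asterisk matched
```

Calling textconv_chain("frotz.pdf.gpg", ATTRIBUTES) yields the gpg step
for "frotz.pdf.gpg" followed by the pdf step for the virtual path
"frotz.pdf".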

A few issues that need to be addressed while designing this feature
that come to my mind at random are:

 * This seems to call for a new concept, but what exactly is that
   concept?  Your RFD sounds as if you desire a "cascadable
   textconv", but it may be of a somewhat larger scope, "virtual
   blob at a virtual path", which the last sentence in (2) above
   seems to suggest.

 * What is this new concept an attribute of?  If we express this as
   "the textconv conversion result of any path with attribute
   diff=gpg can be treated as the contents of a virtual blob", then
   we are making it an attribute of the gpg "type", i.e.

	= .git/config =
	[diff "gpg"]
		textconv = gpg -v
		textconvProducesVirtualBlob = yes

   To me, that seems sufficient for this particular application at
   first glance, but are there other attributes that may want to
   produce such a virtual blob for further processing?  Is limiting
   this to textconv too restrictive?  I do not know.

 * What is the rule to come up with the "virtual path" to base the
   attribute look-up on for the "virtual blob contents"?  In the
   above example, the pattern was a simple "*.gpg", and we used a
   naïve "what did the asterisk match?", but imagine a case where
   you have some documents that you want to run "gpg -v" on and some you
   don't.  You express this by having the former class of files
   named with "conv-" prefix, or some convention that is convenient
   for you.

   Your .gitattributes may say something like:

	= .gitattributes =
	conv-*.gpg	diff=gpg

   When deciding what attributes to use to further process the
   result of conversion (i.e. "virtual blob contents") for
   conv-frotz.pdf.gpg, what virtual path should we use?  Should we
   use "conv-frotz.pdf", or just "frotz.pdf"?

   "The difference does not matter--either would work" is not a
   satisfactory answer, once you consider that you may want to have
   two or more classes of pdf files that you may want to treat
   differently, just like you did for gpg encrypted files in this
   example setting.  It seems to suggest that we want to use
   conv-frotz.pdf as the virtual path, but how would we derive that
   from the pattern "conv-*.gpg" and path "conv-frotz.pdf.gpg"?  It
   appears to me that you would need a way to say that, of the two
   literal parts in the pattern, the "conv-" part needs to be kept but
   the ".gpg" part needs to be stripped when forming the result.

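To make the last question concrete, the sketch below computes both
candidate virtual paths for a pattern with a single asterisk. It does
not decide which one is right; it only shows that the pattern and the
path alone give two plausible answers. The function name
virtual_path_candidates is hypothetical, and the matching is a plain
regular expression rather than Git's attribute matching.

```python
import re


def virtual_path_candidates(pattern, path):
    """Return (wildcard_only, prefix_kept) for a one-asterisk pattern.

    wildcard_only is what the asterisk matched; prefix_kept keeps the
    literal part before the asterisk and strips only the literal part
    after it. Returns None when the path does not match the pattern.
    """
    prefix, star, suffix = pattern.partition("*")
    if not star or "*" in suffix:
        raise ValueError("pattern must contain exactly one '*'")
    regex = re.escape(prefix) + "(.*)" + re.escape(suffix)
    match = re.fullmatch(regex, path)
    if match is None:
        return None
    wildcard = match.group(1)
    return wildcard, prefix + wildcard
```

For the pattern "conv-*.gpg" and the path "conv-frotz.pdf.gpg", this
returns "frotz.pdf" and "conv-frotz.pdf". Choosing between them would
need extra information in the pattern or the configuration.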
