git@vger.kernel.org mailing list mirror (one of many)
From: Alexander Miseler <alexander@miseler.de>
To: Pete Wyckoff <pw@padd.com>
Cc: Jeff King <peff@peff.net>,
	git@vger.kernel.org, emontellese@gmail.com, schacon@gmail.com,
	joey@kitenet.net
Subject: Re: Fwd: Git and Large Binaries: A Proposed Solution
Date: Thu, 10 Mar 2011 22:02:53 +0100
Message-ID: <4D793C7D.1000502@miseler.de>
In-Reply-To: <20110123141417.GA6133@mew.padd.com>

I've been debating whether to resurrect this thread, but since it is referenced by the SoC2011Ideas wiki article I will just go ahead.
I've spent a few hours trying to get this approach working, in order to make git usable with big files under Windows.

> Just a quick aside.  Since (a2b665d, 2011-01-05) you can provide
> the filename as an argument to the filter script:
> 
>     git config --global filter.huge.clean huge-clean %f
> 
> then use it in place:
> 
>     $ cat >huge-clean 
>     #!/bin/sh
>     f="$1"
>     echo orig file is "$f" >&2
>     sha1=`sha1sum "$f" | cut -d' ' -f1`
>     cp "$f" /tmp/big_storage/$sha1
>     rm -f "$f"
>     echo $sha1
> 
> 		-- Pete

First off, the commit mentioned here is no help at all. It changes nothing about the input and output of filters. The file is still loaded completely into memory and still streamed to the filter via stdin, and the filter's output is still streamed via stdout into yet another memory buffer. IIRC the two buffers exist simultaneously for at least some time, thus doubling the memory requirements. The change only provides the file name to the filter in addition, and nothing else. On a careful rereading of the commit message, this apparently was the intention.
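
To make the contract concrete, here is a minimal sketch of a clean filter that takes the content from stdin and uses the %f argument only for diagnostics. It is an illustration under assumptions, not the script quoted above: the names huge_clean and HUGE_STORE are made up for this example, and the storage directory defaults to the /tmp/big_storage path from the quoted script.

```shell
#!/bin/sh
# Hypothetical clean filter, written as a function so it can be tried
# outside of git. huge_clean and HUGE_STORE are names invented for this sketch.
huge_clean() {
    f="$1"                                   # path passed via %f, informational only
    store="${HUGE_STORE:-/tmp/big_storage}"  # assumed external storage directory
    mkdir -p "$store" || return 1
    tmp=$(mktemp "$store/incoming.XXXXXX") || return 1

    # git pipes the file content to stdin whether or not %f is used,
    # so take the content from there instead of re-reading "$f".
    cat >"$tmp" || return 1

    sha1=$(sha1sum "$tmp" | cut -d' ' -f1)
    mv "$tmp" "$store/$sha1" || return 1

    echo "cleaned $f -> $store/$sha1" >&2    # diagnostics go to stderr
    echo "$sha1"                             # stdout becomes the content git stores
}
```

Outside of git this can be tried with something like `printf 'hello' | huge_clean some/file.bin`. Note that it does nothing about the memory problem described above: git has already read the whole file into memory before the filter sees a single byte.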

After this I started digging into the git source code. Changing the filter input would be trivial. However, the function that returns the filter output in a memory buffer is called from 8 places (all details from wetware memory and therefore unreliable). Most, maybe all, of the callers just dump the buffer into a file, and that write could easily be moved into the filter-calling function itself. But two callers detach the buffer from the strbuf and keep it beyond writing the file. I didn't track it any further, since I decided to spend my time on improving big file handling in git itself rather than on a workaround. Though of course a completely big-file-ready git should also provide a sane way to feed big files to and from filters.

If the two detached buffers are no complication, this might be a trivial project. If they are, it might become demanding.

