git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: ardi <ardillasdelmonte@gmail.com>, git@vger.kernel.org
Subject: Re: Settings for minimizing repacking (and keeping 'rsync' happy)
Date: Mon, 29 Jul 2019 16:16:10 -0400
Message-ID: <20190729201610.GG14943@sigill.intra.peff.net>
In-Reply-To: <87tvb55799.fsf@evledraar.gmail.com>

On Mon, Jul 29, 2019 at 02:56:34PM +0200, Ævar Arnfjörð Bjarmason wrote:

> The thread I started at
> https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/ should
> also be of interest. I.e. we could have some knobs to create more
> "stable" packs, I know rsync does some in-file hashing, but I don't know
> if/how that works if you have 1 file split into N where some chunks in
> the N are in the one file.
> 
> But it's possible to imagine a repacking algorithm that would keep
> producing entirely new packs but arrange for it to be ordered/delta'd in
> such a way that it optimizes for page-by-page similarity to an older
> pack to some degree.

I actually think that's the part that rsync does well. We don't keep
page-by-page similarity, but rsync (and other tools like borg) are
really good at finding the moved chunks. The problem is just that it
doesn't know to compare chunks between two files with unrelated names.
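
A minimal sketch of that point, under stated assumptions: the block size,
the function names, and the pack file names (pack-aaa.pack, pack-bbb.pack)
are all hypothetical, and this is not rsync's actual implementation. Real
rsync screens candidates with a cheap rolling checksum before applying a
stronger hash; the sketch simply hashes a window at every offset with SHA-1
to stay short. It shows block matching finding chunks that moved inside a
file, and finding nothing when old and new files are paired only by name.

```python
import hashlib
import random

BLOCK = 64  # demo block size; an arbitrary assumption, not rsync's default


def block_hashes(data: bytes) -> set:
    """Hashes of the non-overlapping BLOCK-sized blocks of `data`."""
    return {
        hashlib.sha1(data[off:off + BLOCK]).digest()
        for off in range(0, len(data) - BLOCK + 1, BLOCK)
    }


def matched_bytes(old: bytes, new: bytes) -> int:
    """Count bytes of `new` covered by blocks that also exist in `old`.

    The window slides one byte at a time over `new`, so a block is found
    even when it has moved to an unaligned offset.
    """
    known = block_hashes(old)
    matched = pos = 0
    while pos + BLOCK <= len(new):
        if hashlib.sha1(new[pos:pos + BLOCK]).digest() in known:
            matched += BLOCK
            pos += BLOCK
        else:
            pos += 1
    return matched


def matched_by_name(old_files: dict, new_files: dict) -> int:
    """Model pairing by file name: a new file is only compared against
    an old file with the same name (empty basis if there is none)."""
    return sum(
        matched_bytes(old_files.get(name, b""), data)
        for name, data in new_files.items()
    )


rng = random.Random(0)
a = bytes(rng.randrange(256) for _ in range(1024))
b = bytes(rng.randrange(256) for _ in range(1024))

old_pack = a + b
new_pack = b + b"extra" + a  # same chunks, reordered, plus 5 new bytes

# Same two files compared directly: the moved chunks are found.
print("direct comparison:", matched_bytes(old_pack, new_pack))

# Same data, but the repack produced a differently named file.
print("paired by name:   ", matched_by_name(
    {"pack-aaa.pack": old_pack}, {"pack-bbb.pack": new_pack}))
```

In this toy setup the direct comparison should recover all 2048 reordered
bytes, while the by-name pairing recovers none, because the new file has no
same-named basis to compare against.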

-Peff
