git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Damien Robert <damien.olivier.robert@gmail.com>
To: James Ramsay <james@jramsay.com.au>
Cc: git@vger.kernel.org
Subject: Re: [TOPIC 3/17] Obliterate
Date: Sun, 15 Mar 2020 23:19:40 +0100	[thread overview]
Message-ID: <20200315221940.bdgi5mluxuetq2lz@doriath> (raw)
In-Reply-To: <5B2FEA46-A12F-4DE7-A184-E8856EF66248@jramsay.com.au>

From James Ramsay, Thu 12 Mar 2020 at 14:57:24 (+1100) :
> 6. Elijah: replace refs helps, but not supported by hosts like GitHub etc
>     a. Stolee: breaks commit graph because of generation numbers.
>     b. Replace refs for blobs, then special packfile, there were edge cases.

I am interested in more details on how to handle this using replace.

My situation: coworkers push big files by mistake, I don't want to rewrite
history because they are not too well versed with git, but I want to keep
*my* repo clean.

Partial solution:
- identify the large blobs (easy)
- write a replace ref (easy):
  $ git replace b5f74037bb91 $(git hash-object -w -t blob /dev/null)
  and replace the file (if it is still in the repo) by an empty file.

Now the pain points start:
- first the index does not handle replace (I think), so the replaced file
  appear as changed in git status, even through eg git diff shows nothing.

=> Solution: configure .git/info/sparse-checkout

- secondly, I want to remove the large blob from my repo.

Ideally I'd like to repack everything but filter this blob, except that
repack does not understand --filter. So I need to use `git pack-objects`
directly and then do the naming and clean up that repack usually does
manually, which is error prone.

Furthermore, while `git pack-objects` accepts --filter, I can only filter on
blob size, not blob oid. (there is filter=sparse:oid where I could reuse my
sparse checkout file, but I would need to make a blob of it first). And if I
have one large file I want to keep, I cannot filter by blob size.

Another solution would be to use `git unpack-objects` to unpack all objects
(except I would need to do that in an empty git dir), remove the blob, and
then repack everything.

Am I missing a simpler solution?

- finally, checkouting to a ref including the replaced (now missing) blob
  gives error messages of the form:
error: invalid object 100644 b5f74037bb91c45606b233b0ad6aad86f8e3875e for 'Silverman-Height-NonTorsion.pdf'

On the one hand it is reassuring that git checks that the real object
(rather than only the replaced object) is still there, on the other hand it
would be nice to ask git to completely forget about the original object
(except fsck of course).

Thanks,
Damien

  parent reply	other threads:[~2020-03-15 23:13 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12  3:55 Notes from Git Contributor Summit, Los Angeles (April 5, 2020) James Ramsay
2020-03-12  3:56 ` [TOPIC 1/17] Reftable James Ramsay
2020-03-12  3:56 ` [TOPIC 2/17] Hooks in the future James Ramsay
2020-03-12 14:16   ` Emily Shaffer
2020-03-13 17:56     ` Junio C Hamano
2020-04-07 23:01       ` Emily Shaffer
2020-04-07 23:51         ` Emily Shaffer
2020-04-08  0:40           ` Junio C Hamano
2020-04-08  1:09             ` Emily Shaffer
2020-04-10 21:31           ` Jeff King
2020-04-13 19:15             ` Emily Shaffer
2020-04-13 21:52               ` Jeff King
2020-04-14  0:54                 ` [RFC PATCH v2 0/2] configuration-based hook management (was: [TOPIC 2/17] Hooks in the future) Emily Shaffer
2020-04-14  0:54                   ` [RFC PATCH v2 1/2] hook: scaffolding for git-hook subcommand Emily Shaffer
2020-04-14  0:54                   ` [RFC PATCH v2 2/2] hook: add --list mode Emily Shaffer
2020-04-14 15:15                   ` [RFC PATCH v2 0/2] configuration-based hook management Phillip Wood
2020-04-14 19:24                     ` Emily Shaffer
2020-04-14 20:27                       ` Jeff King
2020-04-15 10:01                         ` Phillip Wood
2020-04-14 20:03                     ` Josh Steadmon
2020-04-15 10:08                       ` Phillip Wood
2020-04-14 20:32                     ` Jeff King
2020-04-15 10:01                       ` Phillip Wood
2020-04-15 14:51                         ` Junio C Hamano
2020-04-15 20:30                           ` Emily Shaffer
2020-04-15 22:19                             ` Junio C Hamano
2020-04-15  3:45                 ` [TOPIC 2/17] Hooks in the future Jonathan Nieder
2020-04-15 20:59                   ` Emily Shaffer
2020-04-20 23:53                     ` [PATCH] doc: propose hooks managed by the config Emily Shaffer
2020-04-21  0:22                       ` Emily Shaffer
2020-04-21  1:20                         ` Junio C Hamano
2020-04-24 23:14                           ` Emily Shaffer
2020-04-25 20:57                       ` brian m. carlson
2020-05-06 21:33                         ` Emily Shaffer
2020-05-06 23:13                           ` brian m. carlson
2020-05-19 20:10                           ` Emily Shaffer
2020-04-15 22:42                   ` [TOPIC 2/17] Hooks in the future Jeff King
2020-04-15 22:48                     ` Emily Shaffer
2020-04-15 22:57                       ` Jeff King
2020-03-12  3:57 ` [TOPIC 3/17] Obliterate James Ramsay
2020-03-12 18:06   ` Konstantin Ryabitsev
2020-03-15 22:19   ` Damien Robert [this message]
2020-03-16 12:55     ` Konstantin Tokarev
2020-03-26 22:27       ` Damien Robert
2020-03-16 16:32     ` Elijah Newren
2020-03-26 22:30       ` Damien Robert
2020-03-16 18:32     ` Phillip Susi
2020-03-26 22:37       ` Damien Robert
2020-03-16 20:01     ` Philip Oakley
2020-05-16  2:21       ` nbelakovski
2020-03-12  3:58 ` [TOPIC 4/17] Sparse checkout James Ramsay
2020-03-12  4:00 ` [TOPIC 5/17] Partial Clone James Ramsay
2020-03-17  7:38   ` Allowing only blob filtering was: " Christian Couder
2020-03-17 20:39     ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 1/2] list_objects_filter_options: introduce 'list_object_filter_config_name' Taylor Blau
2020-03-17 20:53         ` Eric Sunshine
2020-03-18 10:03           ` Jeff King
2020-03-18 19:40             ` Junio C Hamano
2020-03-18 22:38             ` Eric Sunshine
2020-03-19 17:15               ` Jeff King
2020-03-18 21:05           ` Taylor Blau
2020-03-17 20:39       ` [RFC PATCH 2/2] upload-pack.c: allow banning certain object filter(s) Taylor Blau
2020-03-17 21:11         ` Eric Sunshine
2020-03-18 21:18           ` Taylor Blau
2020-03-18 11:18         ` Philip Oakley
2020-03-18 21:20           ` Taylor Blau
2020-03-18 10:18       ` [RFC PATCH 0/2] upload-pack.c: limit allowed filter choices Jeff King
2020-03-18 18:26         ` Re*: " Junio C Hamano
2020-03-19 17:03           ` Jeff King
2020-03-18 21:28         ` Taylor Blau
2020-03-18 22:41           ` Junio C Hamano
2020-03-19 17:10             ` Jeff King
2020-03-19 17:09           ` Jeff King
2020-04-17  9:41         ` Christian Couder
2020-04-17 17:40           ` Taylor Blau
2020-04-17 18:06             ` Jeff King
2020-04-21 12:34               ` Christian Couder
2020-04-22 20:41                 ` Taylor Blau
2020-04-22 20:42               ` Taylor Blau
2020-04-21 12:17             ` Christian Couder
2020-03-12  4:01 ` [TOPIC 6/17] GC strategies James Ramsay
2020-03-12  4:02 ` [TOPIC 7/17] Background operations/maintenance James Ramsay
2020-03-12  4:03 ` [TOPIC 8/17] Push performance James Ramsay
2020-03-12  4:04 ` [TOPIC 9/17] Obsolescence markers and evolve James Ramsay
2020-05-09 21:31   ` Noam Soloveichik
2020-05-15 22:26     ` Jeff King
2020-03-12  4:05 ` [TOPIC 10/17] Expel ‘git shell’? James Ramsay
2020-03-12  4:07 ` [TOPIC 11/17] GPL enforcement James Ramsay
2020-03-12  4:08 ` [TOPIC 12/17] Test harness improvements James Ramsay
2020-03-12  4:09 ` [TOPIC 13/17] Cross implementation test suite James Ramsay
2020-03-12  4:11 ` [TOPIC 14/17] Aspects of merge-ort: cool, or crimes against humanity? James Ramsay
2020-03-12  4:13 ` [TOPIC 15/17] Reachability checks James Ramsay
2020-03-12  4:14 ` [TOPIC 16/17] “I want a reviewer” James Ramsay
2020-03-12 13:31   ` Emily Shaffer
2020-03-12 17:31     ` Konstantin Ryabitsev
2020-03-12 17:42       ` Jonathan Nieder
2020-03-12 18:00         ` Konstantin Ryabitsev
2020-03-17  0:43     ` Philippe Blain
2020-03-13 21:25   ` Eric Wong
2020-03-14 17:27     ` Jeff King
2020-03-15  0:36       ` inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”] Eric Wong
2020-03-12  4:16 ` [TOPIC 17/17] Security James Ramsay
2020-03-12 14:38 ` Notes from Git Contributor Summit, Los Angeles (April 5, 2020) Derrick Stolee
2020-03-13 20:47 ` Jeff King
2020-03-15 18:42 ` Jakub Narebski
2020-03-16 19:31   ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2019-12-10  2:33 [PATCH 0/6] configuration-based hook management Emily Shaffer
2019-12-10  2:33 ` [PATCH 1/6] hook: scaffolding for git-hook subcommand Emily Shaffer
2019-12-12  9:41   ` Bert Wesarg
2019-12-12 10:47   ` SZEDER Gábor
2019-12-10  2:33 ` [PATCH 2/6] config: add string mapping for enum config_scope Emily Shaffer
2019-12-10 11:16   ` Philip Oakley
2019-12-10 17:21     ` Philip Oakley
2019-12-10  2:33 ` [PATCH 3/6] hook: add --list mode Emily Shaffer
2019-12-12  9:38   ` Bert Wesarg
2019-12-12 10:58   ` SZEDER Gábor
2019-12-10  2:33 ` [PATCH 4/6] hook: support reordering of hook list Emily Shaffer
2019-12-11 19:21   ` Junio C Hamano
2019-12-10  2:33 ` [PATCH 5/6] hook: remove prior hook with '---' Emily Shaffer
2019-12-10  2:33 ` [PATCH 6/6] hook: teach --porcelain mode Emily Shaffer
2019-12-11 19:33   ` Junio C Hamano
2019-12-11 22:00     ` Emily Shaffer
2019-12-11 22:07       ` Junio C Hamano
2019-12-11 23:15         ` Emily Shaffer
2019-12-11 22:42 ` [PATCH 0/6] configuration-based hook management Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200315221940.bdgi5mluxuetq2lz@doriath \
    --to=damien.olivier.robert@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=james@jramsay.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).