git@vger.kernel.org mailing list mirror (one of many)
* Reducing Git Repository size - git-filter-repo doesn't help
@ 2023-01-09  2:22 fawaz ahmed0
  2023-01-09  2:36 ` rsbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: fawaz ahmed0 @ 2023-01-09  2:22 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi,

I have this huge repo: https://github.com/fawazahmed0/currency-api#readme and I am trying to reduce its size.

I have run a filter-repo script on this repo ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml ).
The commits were reduced from 1k to 600, but the space used is still the same (i.e. size-pack: 6.47 GiB, https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 ).

Almost all commits of this repo were applied on a partially cloned repository ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/run.yml ),
so I guess it has never run any git maintenance task in its life.

I am not sure what needs to be done to reduce its space utilization. ( https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#:~:text=less%20than%205%20GB%20is%20strongly%20recommended )


Thanks,
Fawaz Ahmed








^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-09  2:22 Reducing Git Repository size - git-filter-repo doesn't help fawaz ahmed0
@ 2023-01-09  2:36 ` rsbecker
  2023-01-09  3:29   ` fawaz ahmed0
  2023-01-09 10:22 ` Erik Cervin Edin
  2023-01-10  2:42 ` Elijah Newren
  2 siblings, 1 reply; 11+ messages in thread
From: rsbecker @ 2023-01-09  2:36 UTC (permalink / raw)
  To: 'fawaz ahmed0', git

On January 8, 2023 9:22 PM, fawaz ahmed0 wrote:
>I have this huge repo: https://github.com/fawazahmed0/currency-api#readme
>and I am trying to reduce its size.
>
>I have run a filter-repo script on this repo (
>https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml ).
>The commits were reduced from 1k to 600, but the space used is still the same
>(i.e. size-pack: 6.47 GiB,
>https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 ).
>
>Almost all commits of this repo were applied on a partially cloned repository (
>https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/run.yml ),
>so I guess it has never run any git maintenance task in its life.
>
>I am not sure what needs to be done to reduce its space utilization. (
>https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#:~:text=less%20than%205%20GB%20is%20strongly%20recommended )

The first thing you can try is 'git gc --aggressive' to reduce the clone
size. GitHub automatically does garbage collection. If this is a question of
the size of the working tree, look at the sparse-checkout command.

--Randall



* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-09  2:36 ` rsbecker
@ 2023-01-09  3:29   ` fawaz ahmed0
  0 siblings, 0 replies; 11+ messages in thread
From: fawaz ahmed0 @ 2023-01-09  3:29 UTC (permalink / raw)
  To: git@vger.kernel.org, rsbecker@nexbridge.com




rsbecker@nexbridge.com wrote:
> The first thing you can try is 'git gc --aggressive' to reduce the clone
> size. GitHub automatically does garbage collection. If this is a question of
> the size of the working tree, look at the sparse-checkout command.

Yes, I have already tried git gc --aggressive ( https://github.com/fawazahmed0/currency-api/blob/1a1fb65703a2fc352b0ee452ce908bee545698c2/.github/workflows/cleanup-repo.yml#L78 ). I am also using sparse-checkout to manage the repository.

The problem is that the space used by the repository does not shrink, even after deleting the data with the git-filter-repo script.

Thanks,
Fawaz Ahmed



* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-09  2:22 Reducing Git Repository size - git-filter-repo doesn't help fawaz ahmed0
  2023-01-09  2:36 ` rsbecker
@ 2023-01-09 10:22 ` Erik Cervin Edin
  2023-01-10  2:42 ` Elijah Newren
  2 siblings, 0 replies; 11+ messages in thread
From: Erik Cervin Edin @ 2023-01-09 10:22 UTC (permalink / raw)
  To: fawaz ahmed0; +Cc: git@vger.kernel.org

On Mon, Jan 9, 2023 at 3:24 AM fawaz ahmed0 <fawazahmed0@hotmail.com> wrote:
> I have this huge repo: https://github.com/fawazahmed0/currency-api#readme  and I am trying to reduce its size.
>
> I have run filter-repo script on this repo (  https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml )

Can you elaborate exactly how you're trying to reduce the repository?
Looking at the script it seems you're removing /latest? And/or folders
corresponding to certain years?

> The commits were reduced from 1k to 600, but the space used is still the same (i.e. size-pack: 6.47 GiB, https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 )

The number of commits is actually irrelevant, what matters is really
only how much of the tree was pruned. And only if what was pruned
wasn't duplicated.

Say you commit
  2018/big.json
  2019/same-identical-big.json
and then delete 2018, the size of the repository in its packed state
will be virtually identical.
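
A minimal sketch of the point above, assuming only a local `git` install. The temporary directory, the placeholder file content, and the demo commit identity are illustrative and not taken from the repository under discussion. It shows that two paths with byte-identical content resolve to the same blob ID, which is why deleting one of them frees almost nothing.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical demo data, not the real repo).
demo=$(mktemp -d)
git init -q "$demo"

# Two paths with byte-for-byte identical placeholder content.
mkdir "$demo/2018" "$demo/2019"
printf 'placeholder content standing in for a big JSON file\n' > "$demo/2018/big.json"
cp "$demo/2018/big.json" "$demo/2019/same-identical-big.json"
git -C "$demo" add .
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add both copies"

# Both paths point at the same blob, so the content is stored once.
blob_2018=$(git -C "$demo" rev-parse HEAD:2018/big.json)
blob_2019=$(git -C "$demo" rev-parse HEAD:2019/same-identical-big.json)
echo "2018/big.json                -> $blob_2018"
echo "2019/same-identical-big.json -> $blob_2019"
```

If the two IDs match, removing only the 2018 path from history leaves the blob reachable through the 2019 path.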

You can analyze which files and directories are occupying the most
space by running
  git filter-repo --analyze
and checking the output files. (They're somewhere like
.git/filter-repo/analysis; I don't remember exactly.)

However, it seems that you're using git in a highly unconventional
manner and I'd say it's probably worthwhile to consider if it's even
the appropriate tool for the task at hand.


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-09  2:22 Reducing Git Repository size - git-filter-repo doesn't help fawaz ahmed0
  2023-01-09  2:36 ` rsbecker
  2023-01-09 10:22 ` Erik Cervin Edin
@ 2023-01-10  2:42 ` Elijah Newren
  2023-01-10  8:18   ` fawaz ahmed0
  2 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2023-01-10  2:42 UTC (permalink / raw)
  To: fawaz ahmed0; +Cc: git@vger.kernel.org

On Sun, Jan 8, 2023 at 6:54 PM fawaz ahmed0 <fawazahmed0@hotmail.com> wrote:
>
> Hi,
>
> I have this huge repo: https://github.com/fawazahmed0/currency-api#readme  and I am trying to reduce its size.
>
> I have run filter-repo script on this repo (  https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/cleanup-repo.yml )

Why are you cleaning up in a CI task?  filter-repo is intended for a
one-shot "flag day" type cleanup, not something you repeatedly do.
Something seems a bit off already.

> The commits were reduced from 1k to 600, but the space used is still the same (i.e. size-pack: 6.47 GiB, https://github.com/fawazahmed95/currency-api/actions/runs/3865919157/jobs/6589710845#step:5:1498 )

You show the ending size, but not the starting.  Could you provide
that number and how you got it, so we can see what you're measuring
(especially since below it's not at all clear what you're even
measuring?)

The reduction in commits suggests it certainly did do some kind of
pruning, and you might also want to look at the output of running
"python3 git-filter-repo --analyze", both before and after filtering,
to get an idea of what is/was using lots of space.

Taking a closer look, I suspect you are missing some important
cleaning.  When there are multiple copies of a file in a repository,
git only stores one version.  Based on
https://github.com/fawazahmed0/currency-api/issues/55, all the files
that are now in directories that you are deleting used to be in the
root folder under another name.  The files with the old name aren't
going to be deleted by your pruning since you only requested that the
new names of the files be deleted.  If I'm understanding your
structure correctly (I didn't clone your repo or try this out; I'm
making inferences based on poking around at the links you provided and
looking at that issue), the upshot of that is that your filtering
probably won't shrink things much since you are still keeping a copy
of those files.  Again, "python3 git-filter-repo --analyze" both
before and after filtering will help you find these kinds of things
and/or other problems.
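
As a rough illustration of the "old names" problem, the sketch below uses plain `git` (not filter-repo) on a throwaway repository to list every path name that has ever appeared in history. The file name `usd.json` and the `2022/` directory are hypothetical stand-ins, not the actual layout of the repository in question.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
commit() {
  git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# A file first lives in the root folder, then is moved into a dated directory.
printf 'placeholder rates\n' > "$demo/usd.json"
git -C "$demo" add usd.json
commit "add usd.json in the root folder"
mkdir "$demo/2022"
git -C "$demo" mv usd.json 2022/usd.json
commit "move usd.json into 2022/"

# Every path name touched by any commit on any ref, old names included.
# --no-renames makes a move show up as one deletion plus one addition.
all_paths=$(git -C "$demo" log --all --no-renames --pretty=format: --name-only |
            sed '/^$/d' | sort -u)
printf '%s\n' "$all_paths"
```

In this demo both `usd.json` and `2022/usd.json` are listed, so a filter that only removes `2022/` would still leave the same content reachable under the old root-level name.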

> Almost all commits of this repo were applied on a partially cloned repository ( https://github.com/fawazahmed0/currency-api/blob/1/.github/workflows/run.yml ),
> so I guess it has never run any git maintenance task in its life.

How exactly are you measuring the size, given that you have a partial
clone?  You don't even have the objects in order to measure, so I
don't understand how you are measuring.  I'm even suspecting you are
measuring something else entirely; could you clarify all your size
measurements and how you got them?

> I am not sure what needs to be done to reduce its space utilization. ( https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#:~:text=less%20than%205%20GB%20is%20strongly%20recommended )

Note that git-filter-repo only changes the size of the _local_ repo.
You made an additional clone within GitHub Actions, and then
filter-repo shrinks *that* clone.  Even if you had deleted all copies
of older files you don't want anymore locally (which is suspect as I
noted above), your force pushing isn't going to shrink the size of the
repo on the server (i.e. on GitHub) since there are pull requests in
your repo that GitHub won't allow you to overwrite via force-push, and
those pull requests still hold on to the old history.

You probably want to read the "DISCUSSION" section of the filter-repo
manual, and you may also want to see GitHub's documentation on
shrinking repos, up at
https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository.
It appears you've skipped the whole "Fully removing the data from
GitHub" section of their documentation.


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-10  2:42 ` Elijah Newren
@ 2023-01-10  8:18   ` fawaz ahmed0
  2023-01-11  2:18     ` Elijah Newren
  0 siblings, 1 reply; 11+ messages in thread
From: fawaz ahmed0 @ 2023-01-10  8:18 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git@vger.kernel.org, rsbecker@nexbridge.com


> Note that git-filter-repo only changes the size of the _local_ repo.
> You made an additional clone within GitHub Actions, and then
> filter-repo shrinks *that* clone.  Even if you had deleted all copies
> of older files you don't want anymore locally (which is suspect as I
> noted above), your force pushing isn't going to shrink the size of the
> repo on the server (i.e. on GitHub) since there are pull requests in
> your repo that GitHub won't allow you to overwrite via force-push, and
> those pull requests still hold on to the old history.

Thanks for your input. Yes, the issue is on the GitHub side (I have verified this locally): when I force push the reduced repo to GitHub and clone it again, the size goes back to 4+ GB.
So what do you suggest?

I already contacted GitHub a few days ago, and they cleared the cache etc. for this repo, but that did not reduce the size. (I will try contacting them again.)




> Why are you cleaning up in a CI task?
The task is a yearly cleanup to keep the repo under the 5 GB limit recommended by GitHub (to avoid receiving an email from GitHub: https://github.com/whosonfirst-data/whosonfirst-data/issues/1507 )

> Could you provide that number and how you got it, so we can see what you're measuring
I was checking the repo size here: https://github.com/settings/repositories (it takes around 1 hour for GitHub to show the updated size after any commit)








* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-10  8:18   ` fawaz ahmed0
@ 2023-01-11  2:18     ` Elijah Newren
  2023-01-11  8:41       ` Elijah Newren
  2023-01-11 20:08       ` fawaz ahmed0
  0 siblings, 2 replies; 11+ messages in thread
From: Elijah Newren @ 2023-01-11  2:18 UTC (permalink / raw)
  To: fawaz ahmed0; +Cc: git@vger.kernel.org, rsbecker@nexbridge.com

On Tue, Jan 10, 2023 at 12:18 AM fawaz ahmed0 <fawazahmed0@hotmail.com> wrote:
>
>
> > Note that git-filter-repo only changes the size of the _local_ repo.
>    You made an additional clone within GitHub Actions, and then
>    filter-repo shrinks *that* clone.  Even if you had deleted all copies
>    of older files you don't want anymore locally (which is suspect as I
>    noted above), your force pushing isn't going to shrink the size of the
>    repo on the server (i.e. on GitHub) since there are pull requests in
>    your repo that GitHub won't allow you to overwrite via force-push, and
>    those pull requests still hold on to the old history.
>
> Thanks for your inputs. Yes, the issue is with the GitHub itself (I have verified it locally), so when I force push this reduced sized repo to Github and reclone it back the size gets back to  4+ GB.
> So what do you suggest?

Well, digging a bit further, I see:
```
$ git ls-remote https://github.com/fawazahmed0/currency-api
5536f2096aec71cee6d35772697ce1efc6148521 HEAD
5536f2096aec71cee6d35772697ce1efc6148521 refs/heads/1
7ed955eebe2e5a670566d5fb4e11142bae6afed6 refs/heads/patch-1
0292ec7340ec1ac8cd11d5839b8a25dd0c1db38f refs/pull/1/head
fa18a4ae50cc8ad84643d1f1f92da77c35139bbc refs/pull/16/head
bca74c53f66681f2d74a98d2f785e1610abeb794 refs/pull/18/head
4da5bcd8c7852bb306059a856f95e9bc1aeaefd3 refs/pull/47/head
5f1408be857734745a7a3349e9d16a8200e993be refs/pull/49/head
00ff16478e95f11347bc3e769e310c1a254579c9 refs/pull/57/head
5e3121acaddc7be810f2b9ad92641a04f9b923fb refs/pull/58/head
8afe04f52b8174c789691b4097c70f95d33c99c4 refs/pull/59/head
55b0d38d156145d8f9726746da7bb4d5ba9d988c refs/pull/60/head
bca74c53f66681f2d74a98d2f785e1610abeb794 refs/reviewable/pr18/r1
4da5bcd8c7852bb306059a856f95e9bc1aeaefd3 refs/reviewable/pr47/r1
5f1408be857734745a7a3349e9d16a8200e993be refs/reviewable/pr49/r1
00ff16478e95f11347bc3e769e310c1a254579c9 refs/reviewable/pr57/r1
5e3121acaddc7be810f2b9ad92641a04f9b923fb refs/reviewable/pr58/r1
8afe04f52b8174c789691b4097c70f95d33c99c4 refs/reviewable/pr59/r1
55b0d38d156145d8f9726746da7bb4d5ba9d988c refs/reviewable/pr60/r1
```

So, you have 19 references.  And according to the output of the job
you linked, you only force pushed refs/heads/1.  That means the other
branches and refs are holding on to the old history.  You'll either
need to delete or rewrite refs/heads/patch-1, and
refs/reviewable/pr{18,47,49,57,58,59,60}/r1 as well.  Rewriting the
refs/reviewable stuff may confuse and mess up Reviewable if you ever
look at those reviews again; it may be that deleting those refs
results in a cleaner error message within Reviewable.
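
To see at a glance which namespaces hold references, the listing above can be tallied with a small pipeline. This is only a sketch: `tally_refs` is a hypothetical helper name, and the input is the listing from this message pasted into a temporary file rather than fetched live.

```shell
#!/bin/sh
set -eu

# Count refs per namespace (first two path components of the ref name).
# Reads `git ls-remote` output on stdin: "<hash> <refname>" per line.
tally_refs() {
  awk '{
         n = split($2, part, "/")
         key = (n >= 2) ? part[1] "/" part[2] : $2
         count[key]++
       }
       END { for (k in count) print count[k], k }' | sort -k2
}

# The listing quoted above, pasted verbatim. Against a live remote this
# would be: git ls-remote <url> | tally_refs
listing=$(mktemp)
cat > "$listing" <<'EOF'
5536f2096aec71cee6d35772697ce1efc6148521 HEAD
5536f2096aec71cee6d35772697ce1efc6148521 refs/heads/1
7ed955eebe2e5a670566d5fb4e11142bae6afed6 refs/heads/patch-1
0292ec7340ec1ac8cd11d5839b8a25dd0c1db38f refs/pull/1/head
fa18a4ae50cc8ad84643d1f1f92da77c35139bbc refs/pull/16/head
bca74c53f66681f2d74a98d2f785e1610abeb794 refs/pull/18/head
4da5bcd8c7852bb306059a856f95e9bc1aeaefd3 refs/pull/47/head
5f1408be857734745a7a3349e9d16a8200e993be refs/pull/49/head
00ff16478e95f11347bc3e769e310c1a254579c9 refs/pull/57/head
5e3121acaddc7be810f2b9ad92641a04f9b923fb refs/pull/58/head
8afe04f52b8174c789691b4097c70f95d33c99c4 refs/pull/59/head
55b0d38d156145d8f9726746da7bb4d5ba9d988c refs/pull/60/head
bca74c53f66681f2d74a98d2f785e1610abeb794 refs/reviewable/pr18/r1
4da5bcd8c7852bb306059a856f95e9bc1aeaefd3 refs/reviewable/pr47/r1
5f1408be857734745a7a3349e9d16a8200e993be refs/reviewable/pr49/r1
00ff16478e95f11347bc3e769e310c1a254579c9 refs/reviewable/pr57/r1
5e3121acaddc7be810f2b9ad92641a04f9b923fb refs/reviewable/pr58/r1
8afe04f52b8174c789691b4097c70f95d33c99c4 refs/reviewable/pr59/r1
55b0d38d156145d8f9726746da7bb4d5ba9d988c refs/reviewable/pr60/r1
EOF

tally=$(tally_refs < "$listing")
printf '%s\n' "$tally"
rm -f "$listing"
```

For this listing the tally is 1 for HEAD, 2 under refs/heads, 9 under refs/pull and 7 under refs/reviewable, 19 in total; only the refs/pull entries are outside the repository owner's control.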

You also need to have refs/pull/{1,16,18,47,49,57,58,59,60}/head all
rewritten too, but you don't have access.  You need to get GitHub
support to do that.  If they don't ask for the exact filtering you
have done, and are only "clearing caches", then it's not going to do
anyone any good.  Those references need to be filtered the same way,
or else those special refs need to be deleted.  But please do note
that you should not ask them to do that until you've already cleaned
up all the stuff that you can.

The steps look roughly as follows:

1. Clone the repository.  Since you have refs outside of refs/heads/
and refs/tags/ (namely refs/reviewable/), you'll need a mirror clone.
Also, make sure to NOT use a partial clone (which would defeat steps 2
& 5).
2. Note the size of the *local* clone ('du -hs' should come in handy).
3. Run `python3 git-filter-repo --analyze`, and look at the reports it
generates to find out the big files/directories and all the names of
those paths.
4. Filter the repository.
5. Verify that the local clone actually shrinks from your filtering
operation (not just has fewer commits, but 'du -hs' now reports a much
smaller number).  If it does not shrink as much as expected, run
`python3 git-filter-repo --analyze` again and see if you missed
alternate historical names of some files or whether you only filtered
what turned out to be small files or whatever.
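
Steps 1 and 2 hinge on the difference between a regular clone and a mirror clone. The sketch below demonstrates that difference on a throwaway local repository; the paths, the single placeholder commit, and the simulated `refs/reviewable/pr18/r1` ref are illustrative assumptions, and no network access or filter-repo install is needed.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# A stand-in "origin" with one commit and one ref outside refs/heads/ and refs/tags/.
git init -q "$work/origin"
printf 'placeholder\n' > "$work/origin/file.txt"
git -C "$work/origin" add file.txt
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"
git -C "$work/origin" update-ref refs/reviewable/pr18/r1 HEAD

# Step 1: a regular clone versus a mirror clone of the same repository.
git clone -q "$work/origin" "$work/regular"
git clone -q --mirror "$work/origin" "$work/mirror.git"

regular_refs=$(git -C "$work/regular" for-each-ref --format='%(refname)')
mirror_refs=$(git -C "$work/mirror.git" for-each-ref --format='%(refname)')
printf 'regular clone:\n%s\n\nmirror clone:\n%s\n\n' "$regular_refs" "$mirror_refs"

# Step 2: note the on-disk size of the local mirror before filtering.
du -hs "$work/mirror.git"
```

Only the mirror clone carries the refs/reviewable ref, so only the mirror clone would have it rewritten by a later filtering run.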

Note that step 3 is important.  Rather than guessing what is big and
taking up lots of space, you find out what is big and take action upon
it.  For example, you once committed the node_modules/ folder.  I do
not know its size (too lazy to do a full clone and find out), but
those are often quite large.  Creating a commit that deletes that
directory does not remove it from history, only from the current
version.  So, expunging that directory from history may be important.
There may be other nuggets you find too, such as alternate names of
files that also need to be deleted if you've renamed things.
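
The node_modules point can be checked with plain `git` on a throwaway repository. In this sketch the file under `node_modules/` is a small placeholder rather than a real dependency tree, and the commit identity is made up for the demo.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
git init -q "$demo"
commit() {
  git -C "$demo" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# Commit a placeholder node_modules/ directory (add -f in case a global
# ignore rule would otherwise exclude it).
mkdir -p "$demo/node_modules/pkg"
printf 'placeholder for a large dependency file\n' > "$demo/node_modules/pkg/index.js"
git -C "$demo" add -f node_modules
commit "accidentally commit node_modules"
blob=$(git -C "$demo" rev-parse HEAD:node_modules/pkg/index.js)

# A later commit deletes the directory again.
git -C "$demo" rm -r -q node_modules
commit "delete node_modules"

# Gone from the current version of the tree...
in_head=$(git -C "$demo" ls-tree -r --name-only HEAD | grep -c '^node_modules/' || true)
# ...but the blob is still reachable from history, so gc will not drop it.
in_history=$(git -C "$demo" rev-list --objects --all | grep -c "^$blob" || true)
echo "node_modules paths in HEAD: $in_head; blob still reachable from history: $in_history"
```

Actually expunging such a directory is a history rewrite; with filter-repo that would typically be something along the lines of `git filter-repo --invert-paths --path node_modules/`, run in a fresh clone.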

After you've succeeded with all the above:
6. force push *all* references back to GitHub -- or at least the ones
that GitHub will permit you to force push (should be everything other
than refs/pull/*).  If you only force push your "1" branch, you still
leave every other reference holding on to the old history.
7. Contact GitHub support to ensure they clean up the remaining
references (i.e. the refs/pull/* ones) _and_ clear their caches.  They
will either have to ask for what filtering you did so they can do it
as well, or they'll need to nuke those pull requests.
8. Wait to hear back
9. Check the `git ls-remote
https://github.com/fawazahmed0/currency-api` output again.  If the
refs/pull/* lines still exist and still have the same hashes at the
beginning of their lines, GitHub did not filter those references and
they are still holding on to the old history.  Contact them again, and
either get them to filter those refs the same way you filtered yours,
or get them to delete those pull requests.
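
The check in step 9 can be scripted by saving the `git ls-remote` output before and after the support request and comparing the two snapshots. A sketch, where `unchanged_pull_refs` and the snapshot file names are hypothetical:

```shell
#!/bin/sh
set -eu

# Print the refs/pull/* lines that are identical in two saved
# `git ls-remote` snapshots, i.e. refs whose hashes did not change.
# Usage: unchanged_pull_refs <before-file> <after-file>
unchanged_pull_refs() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  grep 'refs/pull/' "$1" | sort > "$before_sorted"
  grep 'refs/pull/' "$2" | sort > "$after_sorted"
  # comm -12 keeps only the lines common to both sorted inputs.
  comm -12 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}

# Intended use (not run here, since it needs network access):
#   git ls-remote https://github.com/fawazahmed0/currency-api > before.txt
#   ... wait for GitHub support to act ...
#   git ls-remote https://github.com/fawazahmed0/currency-api > after.txt
#   unchanged_pull_refs before.txt after.txt
```

Any line it prints is a pull-request ref that still points at the same commit as before and is therefore still holding on to the old history.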

> I have already contacted Github few days back, and they have cleared the cache etc for this repo.(but that did not reduce any size). ( I will try contacting them again)

Contacting them now is a waste of their time.  Filter your repo first.
Your whole repo (implying you need a mirror clone rather than a
regular clone given you have the unusual refs/reviewable/* references
that need filtering too).  And push the whole repo back, not just a
single branch.


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-11  2:18     ` Elijah Newren
@ 2023-01-11  8:41       ` Elijah Newren
  2023-01-11 20:08       ` fawaz ahmed0
  1 sibling, 0 replies; 11+ messages in thread
From: Elijah Newren @ 2023-01-11  8:41 UTC (permalink / raw)
  To: fawaz ahmed0; +Cc: git@vger.kernel.org, rsbecker@nexbridge.com

On Tue, Jan 10, 2023 at 6:18 PM Elijah Newren <newren@gmail.com> wrote:
> After you've succeeded with all the above:
> 6. force push *all* references back to GitHub -- or at least the ones
> that GitHub will permit you to force push (should be everything other
> than refs/pull/*).  If you only force push your "1" branch, you still
> leave every other reference holding on to the old history.
> 7. Contact GitHub support to ensure they clean up the remaining
> references (i.e. the refs/pull/* ones) _and_ clear their caches.  They
> will either have to ask for what filtering you did so they can do it
> as well, or they'll need to nuke those pull requests.
> 8. Wait to hear back
> 9. Check the `git ls-remote
> https://github.com/fawazahmed0/currency-api` output again.  If the
> refs/pull/* lines still exist and still have the same hashes at the
> beginning of their lines, GitHub did not filter those references and
> they are still holding on to the old history.  Contact them again, and
> either get them to filter those refs the same way you filtered yours,
> or get them to delete those pull requests.

I may need to take this partially back.  Since you've already
contacted GitHub support, they may have already done filtering on
these references and further filtering of those isn't needed.  It may
be that after filtering the references under your control and pushing
them all back, no further work is needed.

> > I have already contacted Github few days back, and they have cleared the cache etc for this repo.(but that did not reduce any size). ( I will try contacting them again)
>
> Contacting them now is a waste of their time.  Filter your repo first.
> Your whole repo (implying you need a mirror clone rather than a
> regular clone given you have the unusual refs/reviewable/* references
> that need filtering too).  And push the whole repo back, not just a
> single branch.


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-11  2:18     ` Elijah Newren
  2023-01-11  8:41       ` Elijah Newren
@ 2023-01-11 20:08       ` fawaz ahmed0
  2023-01-12  1:54         ` fawaz ahmed0
  1 sibling, 1 reply; 11+ messages in thread
From: fawaz ahmed0 @ 2023-01-11 20:08 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git@vger.kernel.org

Thank you for looking into this. 

I have removed all the unnecessary refs on the remote, rerun filter-repo, and force pushed the changes again.
To be sure, I have also asked GitHub support to completely nuke refs/pull and clear caches etc.

I would also like to let you know that git-filter-repo --analyze (which was also suggested by Erik) causes an OOM in my case (I also tried running it in GitHub Actions with a huge swap, but it exceeds the 6-hour timeout limit).
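
When `--analyze` is not practical, a coarser picture can be had from plain `git` plumbing. The sketch below is a swapped-in technique, not what filter-repo does internally, and whether it stays within memory on a repository of this size is untested. `largest_blobs` is a hypothetical helper name, and it assumes a full (non-partial) clone so that all objects are present locally.

```shell
#!/bin/sh
set -eu

# List the largest blobs reachable from any ref, biggest first.
# Output columns: type, uncompressed size in bytes, object ID, path.
# Usage: largest_blobs <repo-dir> [count]
largest_blobs() {
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob"' |
    sort -k2,2nr |
    head -n "${2:-20}"
}

# Intended use: largest_blobs /path/to/mirror.git 50
```

Note that each blob is reported with only one of the paths it was found under, and sizes are uncompressed, so the result is a hint about what to filter rather than an exact accounting of pack usage.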

I have verified size reduction (after running filter-repo) using:
`git gc && git count-objects -vH`
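
For a before/after comparison that is easy to script, the `size-pack` field can be pulled out of `git count-objects -v`, which reports it in KiB. A sketch, where `pack_size_kib` is a hypothetical helper name:

```shell
#!/bin/sh
set -eu

# Repack, then print the packed size of a repository in KiB.
# Usage: pack_size_kib <repo-dir>
pack_size_kib() {
  git -C "$1" gc --quiet
  git -C "$1" count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Intended use: record the value before and after filtering, e.g.
#   before=$(pack_size_kib /path/to/clone)
#   ... run the filtering ...
#   after=$(pack_size_kib /path/to/clone)
#   echo "size-pack went from $before KiB to $after KiB"
```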

Thank you once again for the help


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-11 20:08       ` fawaz ahmed0
@ 2023-01-12  1:54         ` fawaz ahmed0
  2023-01-12  2:01           ` Elijah Newren
  0 siblings, 1 reply; 11+ messages in thread
From: fawaz ahmed0 @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git@vger.kernel.org

Hi,

I have checked the repository size here: https://github.com/settings/repositories and it has gone down from 4.7 GB to 3.3 GB.

Thank you; I have been trying to reduce the repo size for the last 15 days.

Regards,
Fawaz Ahmed


* Re: Reducing Git Repository size - git-filter-repo doesn't help
  2023-01-12  1:54         ` fawaz ahmed0
@ 2023-01-12  2:01           ` Elijah Newren
  0 siblings, 0 replies; 11+ messages in thread
From: Elijah Newren @ 2023-01-12  2:01 UTC (permalink / raw)
  To: fawaz ahmed0; +Cc: git@vger.kernel.org

On Wed, Jan 11, 2023 at 5:54 PM fawaz ahmed0 <fawazahmed0@hotmail.com> wrote:
>
> Hi,
>
> I have checked the repository size at here: https://github.com/settings/repositories  and it got reduced from 4.7Gb to 3.3Gb.
>
> Thank you, have been trying to reduce the repo size from last 15 days.
>
> Regards,
> Fawaz Ahmed

Awesome, glad you got it squared away!
