git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] bloom: ignore renames when computing changed paths
@ 2020-04-08 16:38 Derrick Stolee via GitGitGadget
  2020-04-08 19:11 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-08 16:38 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The changed-path Bloom filters record an entry in the filter for
every path that was changed. This includes every add and delete,
regardless of whther a rename was detected. Detecting renames
causes significant performance issues, but also will trigger
downloading missing blobs in partial clone.

The simple fix is to disable rename detection when computing a
changed-path Bloom filter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    bloom: ignore renames when computing changed paths
    
    I promised [1] I would adapt the commit that was dropped from
    gs/commit-graph-path-filter [2] on top of gs/commit-graph-path-filter
    and jt/avoid-prefetch-when-able-in-diff. However, I noticed that the
    change was extremely simple and has value without basing it on
    jt/avoid-prefetch-when-able-in-diff.
    
    This change applied to gs/commit-graph-path-filter has obvious CPU time
    improvements for computing changed-path Bloom filters (that I did not
    measure). The partial clone improvements require
    jt/avoid-prefetch-when-able-in-diff to be included, too, but the code
    does not depend on it at compile time.
    
    Thanks, -Stolee
    
    [1] 
    https://lore.kernel.org/git/7de2f54b-8704-a0e1-12aa-0ca9d3d70f6f@gmail.com/
    [2] 
    https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-601%2Fderrickstolee%2Fdiff-and-bloom-filters-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-601/derrickstolee/diff-and-bloom-filters-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/601

 bloom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/bloom.c b/bloom.c
index c5b461d1cfe..dd9bab9bbd6 100644
--- a/bloom.c
+++ b/bloom.c
@@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	repo_diff_setup(r, &diffopt);
 	diffopt.flags.recursive = 1;
+	diffopt.detect_rename = 0;
 	diffopt.max_changes = max_changes;
 	diff_setup_done(&diffopt);
 

base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
@ 2020-04-08 19:11 ` Junio C Hamano
  2020-04-08 19:13 ` Philip Oakley
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2020-04-08 19:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The changed-path Bloom filters record an entry in the filter for
> every path that was changed. This includes every add and delete,
> regardless of whther a rename was detected. Detecting renames
> causes significant performance issues, but also will trigger
> downloading missing blobs in partial clone.
>
> The simple fix is to disable rename detection when computing a
> changed-path Bloom filter.

Makes perfect sense to me.

> diff --git a/bloom.c b/bloom.c
> index c5b461d1cfe..dd9bab9bbd6 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  
>  	repo_diff_setup(r, &diffopt);
>  	diffopt.flags.recursive = 1;
> +	diffopt.detect_rename = 0;
>  	diffopt.max_changes = max_changes;
>  	diff_setup_done(&diffopt);
>  
>
> base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
  2020-04-08 19:11 ` Junio C Hamano
@ 2020-04-08 19:13 ` Philip Oakley
  2020-04-08 22:31 ` Jeff King
  2020-04-09 13:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 9+ messages in thread
From: Philip Oakley @ 2020-04-08 19:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git; +Cc: Derrick Stolee

spelling nit.

On 08/04/2020 17:38, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The changed-path Bloom filters record an entry in the filter for
> every path that was changed. This includes every add and delete,
> regardless of whther a rename was detected. Detecting renames
whether
> causes significant performance issues, but also will trigger
> downloading missing blobs in partial clone.
>
> The simple fix is to disable rename detection when computing a
> changed-path Bloom filter.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     bloom: ignore renames when computing changed paths
>     
>     I promised [1] I would adapt the commit that was dropped from
>     gs/commit-graph-path-filter [2] on top of gs/commit-graph-path-filter
>     and jt/avoid-prefetch-when-able-in-diff. However, I noticed that the
>     change was extremely simple and has value without basing it on
>     jt/avoid-prefetch-when-able-in-diff.
>     
>     This change applied to gs/commit-graph-path-filter has obvious CPU time
>     improvements for computing changed-path Bloom filters (that I did not
>     measure). The partial clone improvements require
>     jt/avoid-prefetch-when-able-in-diff to be included, too, but the code
>     does not depend on it at compile time.
>     
>     Thanks, -Stolee
>     
>     [1] 
>     https://lore.kernel.org/git/7de2f54b-8704-a0e1-12aa-0ca9d3d70f6f@gmail.com/
>     [2] 
>     https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-601%2Fderrickstolee%2Fdiff-and-bloom-filters-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-601/derrickstolee/diff-and-bloom-filters-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/601
>
>  bloom.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/bloom.c b/bloom.c
> index c5b461d1cfe..dd9bab9bbd6 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  
>  	repo_diff_setup(r, &diffopt);
>  	diffopt.flags.recursive = 1;
> +	diffopt.detect_rename = 0;
>  	diffopt.max_changes = max_changes;
>  	diff_setup_done(&diffopt);
>  
>
> base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f
Philip

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
  2020-04-08 19:11 ` Junio C Hamano
  2020-04-08 19:13 ` Philip Oakley
@ 2020-04-08 22:31 ` Jeff King
  2020-04-09 11:56   ` Derrick Stolee
  2020-04-09 13:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  3 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-04-08 22:31 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

On Wed, Apr 08, 2020 at 04:38:27PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The changed-path Bloom filters record an entry in the filter for
> every path that was changed. This includes every add and delete,
> regardless of whther a rename was detected. Detecting renames
> causes significant performance issues, but also will trigger
> downloading missing blobs in partial clone.
> 
> The simple fix is to disable rename detection when computing a
> changed-path Bloom filter.

Yes, we should be doing as simple a tree-diff as possible.

I wonder if this might actually be fixing a potential bug, too. The loop
below this code only looks at the "two" half of each queued diff pair.
With renames enabled, we might see the source path only in the "one"
half of the rename pair.

However, this seems to work already. If I do:

  echo content >file
  git add file
  git commit -m added
  echo change >file
  git commit -am changed
  git mv file other
  git commit -am 'rename away'
  git mv other file
  git commit -am 'rename back'
  git rm file
  git commit -am removed

  git commit-graph write --reachable --changed-paths
  git log --oneline -- file | cat

then I see all of the commits. Instrumenting Git like:

diff --git a/bloom.c b/bloom.c
index c5b461d1cf..fb2a758e1d 100644
--- a/bloom.c
+++ b/bloom.c
@@ -207,6 +207,10 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 		for (i = 0; i < diff_queued_diff.nr; i++) {
 			const char *path = diff_queued_diff.queue[i]->two->path;
 
+			warning("queuing touched %c path %s for %s",
+				diff_queued_diff.queue[i]->status,
+				path, oid_to_hex(&c->object.oid));
+
 			/*
 			* Add each leading directory of the changed file, i.e. for
 			* 'dir/subdir/file' add 'dir' and 'dir/subdir' as well, so

results in:

  warning: queuing touched A path file for 2346d88b0cb4bca11c38ee545d007a7a14ca472a
  warning: queuing touched M path file for 991cd7f0696ae29fea738ca1b8340c90dae4b201
  warning: queuing touched D path file for d3642c9fb27459ea09f6c967a1e6ad119e265d6f
  warning: queuing touched A path other for d3642c9fb27459ea09f6c967a1e6ad119e265d6f
  warning: queuing touched A path file for bc908eb29e562d97ebb8cf718e41b69d3aa1d834
  warning: queuing touched D path other for bc908eb29e562d97ebb8cf718e41b69d3aa1d834
  warning: queuing touched D path file for 7433b46bd6aa170ab17a651c10658a5b0c10ba4f

So we really aren't detecting renames in the first place! And indeed,
checking diffopt.detect_rename shows that it is unset. So I'm curious if
there is a case where that would not be true. I _think_ it would only be
true in a program which ran init_diff_ui_defaults(), but never in
git-commit-graph.

Even if it does nothing in practice, I'm not at all opposed to having it
in there as an explicit documentation of our expectations/requirements
for the loop below. But it's probably worth saying so in the commit
message.

-Peff

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-08 22:31 ` Jeff King
@ 2020-04-09 11:56   ` Derrick Stolee
  2020-04-09 13:47     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2020-04-09 11:56 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee via GitGitGadget
  Cc: git, Derrick Stolee, Philip Oakley

On 4/8/2020 6:31 PM, Jeff King wrote:
> On Wed, Apr 08, 2020 at 04:38:27PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The changed-path Bloom filters record an entry in the filter for
>> every path that was changed. This includes every add and delete,
>> regardless of whther a rename was detected. Detecting renames
>> causes significant performance issues, but also will trigger
>> downloading missing blobs in partial clone.
>>
>> The simple fix is to disable rename detection when computing a
>> changed-path Bloom filter.
> 
> Yes, we should be doing as simple a tree-diff as possible.
> 
> I wonder if this might actually be fixing a potential bug, too. The loop
> below this code only looks at the "two" half of each queued diff pair.
> With renames enabled, we might see the source path only in the "one"
> half of the rename pair.
> 
> However, this seems to work already. If I do:
> 
>   echo content >file
>   git add file
>   git commit -m added
>   echo change >file
>   git commit -am changed
>   git mv file other
>   git commit -am 'rename away'
>   git mv other file
>   git commit -am 'rename back'
>   git rm file
>   git commit -am removed
> 
>   git commit-graph write --reachable --changed-paths
>   git log --oneline -- file | cat
> 
> then I see all of the commits. Instrumenting Git like:
> 
> diff --git a/bloom.c b/bloom.c
> index c5b461d1cf..fb2a758e1d 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -207,6 +207,10 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
>  		for (i = 0; i < diff_queued_diff.nr; i++) {
>  			const char *path = diff_queued_diff.queue[i]->two->path;
>  
> +			warning("queuing touched %c path %s for %s",
> +				diff_queued_diff.queue[i]->status,
> +				path, oid_to_hex(&c->object.oid));
> +
>  			/*
>  			* Add each leading directory of the changed file, i.e. for
>  			* 'dir/subdir/file' add 'dir' and 'dir/subdir' as well, so
> 
> results in:
> 
>   warning: queuing touched A path file for 2346d88b0cb4bca11c38ee545d007a7a14ca472a
>   warning: queuing touched M path file for 991cd7f0696ae29fea738ca1b8340c90dae4b201
>   warning: queuing touched D path file for d3642c9fb27459ea09f6c967a1e6ad119e265d6f
>   warning: queuing touched A path other for d3642c9fb27459ea09f6c967a1e6ad119e265d6f
>   warning: queuing touched A path file for bc908eb29e562d97ebb8cf718e41b69d3aa1d834
>   warning: queuing touched D path other for bc908eb29e562d97ebb8cf718e41b69d3aa1d834
>   warning: queuing touched D path file for 7433b46bd6aa170ab17a651c10658a5b0c10ba4f
> 
> So we really aren't detecting renames in the first place! And indeed,
> checking diffopt.detect_rename shows that it is unset. So I'm curious if
> there is a case where that would not be true. I _think_ it would only be
> true in a program which ran init_diff_ui_defaults(), but never in
> git-commit-graph.

So our issue was really that the partial clone prefetch logic was just
overly aggressive.

> Even if it does nothing in practice, I'm not at all opposed to having it
> in there as an explicit documentation of our expectations/requirements
> for the loop below. But it's probably worth saying so in the commit
> message.

I will update the message and send a v2. I'll fix the typo that Philip pointed
out, too.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] bloom: ignore renames when computing changed paths
  2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-04-08 22:31 ` Jeff King
@ 2020-04-09 13:00 ` Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-09 13:00 UTC (permalink / raw)
  To: git; +Cc: philipoakley, peff, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The changed-path Bloom filters record an entry in the filter for
every path that was changed. This includes every add and delete,
regardless of whether a rename was detected. Detecting renames
causes significant performance issues, but also will trigger
downloading missing blobs in partial clone.

The simple fix is to disable rename detection when computing a
changed-path Bloom filter. This should already be disabled by
default, but it is good to explicitly enforce the intended
behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    bloom: ignore renames when computing changed paths
    
    I promised [1] I would adapt the commit that was dropped from
    gs/commit-graph-path-filter [2] on top of gs/commit-graph-path-filter
    and jt/avoid-prefetch-when-able-in-diff. However, I noticed that the
    change was extremely simple and has value without basing it on
    jt/avoid-prefetch-when-able-in-diff.
    
    This change applied to gs/commit-graph-path-filter has obvious CPU time
    improvements for computing changed-path Bloom filters (that I did not
    measure). The partial clone improvements require
    jt/avoid-prefetch-when-able-in-diff to be included, too, but the code
    does not depend on it at compile time.
    
    Thanks, -Stolee
    
    [1] 
    https://lore.kernel.org/git/7de2f54b-8704-a0e1-12aa-0ca9d3d70f6f@gmail.com/
    [2] 
    https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-601%2Fderrickstolee%2Fdiff-and-bloom-filters-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-601/derrickstolee/diff-and-bloom-filters-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/601

Range-diff vs v1:

 1:  5fae00adcf0 ! 1:  f4df00a0dd4 bloom: ignore renames when computing changed paths
     @@ Commit message
      
          The changed-path Bloom filters record an entry in the filter for
          every path that was changed. This includes every add and delete,
     -    regardless of whther a rename was detected. Detecting renames
     +    regardless of whether a rename was detected. Detecting renames
          causes significant performance issues, but also will trigger
          downloading missing blobs in partial clone.
      
          The simple fix is to disable rename detection when computing a
     -    changed-path Bloom filter.
     +    changed-path Bloom filter. This should already be disabled by
     +    default, but it is good to explicitly enforce the intended
     +    behavior.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      


 bloom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/bloom.c b/bloom.c
index c5b461d1cfe..dd9bab9bbd6 100644
--- a/bloom.c
+++ b/bloom.c
@@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	repo_diff_setup(r, &diffopt);
 	diffopt.flags.recursive = 1;
+	diffopt.detect_rename = 0;
 	diffopt.max_changes = max_changes;
 	diff_setup_done(&diffopt);
 

base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-09 11:56   ` Derrick Stolee
@ 2020-04-09 13:47     ` Jeff King
  2020-04-09 14:00       ` Derrick Stolee
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-04-09 13:47 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee,
	Philip Oakley

On Thu, Apr 09, 2020 at 07:56:43AM -0400, Derrick Stolee wrote:

> > So we really aren't detecting renames in the first place! And indeed,
> > checking diffopt.detect_rename shows that it is unset. So I'm curious if
> > there is a case where that would not be true. I _think_ it would only be
> > true in a program which ran init_diff_ui_defaults(), but never in
> > git-commit-graph.
> 
> So our issue was really that the partial clone prefetch logic was just
> overly aggressive.

Right, but I'm not sure how this patch could ever have helped, since
it's just setting a variable to the value it _should_ have already had.

Or do you just mean that the issue would have gone away with Jonathan's
patch to make the prefetching less aggressive?

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-09 13:47     ` Jeff King
@ 2020-04-09 14:00       ` Derrick Stolee
  2020-04-09 14:15         ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2020-04-09 14:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee,
	Philip Oakley

On 4/9/2020 9:47 AM, Jeff King wrote:
> On Thu, Apr 09, 2020 at 07:56:43AM -0400, Derrick Stolee wrote:
> 
>>> So we really aren't detecting renames in the first place! And indeed,
>>> checking diffopt.detect_rename shows that it is unset. So I'm curious if
>>> there is a case where that would not be true. I _think_ it would only be
>>> true in a program which ran init_diff_ui_defaults(), but never in
>>> git-commit-graph.
>>
>> So our issue was really that the partial clone prefetch logic was just
>> overly aggressive.
> 
> Right, but I'm not sure how this patch could ever have helped, since
> it's just setting a variable to the value it _should_ have already had.
> 
> Or do you just mean that the issue would have gone away with Jonathan's
> patch to make the prefetching less aggressive?

Yes, with Jonathan's patch we stop downloading blobs during Bloom filter
computations. The patch this is "replacing" disabled that download in a
different way, depending only on if detect_renames is false and the diff
output doesn't need file contents. (Jonathan's is better.)

-Stolee

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] bloom: ignore renames when computing changed paths
  2020-04-09 14:00       ` Derrick Stolee
@ 2020-04-09 14:15         ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2020-04-09 14:15 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, Derrick Stolee,
	Philip Oakley

On Thu, Apr 09, 2020 at 10:00:46AM -0400, Derrick Stolee wrote:

> >> So our issue was really that the partial clone prefetch logic was just
> >> overly aggressive.
> > 
> > Right, but I'm not sure how this patch could ever have helped, since
> > it's just setting a variable to the value it _should_ have already had.
> > 
> > Or do you just mean that the issue would have gone away with Jonathan's
> > patch to make the prefetching less aggressive?
> 
> Yes, with Jonathan's patch we stop downloading blobs during Bloom filter
> computations. The patch this is "replacing" disabled that download in a
> different way, depending only on if detect_renames is false and the diff
> output doesn't need file contents. (Jonathan's is better.)

OK. I'm willing to leave it there, but I'm still puzzled as to why the
original patch would have done anything based on detect_renames, which
would always be false.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-04-09 14:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
2020-04-08 19:11 ` Junio C Hamano
2020-04-08 19:13 ` Philip Oakley
2020-04-08 22:31 ` Jeff King
2020-04-09 11:56   ` Derrick Stolee
2020-04-09 13:47     ` Jeff King
2020-04-09 14:00       ` Derrick Stolee
2020-04-09 14:15         ` Jeff King
2020-04-09 13:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).