git@vger.kernel.org mailing list mirror (one of many)
From: Derrick Stolee <stolee@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Garima Singh via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, jonathantanmy@google.com,
	Garima Singh <garima.singh@microsoft.com>
Subject: Re: [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file
Date: Fri, 29 May 2020 09:35:17 -0400
Message-ID: <72cff41c-bb2e-5f87-5db6-d4e9ead25a47@gmail.com>
In-Reply-To: <20200529085721.GA25128@szeder.dev>

On 5/29/2020 4:57 AM, SZEDER Gábor wrote:
> On Mon, Apr 06, 2020 at 04:59:49PM +0000, Garima Singh via GitGitGadget wrote:
>> From: Garima Singh <garima.singh@microsoft.com>
>>
>> Update the technical documentation for commit-graph-format with
>> the formats for the Bloom filter index (BIDX) and Bloom filter
>> data (BDAT) chunks. Write the computed Bloom filters information
>> to the commit graph file using this format.
>>
>> Helped-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Garima Singh <garima.singh@microsoft.com>
>> ---
>>  .../technical/commit-graph-format.txt         |  30 +++++
>>  commit-graph.c                                | 113 +++++++++++++++++-
>>  commit-graph.h                                |   5 +
>>  3 files changed, 147 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
>> index a4f17441aed..de56f9f1efd 100644
>> --- a/Documentation/technical/commit-graph-format.txt
>> +++ b/Documentation/technical/commit-graph-format.txt
>> @@ -17,6 +17,9 @@ metadata, including:
>>  - The parents of the commit, stored using positional references within
>>    the graph file.
>>  
>> +- The Bloom filter of the commit carrying the paths that were changed between
>> +  the commit and its first parent, if requested.
>> +
>>  These positional references are stored as unsigned 32-bit integers
>>  corresponding to the array position within the list of commit OIDs. Due
>>  to some special constants we use to track parents, we can store at most
>> @@ -93,6 +96,33 @@ CHUNK DATA:
>>        positions for the parents until reaching a value with the most-significant
>>        bit on. The other bits correspond to the position of the last parent.
>>  
>> +  Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional]
>> +    * The ith entry, BIDX[i], stores the number of 8-byte word blocks in all
> 
> This is inconsistent with the implementation: according to the code in
> one of the previous patches these entries are simple byte offsets, not
> 8-byte word offsets, i.e. the combined size of all modified path
> Bloom filters can be at most 2^32 bytes.

The documentation was fixed in 88093289cdc (Documentation: changed-path Bloom
filters use byte words, 2020-05-11).
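
As a rough illustration of the byte-offset reading, the sketch below locates
one commit's filter from cumulative byte offsets, following the format text's
rule that the filter for commit i spans BIDX[i-1] to BIDX[i] (plus header
length), with BIDX[-1] taken as 0. The struct and function names are
hypothetical and not taken from Git, and the sketch assumes the index entries
have already been converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Three unsigned 32-bit header fields precede the filter data in BDAT. */
#define BDAT_HEADER_LEN 12

/* Hypothetical helper type: where one commit's filter lives inside BDAT. */
struct filter_span {
	uint64_t start; /* byte offset from the start of the BDAT chunk */
	uint32_t len;   /* filter length in bytes; 0 means "no filter" */
};

/*
 * bidx[i] holds the combined size in bytes of the filters for commits
 * 0..i (inclusive). Assumption: values are already in host byte order.
 */
static struct filter_span locate_filter(const uint32_t *bidx, size_t i)
{
	uint32_t begin = i ? bidx[i - 1] : 0; /* BIDX[-1] is defined as 0 */
	struct filter_span span;

	span.start = (uint64_t)BDAT_HEADER_LEN + begin;
	span.len = bidx[i] - begin;
	return span;
}
```

Because each entry is a 32-bit byte count, the last entry caps the combined
size of all filters at just under 2^32 bytes.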

> The commit-graph file can contain information about at most 2^31-1
> commits.  This means that with that many commits each commit can have
> a mere 2-byte Bloom filter on average.  When using 7 hashes we'd
> need 10 bits per path, so in two bytes we could store only a single
> path.
> 
> Clearly, using 4 byte index entries significantly lowers the max
> number of commits that can be stored with modified path Bloom filters.

This is a good point, and certainly the reason for 8-byte multiples.

> IMO every new chunk must support at least 2^31-1 commits.

I'm not sure this is a valid requirement. Even extremely large repositories
(that are created by actual use, not synthetic) are on the scale of 2^24
commits.

You are right that we should make the commit-graph write process more robust
to reaching these limits. You point out that we have a new limit when these
filters are enabled.

For reference, the Windows OS repo has ~4.25 million commits and the
commit-graph file with changed-path Bloom filters is around 520 MB. That's
the whole file size, and without the filters it's around 240 MB, so the
filters take up less than 300 MB, which is below 2^29 bytes. We would need
to grow the repo by at least 8x to hit the 2^32-byte limit. That's not an
unreasonable amount of growth, but is also far enough away that we can
handle it in time.

The incremental commit-graph can actually save us here (and is similar to
how we solved a scale issue in Azure Repos around the multi-pack-index):
we can refuse to merge layers of an incremental commit-graph if the
changed-path filters would exceed the size limit. Of course, the _first_
write of such a commit-graph would need to be aware of this limit and
plan for it in advance, but that's also a theoretical issue.
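
A minimal sketch of what such a merge guard could look like, assuming each
layer can report the total byte size of its changed-path filters. The
function name and its inputs are hypothetical; nothing like this exists in
the patch under review.

```c
#include <stdint.h>

/*
 * Hypothetical check: layer_filter_bytes[i] is the total size in bytes of
 * the changed-path filters stored in layer i. Returns 1 if the merged
 * layer's filters would still be addressable with 32-bit byte offsets,
 * and 0 if the merge should be refused.
 */
static int can_merge_layers(const uint64_t *layer_filter_bytes, int nr_layers)
{
	uint64_t total = 0; /* invariant: total <= UINT32_MAX */
	int i;

	for (i = 0; i < nr_layers; i++) {
		/* Compare before adding so the sum itself can never wrap. */
		if (layer_filter_bytes[i] > UINT32_MAX - total)
			return 0;
		total += layer_filter_bytes[i];
	}
	return 1;
}
```

When the check fails, the writer would keep the layers separate instead of
collapsing them into one file.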

I'm tracking some follow-up work [1] for the changed-path filters,
including a way to limit the number of filters computed in one
"git commit-graph write" process. I'll make note of your concerns here,
too.

[1] https://github.com/microsoft/git/issues/272

>> +      Bloom filters from commit 0 to commit i (inclusive) in lexicographic
>> +      order. The Bloom filter for the i-th commit spans from BIDX[i-1] to
>> +      BIDX[i] (plus header length), where BIDX[-1] is 0.
>> +    * The BIDX chunk is ignored if the BDAT chunk is not present.
>> +
>> +  Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
>> +    * It starts with header consisting of three unsigned 32-bit integers:
>> +      - Version of the hash algorithm being used. We currently only support
>> +	value 1 which corresponds to the 32-bit version of the murmur3 hash
>> +	implemented exactly as described in
>> +	https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
>> +	hashing technique using seed values 0x293ae76f and 0x7e646e2 as
>> +	described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
>> +	in Probabilistic Verification"
> 
> How should double hashing compute the k hashes, i.e. using 64 bit or
> 32 bit unsigned integer arithmetic?
> 
> I'm puzzled that you link to this paper and still use double hashing.
> 
> Two of the contributions of that paper are that it points out some
> shortcomings of the double hashing scheme and provides a better
> alternative in the form of enhanced double hashing, which can cut the
> false positive rate in half.
> 
> However, that paper considers the hashing scheme only in the context
> of one big Bloom filter.  I've found that when it comes to many small
> Bloom filters then the k hashes produced by any double hashing variant
> are not independent enough, and "standard" double hashing fares the
> worst among them.  There are real repositories out there where double
> hashing has over an order of magnitude higher average false positive
> rate than enhanced double hashing.  Though that's not to say that
> enhanced double hashing is good...
> 
> For details on these issues see
> 
>   https://public-inbox.org/git/20200529085038.26008-16-szeder.dev@gmail.com

That message includes very detailed experimental analysis, which is nice.
We will need to do some concrete side-by-side comparisons to see if there
actually is a meaningful difference. (You may have already done this.)
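
For anyone setting up that comparison, here is a small sketch of the two
schemes side by side. It makes two assumptions: h0 and h1 stand in for the
two seeded murmur3 hashes of a path, and all arithmetic is wrapping unsigned
32-bit, which this thread has not confirmed. The enhanced variant follows
the incremental form from the linked paper as I understand it. Both function
names are made up for the example, and reducing each value modulo the
filter's bit count is left to the caller.

```c
#include <stdint.h>

/* Plain double hashing: out[i] = h0 + i * h1 (mod 2^32). */
static void double_hashing(uint32_t h0, uint32_t h1, uint32_t *out, int k)
{
	int i;

	for (i = 0; i < k; i++)
		out[i] = h0 + (uint32_t)i * h1;
}

/*
 * Enhanced double hashing: out[i] = h0 + i * h1 + (i^3 - i) / 6 (mod 2^32),
 * computed incrementally by growing the step y on every round.
 */
static void enhanced_double_hashing(uint32_t h0, uint32_t h1,
				    uint32_t *out, int k)
{
	uint32_t x = h0, y = h1;
	int i;

	if (k > 0)
		out[0] = x;
	for (i = 1; i < k; i++) {
		x += y;           /* advance by the current step */
		y += (uint32_t)i; /* the step itself grows each round */
		out[i] = x;
	}
}
```

The two sequences agree for the first two hashes and diverge from the third
onward, so any difference in false-positive rate comes from the later hashes.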

>> +      - The number of times a path is hashed and hence the number of bit positions
>> +	      that cumulatively determine whether a file is present in the commit.
>> +      - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
>> +	      contains 'n' entries, then the filter size is the minimum number of 64-bit
>> +	      words that contain n*b bits.
> 
> Since the ideal number of bits per element depends only on the number
> of hashes per path (k / ln(2) ≈ k * 10 / 7), why is this value stored
> in the commit-graph?

The ideal number also depends on the false-positive rate you want. In a
hypothetical future where we allow customization here, we want the sizing
to be consistent across all filters.
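
To show how the stored bits-per-entry value feeds into filter sizing, here is
a sketch of the rule from the format text: a filter with n entries and b bits
per entry occupies the minimum number of words that hold n*b bits. The word
width is a parameter because the quoted text says 64-bit words while the
later documentation fix speaks of byte words. The function name is
hypothetical.

```c
#include <stdint.h>

/*
 * Smallest number of word_bits-sized words that can hold
 * n * bits_per_entry bits (a ceiling division).
 */
static uint64_t filter_len_in_words(uint64_t n, uint32_t bits_per_entry,
				    uint32_t word_bits)
{
	uint64_t total_bits = n * bits_per_entry;

	return (total_bits + word_bits - 1) / word_bits;
}
```

With 10 bits per entry, one changed path needs 2 bytes when counted in 8-bit
words, which matches the "single path in two bytes" estimate quoted earlier.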

>> +    * The rest of the chunk is the concatenation of all the computed Bloom
>> +      filters for the commits in lexicographic order.
>> +    * Note: Commits with no changes or more than 512 changes have Bloom filters
>> +      of length zero.
> 
> What does this "Note:" prefix mean in the file format specification?
> 
> Can an implementation use a one byte Bloom filter with no bits set for
> a commit with no changes?  Can an implementation still store a Bloom
> filter for commits that modify more than 512 paths?

This is currently due to a hard-coded value in the implementation. It's not a
requirement of the file format.

>> +    * The BDAT chunk is present if and only if BIDX is present.
>> +
>>    Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
>>        This list of H-byte hashes describe a set of B commit-graph files that
>>        form a commit-graph chain. The graph position for the ith commit in this
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 732c81fa1b2..a8b6b5cca5d 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
> 
>> @@ -1034,6 +1071,59 @@ static void write_graph_chunk_extra_edges(struct hashfile *f,
>>  	}
>>  }
>>  
>> +static void write_graph_chunk_bloom_indexes(struct hashfile *f,
>> +					    struct write_commit_graph_context *ctx)
>> +{
>> +	struct commit **list = ctx->commits.list;
>> +	struct commit **last = ctx->commits.list + ctx->commits.nr;
>> +	uint32_t cur_pos = 0;
>> +	struct progress *progress = NULL;
>> +	int i = 0;
>> +
>> +	if (ctx->report_progress)
>> +		progress = start_delayed_progress(
>> +			_("Writing changed paths Bloom filters index"),
>> +			ctx->commits.nr);
>> +
>> +	while (list < last) {
>> +		struct bloom_filter *filter = get_bloom_filter(ctx->r, *list);
>> +		cur_pos += filter->len;
> 
> Given a sufficiently large number of commits with large enough Bloom
> filters this will silently overflow.

Worth fixing, but we are not in a rush. I noted it in my GitHub issue.
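
One possible shape for that fix, as a sketch only: check the remaining
headroom before adding, so the writer can stop with an error instead of
wrapping around. The helper name is hypothetical and the follow-up work may
well take a different approach.

```c
#include <stdint.h>

/*
 * Add one filter's length to the running offset. Returns 0 on success,
 * or -1 (leaving *cur_pos untouched) if the sum would not fit in the
 * 32-bit index entry.
 */
static int add_filter_len(uint32_t *cur_pos, uint64_t len)
{
	if (len > (uint64_t)UINT32_MAX - *cur_pos)
		return -1;
	*cur_pos += (uint32_t)len;
	return 0;
}
```

In the loop from the patch, this would replace the bare
"cur_pos += filter->len" and give the caller a chance to report the limit.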

Thanks,
-Stolee


