user/dev discussion of public-inbox itself
* Has anyone tried importing lkml?
@ 2018-01-15 17:41 Konstantin Ryabitsev
  2018-01-15 17:55 ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2018-01-15 17:41 UTC
  To: meta



Hello, all:

Every time LKML.org goes down, there's discussion about kernel.org
hosting our own public archive of LKML. There are good reasons why this
hasn't been done before, but I won't bore you with them.

The question I do have is whether public-inbox is the right tool for
doing something like this. LKML message count is somewhere in the
millions, and I'm curious what that would look like when imported into a
git tree used by public-inbox. For comparison, the Linux kernel itself
is only about 700,000 commits, so a git repo of all LKML archives would
easily dwarf that.

Has anyone tried doing this at all, or should we blaze that trail on our
own?

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation




* Re: Has anyone tried importing lkml?
  2018-01-15 17:41 Has anyone tried importing lkml? Konstantin Ryabitsev
@ 2018-01-15 17:55 ` Eric Wong
  2018-01-15 18:04   ` Konstantin Ryabitsev
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2018-01-15 17:55 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> The question I do have is whether public-inbox is the right tool for
> doing something like this. LKML message count is somewhere in the
> millions, and I'm curious what that would look like when imported into a
> git tree used by public-inbox. For comparison, the Linux kernel itself
> is only about 700,000 commits, so a git repo of all LKML archives would
> easily dwarf that.

It's probably too slow for object walking (which impacts
clone-ability) at the moment due to tree object churn.  The
current 2/38 tree structure (which mirrors git loose objects)
turned out to be a not-so-great idea; maybe 2/2/36 would be

I would love to help improve public-inbox for LKML needs while
preserving compatibility.
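
To make the 2/38 versus 2/2/36 comparison concrete, here is a minimal sketch of how a 40-character hex digest can be split into path components, and how many entries each leaf tree ends up holding. Hashing the Message-ID with SHA-1 is an assumption made for illustration (the thread only says the layout mirrors git loose objects), and `fanout_path` and `entries_per_leaf_tree` are hypothetical helper names, not public-inbox functions.

```python
import hashlib


def fanout_path(message_id, widths=(2, 38)):
    """Split the SHA-1 hex digest of a Message-ID into path components.

    widths=(2, 38) gives "ab/cdef..." and widths=(2, 2, 36) gives
    "ab/cd/ef...". Using SHA-1 of the Message-ID is an assumption here.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    if sum(widths) != len(digest):
        raise ValueError("widths must add up to the digest length (40)")
    parts = []
    pos = 0
    for width in widths:
        parts.append(digest[pos:pos + width])
        pos += width
    return "/".join(parts)


def entries_per_leaf_tree(message_count, widths=(2, 38)):
    """Average number of entries in each leaf tree for a given layout.

    Every hex character of directory prefix multiplies the number of
    leaf trees by 16, so the prefix length is all that matters.
    """
    prefix_chars = sum(widths[:-1])
    return message_count / (16 ** prefix_chars)


if __name__ == "__main__":
    mid = "d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org"
    print(fanout_path(mid, (2, 38)))
    print(fanout_path(mid, (2, 2, 36)))
    # 3,000,000 is a made-up message count used only for illustration.
    for layout in ((2, 38), (2, 2, 36)):
        print(layout, entries_per_leaf_tree(3_000_000, layout))
```

With a two-character prefix there are only 256 leaf trees, so each one grows with the archive and has to be rewritten in full whenever a message lands in it. A second level spreads the same messages over 65,536 much smaller trees, at the cost of one extra tree object per path.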

> Has anyone tried doing this at all, or should we blaze that trail on our
> own?

I prefer we improve public-inbox together so it can benefit
other projects, too.  I know there's been interest in having
Debian archives using it, too.


* Re: Has anyone tried importing lkml?
  2018-01-15 17:55 ` Eric Wong
@ 2018-01-15 18:04   ` Konstantin Ryabitsev
  2018-01-15 18:27     ` Eric Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2018-01-15 18:04 UTC
  To: Eric Wong; +Cc: meta

On 2018-01-15 12:55 PM, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
>> The question I do have is whether public-inbox is the right tool for
>> doing something like this. LKML message count is somewhere in the
>> millions, and I'm curious what that would look like when imported into a
>> git tree used by public-inbox. For comparison, the Linux kernel itself
>> is only about 700,000 commits, so a git repo of all LKML archives would
>> easily dwarf that.
> 
> It's probably too slow for object walking (which impacts
> clone-ability) at the moment due to tree object churn.  The
> current 2/38 tree structure (which mirrors git loose objects)
> turned out to be a not-so-great idea; maybe 2/2/36 would be

Does frequent repacking with --write-bitmap-index help at all? The
bitmaps usually dramatically improve cloning speeds for us.
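
For reference, a minimal sketch of the kind of repack being asked about, wrapped in Python so it can be scheduled from a script. The `-a`, `-d` and `--write-bitmap-index` options are standard `git repack` flags; the helper names and the example repository path are hypothetical.

```python
import subprocess
from typing import List


def repack_command(git_dir: str) -> List[str]:
    """Build a full-repack command that also writes a bitmap index.

    -a packs all reachable objects into a single pack, -d removes the
    packs made redundant by it, and --write-bitmap-index stores a
    reachability bitmap next to the new pack.
    """
    return [
        "git",
        "--git-dir=" + git_dir,
        "repack",
        "-a",
        "-d",
        "--write-bitmap-index",
    ]


def repack_with_bitmaps(git_dir: str) -> None:
    """Run the repack and raise CalledProcessError if git fails."""
    subprocess.run(repack_command(git_dir), check=True)


if __name__ == "__main__":
    # Hypothetical path; point this at a real bare repository to try it.
    print(" ".join(repack_command("/srv/git/lkml.git")))
```

Bitmaps are only written as part of a repack that covers everything, which is why `-a` is included.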

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation


* Re: Has anyone tried importing lkml?
  2018-01-15 18:04   ` Konstantin Ryabitsev
@ 2018-01-15 18:27     ` Eric Wong
  2018-01-15 20:09       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Wong @ 2018-01-15 18:27 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Does frequent repacking with --write-bitmap-index help at all? The
> bitmaps usually dramatically improve cloning speeds for us.

Yes, but repacking itself is not easy.  Peff did some analysis
a while ago:

https://public-inbox.org/git/20160805092805.w3nwv2l6jkbuwlzf@sigill.intra.peff.net/

Storing object IDs in Xapian helps a lot to avoid git graph/tree
lookup overheads when generating Atom feeds and such.

Also, does kernel.org have the complete archives anywhere?
I only have the last ~4 years or so.


* Re: Has anyone tried importing lkml?
  2018-01-15 18:27     ` Eric Wong
@ 2018-01-15 20:09       ` Konstantin Ryabitsev
  2018-01-15 20:42         ` Bram Adams
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2018-01-15 20:09 UTC
  To: Eric Wong; +Cc: meta, bram.adams

On 2018-01-15 01:27 PM, Eric Wong wrote:
> Also, does kernel.org have the complete archives anywhere?
> I only have the last ~4 years or so.

Let me put you in touch with someone who does.

Bram, we're looking at various available options for hosting publicly
accessible archives of LKML, together with a searchable/threadable web
frontend. I know this is directly related to the work you've been doing
with cregit, and I wonder if you would be able to provide Eric with full
LKML archives for some initial testing. It's probably something cregit
can benefit from, too, should we start providing this service.

You can see the beginning of the discussion thread here:
https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org/T/#u

I would be happy to act as an intermediary if you don't have a good
place to host them where they are publicly accessible.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation


* Re: Has anyone tried importing lkml?
  2018-01-15 20:09       ` Konstantin Ryabitsev
@ 2018-01-15 20:42         ` Bram Adams
  2018-01-15 20:54           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 8+ messages in thread
From: Bram Adams @ 2018-01-15 20:42 UTC
  To: Konstantin Ryabitsev; +Cc: Eric Wong, meta

Hi Konstantin,

> Bram, we're looking at various available options for hosting publicly
> accessible archives of LKML, together with a searchable/threadable web
> frontend. I know this is directly related to the work you've been doing
> with cregit, and I wonder if you would be able to provide Eric with full
> LKML archives for some initial testing. It's probably something cregit
> can benefit from, too, should we start providing this service.

Yes, we have a copy of the full history until August 2016 of LKML and 130 other mailing lists in mbox form. They are on the cregit server you provided us.

We obtained these mbox files from Richard Ellis (ellis@spinics.net), who maintains the spinics.net email archive and graciously allowed us to download the files.

From cregit’s point of view, we are interested in any form of email archive that:
 * is up-to-date
 * parses the emails into metadata, patches, patch series, …
 * is quick to query
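
As a rough illustration of the second point, the sketch below reads an mbox file with Python's standard `mailbox` module and pulls out a few metadata fields per message. The function name `summarize_mbox` and the `[PATCH` subject check are assumptions made for this example; real patch and patch-series detection needs considerably more than a subject match.

```python
import mailbox
from email.utils import parsedate_to_datetime


def summarize_mbox(path):
    """Yield one metadata dict per message found in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            subject = str(msg.get("Subject", ""))
            raw_date = msg.get("Date")
            try:
                date = parsedate_to_datetime(raw_date) if raw_date else None
            except (TypeError, ValueError):
                # Malformed Date headers are common in old archives.
                date = None
            yield {
                "message_id": msg.get("Message-ID"),
                "from": msg.get("From"),
                "date": date,
                "subject": subject,
                # Heuristic only: treats any "[PATCH" subject as a patch.
                "looks_like_patch": "[PATCH" in subject.upper(),
            }
    finally:
        box.close()


if __name__ == "__main__":
    import sys

    for row in summarize_mbox(sys.argv[1]):
        print(row["date"], row["message_id"], row["subject"])
```

Run against one of the mbox files, this prints one line per message; the dicts could equally be loaded into a database to cover the "quick to query" requirement.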

Kind regards,

Bram Adams
MCIS, Polytechnique Montreal

http://mcis.polymtl.ca/



* Re: Has anyone tried importing lkml?
  2018-01-15 20:42         ` Bram Adams
@ 2018-01-15 20:54           ` Konstantin Ryabitsev
  2018-01-15 21:03             ` Bram Adams
  0 siblings, 1 reply; 8+ messages in thread
From: Konstantin Ryabitsev @ 2018-01-15 20:54 UTC
  To: Bram Adams; +Cc: Eric Wong, meta

On 2018-01-15 03:42 PM, Bram Adams wrote:
> Hi Konstantin,
> 
>> Bram, we're looking at various available options for hosting publicly
>> accessible archives of LKML, together with a searchable/threadable web
>> frontend. I know this is directly related to the work you've been doing
>> with cregit, and I wonder if you would be able to provide Eric with full
>> LKML archives for some initial testing. It's probably something cregit
>> can benefit from, too, should we start providing this service.
> 
> Yes, we have a copy of the full history until August 2016 of LKML and 130 other mailing lists in mbox form. They are on the cregit server you provided us.

Found them. Is October 2000 the beginning of LKML, or just the beginning
of the archive? I'm actually unsure of the history of the list. :)

> From cregit’s point of view, we are interested in any form of email archive that:
>  * is up-to-date
>  * parses the emails into metadata, patches, patch series, …
>  * is quick to query

Looks like we're getting unblocked for these resources, so we may be
able to set up that LKML-specific patchwork instance for you -- but
that's a separate thread that should probably stay off this list.

Best,
-- 
Konstantin Ryabitsev
Director, IT Infrastructure Security
The Linux Foundation


* Re: Has anyone tried importing lkml?
  2018-01-15 20:54           ` Konstantin Ryabitsev
@ 2018-01-15 21:03             ` Bram Adams
  0 siblings, 0 replies; 8+ messages in thread
From: Bram Adams @ 2018-01-15 21:03 UTC
  To: Konstantin Ryabitsev; +Cc: Eric Wong, meta

Hi,

>>> Bram, we're looking at various available options for hosting publicly
>>> accessible archives of LKML, together with a searchable/threadable web
>>> frontend. I know this is directly related to the work you've been doing
>>> with cregit, and I wonder if you would be able to provide Eric with full
>>> LKML archives for some initial testing. It's probably something cregit
>>> can benefit from, too, should we start providing this service.
>> 
>> Yes, we have a copy of the full history until August 2016 of LKML and 130 other mailing lists in mbox form. They are on the cregit server you provided us.
> 
> Found them. Is October 2000 the beginning of LKML, or just the beginning
> of the archive? I'm actually unsure of the history of the list. :)

It’s the second option, i.e., beginning of the archive. LKML goes back to 1996, see https://lkml.org/lkml (in fact 1910, but that’s probably not correct :-)).

> 
>> From cregit’s point of view, we are interested in any form of email archive that:
>> * is up-to-date
>> * parses the emails into metadata, patches, patch series, …
>> * is quick to query
> 
> Looks like we're getting unblocked for these resources, so we may be
> able to set up that LKML-specific patchwork instance for you -- but
> that's a separate thread that should probably stay off this list.

Cool, thanks!

Kind regards,

Bram Adams

