From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong
Cc: meta@public-inbox.org
References: <87in5bdkbv.fsf@xmission.com> <20180719211216.GA1984@dcvr> <87601adfo7.fsf@xmission.com> <20180720061106.4f2u2zpdxnsilrxt@dcvr> <8736weaxsa.fsf@xmission.com>
Date: Fri, 20 Jul 2018 18:56:13 -0500
In-Reply-To: <8736weaxsa.fsf@xmission.com> (Eric W. Biederman's message of "Fri, 20 Jul 2018 07:37:09 -0500")
Message-ID: <87lga5797m.fsf@xmission.com>
Subject: Re: Searching via git grep?

ebiederm@xmission.com (Eric W. Biederman) writes:

> Eric Wong writes:
>
>> "Eric W. Biederman" wrote:
>>> My current goal is to make it pleasant to read linux-kernel and possibly
>>> other large archives on my personal machine.  Right now the git
>>> trees for linux-kernel are about 6.8G.  Small enough to fit in RAM.
>>>
>>> The Xapian indexes are about 63G.  Not small enough to fit in RAM.
>>> They are also not fast to update when I pull in a new batch of messages
>>> from linux-kernel.
>>
>> Interesting, how long does it take to do an incremental index
>> medium/full for you?  Setting XAPIAN_FLUSH_THRESHOLD after my
>> patch yesterday should help noticeably, especially if you're on
>> HDD.
>
> For a small sample, less than a day's worth of lkml messages,
> I get:
>
> $ git --git-dir git/6.git/ fetch
> Enter passphrase for key '/home/eric/.ssh/id_rsa':
> Fetching origin
> From https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/lkml/6
>    35280da650..0a97acb7e7  master     -> master
> remote: Counting objects: 1791, done.
> remote: Compressing objects: 100% (1085/1085), done.
> remote: Total 1791 (delta 109), reused 1791 (delta 109)
> Receiving objects: 100% (1791/1791), 1.94 MiB | 1.98 MiB/s, done.
> Resolving deltas: 100% (109/109), done.
> From git:/public-inbox/vger.kernel.org/linux-kernel/6
>    35280da65057..0a97acb7e709  master     -> master
>
> $ time public-inbox-index
> real	2m1.482s
> user	0m26.084s
> sys	0m20.792s
>
> I am not on a HDD.  I will play with XAPIAN_FLUSH_THRESHOLD next time
> and see if things get better.  Initially building the Xapian index was
> extremely painful, with swapping, and took over a day.
>
> Subjectively, searching all of 6.git feels faster than those 2 minutes,
> if for no other reason than I get some of the results back immediately.

XAPIAN_FLUSH_THRESHOLD seems to help.

$ git --git-dir git/6.git/ fetch
Enter passphrase for key '/home/eric/.ssh/id_rsa':
Fetching origin
From https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/lkml/6
   0a97acb7e7..61d959b624  master     -> master
remote: Counting objects: 2562, done.
remote: Compressing objects: 100% (1384/1384), done.
remote: Total 2562 (delta 324), reused 2562 (delta 324)
Receiving objects: 100% (2562/2562), 2.45 MiB | 4.03 MiB/s, done.
Resolving deltas: 100% (324/324), done.
From git:/public-inbox/vger.kernel.org/linux-kernel/6
   0a97acb7e709..61d959b62473  master     -> master

$ (export XAPIAN_FLUSH_THRESHOLD=4000000000; time public-inbox-index )
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$in_r> line 121.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$in_r> line 121.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$in_r> line 121.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$in_r> line 121.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$r> line 41.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$r> line 41.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$r> line 41.
Use of uninitialized value in lc at /usr/share/perl5/Email/Simple/Header.pm line 181, <$r> line 41.

real	0m58.239s
user	0m15.820s
sys	0m11.088s

It looks like it cut a minute off while running with a slightly larger
pool of objects.

Eric
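
For anyone who wants to redo the arithmetic, here is a rough sketch that
turns the two "real" times above into seconds and subtracts them.  The
helper name to_seconds is made up for this example and is not part of
public-inbox.  The two runs indexed different batches (1791 vs 2562
fetched objects), so this is only a ballpark comparison, not a
controlled benchmark.

```shell
# Hypothetical helper: convert a bash `time` value such as "2m1.482s"
# into seconds by splitting on the "m" and "s" suffixes.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

before=$(to_seconds 2m1.482s)    # "real" time without XAPIAN_FLUSH_THRESHOLD
after=$(to_seconds 0m58.239s)    # "real" time with XAPIAN_FLUSH_THRESHOLD=4000000000

# Wall-clock difference between the two runs, in seconds.
saved=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.3f", b - a }')
echo "wall-clock difference: ${saved}s"
```

Timing the same batch both ways on a copy of the index would give a
fairer number than comparing two different fetches.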