Date: Tue, 5 Mar 2019 23:26:00 +0000
From: Eric Wong
To: Bjorn Helgaas
Cc: Joey Pabalinas, linux-kernel@vger.kernel.org,
 kernelnewbies@kernelnewbies.org, Linus Torvalds, Greg Kroah-Hartman,
 Konstantin Ryabitsev, Eric Biederman, Jasper Spaans
Subject: Re: [RFC] LKML Archive in Maildir Format
Message-ID: <20190305232600.GA12110@dcvr>
References: <20181216190639.6safwjqwdphkce67@gmail.com>
 <20181216194649.GA7732@pure.paranoia.local>
 <20181216195343.idnt2y5y5wjky5gu@gmail.com>
 <20190104013522.stng6gwauwnr6wbi@starla>
X-Mailing-List: linux-kernel@vger.kernel.org

Bjorn Helgaas wrote:
> OK, so I understand how to clone archives from lore.kernel.org and how
> to convert a git archive to a maildir (thanks, Konstantin!)
>
> What I *don't* understand is how to effectively read this locally.
> Ideally I'd like to run mutt, possibly with notmuch for indexing.  But
> a maildir with 3M files seems impractical.  I did actually try it
> (without notmuch), but it takes mutt about 5 minutes to start up.  And
> the maildir is about 23G, compared with 7.5G for the git archive.

Right, relying on Maildir for long-term storage of giant
archives is not a usable solution with any general purpose FSes
I know about.  git itself had the same problem with loose object
scalability in the old days and packs were invented as a result.

> Any pointers?  I guess there's no mutt backend that can read a
> public-inbox archive directly?

There are mutt patches to support reading over NNTP, so that works:

    mutt -f news://$INBOX_HOST/$INBOX_NEWSGROUP

I don't think mutt handles mboxrd 100% correctly, but it's close
enough that you can download the gzipped mboxrd of a search
query and open it via "mutt -f /path/to/downloaded/mbox.gz"

    curl -XPOST -OJ "$INBOX_URL/?q=$SEARCH_QUERY&x=m"

POST is required(*), and -OJ lets it use the Content-Disposition:
header for a meaningful server-generated name, but you can also
redirect the result to whatever you want.

For all messages since March 1, you could use:

    SEARCH_QUERY=d:20190301..

All the supported search queries are documented in
$INBOX_URL/_/text/help/ and the search prefixes
(e.g. "d:", "s:", "b:") are modeled after what's in mairix.
You'll need to escape the queries for URIs (e.g. " " => "+", and so on).

Xapian requires date ranges to be denoted with ".." whereas
mairix uses "-" for ranges.

The main thing public-inbox search misses from mairix is
support for "-t" which grabs non-matching messages from the same
thread.  I would like to support that someday, but don't have
enough time (or funding) to make it happen at the moment.

(*) to reliably avoid wasting resources from spiders/prefetchers
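
The URI escaping mentioned above is easy to get wrong by hand, so here
is a rough sketch of building the search URL in Python rather than in
the shell.  The function name and the example.org inbox URL are made up
for illustration; only the "q" and "x=m" parameters come from the curl
command above.  It assumes the server accepts standard percent-encoding
(e.g. ":" as "%3A"), and it only builds the URL; it does not send the
POST.

```python
from urllib.parse import urlencode


def mbox_search_url(inbox_url: str, query: str) -> str:
    """Build the URL that asks for the gzipped mboxrd of a search query.

    Hypothetical helper: mirrors "$INBOX_URL/?q=$SEARCH_QUERY&x=m".
    """
    # urlencode() escapes with quote_plus(), so " " becomes "+"
    # and reserved characters such as ":" are percent-encoded.
    params = urlencode({"q": query, "x": "m"})
    return inbox_url.rstrip("/") + "/?" + params


# Placeholder inbox URL; substitute your own $INBOX_URL.
# "s:maildir" is just an example subject search term.
print(mbox_search_url("https://example.org/inbox", "d:20190301.. s:maildir"))
# prints https://example.org/inbox/?q=d%3A20190301..+s%3Amaildir&x=m
```

The printed URL can then be passed to curl -XPOST -OJ as before.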