From: Joel Fernandes
Date: Wed, 10 Apr 2019 11:50:51 -0400
Subject: Re: [PATCH v5 1/3] Provide in-kernel headers to make extending kernel easier
To: Olof Johansson
Cc: Joel Fernandes, Linux Kernel Mailing List, Qais Yousef,
    Dietmar Eggemann, Manoj Rao, Andrew Morton, Alexei Starovoitov,
    atish patra, Daniel Colascione, Dan Williams, Greg Kroah-Hartman,
    Guenter Roeck, Jonathan Corbet, Karim Yaghmour, Kees Cook,
    Android Kernel Team, "open list:DOCUMENTATION",
    "open list:KERNEL SELFTEST FRAMEWORK",
    linux-trace-devel@vger.kernel.org, Masahiro Yamada,
    Masami Hiramatsu, Randy Dunlap, Steven Rostedt, Shuah Khan,
    Yonghong Song
References: <20190320163116.39275-1-joel@joelfernandes.org> <20190408203601.GF133872@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 10, 2019 at 11:07 AM Olof Johansson wrote:
[snip]
> > > Wouldn't it be more convenient to provide it in a standardized format
> > > such that you won't have to take an additional step, and always have
> > > This is that form IMO.
> >
> > The location of the archive is fixed/known. If you are talking of the
> > location where the user decompresses it to, then they already know where they
> > are decompressing to.
>
> The location _of_ the archive, sure.
> But the format of what is in the
> tarball, how it is versioned, and how to manage it will have to be
> done by every user.
>
> For any script that doesn't depend on some shared system state that
> wants to, say, build a eBPF program and load it, it would need to
> extract the tarball from scratch to make sure it is the current
> correct version of it.
>
> If that's required by all users, why not just present the data in a
> way that it can be used directly?

That is the part that is unclear from your proposal. If we present a
filesystem view, then I am assuming the data will have to be decompressed
into memory first. That means you are proposing the use of 30MB of
uncompressed memory. The whole archive has to be decompressed, but the
whole archive is compressed with XZ for a maximum compression ratio.

> > > Having to copy and extract the tarball is the most awkward step, IMHO.
> > > I also find the waste of kernel memory for it to be an issue, but
> > > given that it can be built as a module I guess that's the obvious
> > > solution for those who care about memory consumption.
> >
> > Yes. We discussed in previous threads that for users who really want the
> > archive to be completely uncompressed and in-memory, can just load the
> > module, decompress into tmpfs, and unload the module. That is an extra step,
> > yes.
>
> Most users will need to decompress it every time they use it anyway,
> especially if there's no versioned prefix in the tarball that they can
> use to key to a previously decompressed version with the exact same
> kernel version and config.
>
> So, if you need to do that anyway, wouldn't it be easier if you just
> mounted a FS to get to it. If you're on a system where you can't use
> it in-place for resource reasons, you can copy it off and unmount it.
> No extra tools needed in userspace then at run/use time.
>
> Said filesystem could be populated by a compressed cpio archive since
> we already have code in the kernel to do this for initramfs, and could
> do so at mount time -- and at unmount time it'd be freed up.

But still, decompressing to the filesystem in a scratch area may be better
than decompressing to RAM for some users who have less RAM. This patchset
does not enforce a certain way of doing things and leaves it to the user.

> If you absolutely need to export a file to userspace with the archive,
> my suggestion is to do it through debugfs. That way the format isn't
> in a /proc ABI that can't be changed in the future (debugfs isn't
> required to be stable in the same way). This way we can change the
> format carried in the kernel over time without changing the official
> way we present the data to userspace (via a filesystem view).
>
> As far as format goes; there's clear precedent on cpio being used and
> supported; we already have build time requirements on the userspace
> tools with some options. Using tar would actually be a new dependency
> even if it is a common tool to have installed. With a self-populating
> FS, there's no new tool requirements on the runtime side either.

debugfs is going away for Android and is controversial in that its
functionality isn't guaranteed to be there (debugfs breakages aren't
necessarily bugs AFAIK). So this isn't an option.

> > We had close to 2-3 months of discussions now with various folks up until v5.
> > I am about to post v6 which is in line with Masahiro Yamada's expectations. In
> > that I will be dropping module building artifacts due to his module building
> > concerns and only include the headers.
>
> I've found some of the old discussion and read up on it.
> I think it was pretty quick at dismissing ideas for more robust
> implementations ("it needs squashfs-tools"), and had some narrow
> viewpoints (exporting a tarball is the least amount of kernel change,
> while adding complexity at the system/usage side).

Honestly, it's kind of unfair to quote just a few points like that. If I
remember correctly, there were 100s of emails, and many good viewpoints
were brought up by many people. We have done our due diligence in these
discussions over a period of time.

> I'd also like to clarify: I'm not opposed to the general idea of
> providing the needed headers with the kernel somehow. I just think
> it's worth spending effort making sure an interface for it that we'll
> need to live with forever is appropriately thought through and not
> rushed in, especially since we're likely to get substantial
> infrastructure on top of it quickly (eBPF and friends in particular).

We have spent the time :) This seems like the best solution of all. Greg KH
and other maintainers are also supportive of it, as can be seen in other
threads. We can consider an alternate proposal if it is better, but I don't
see any better one proposed at the moment.

- squashfs-tools requirement on the build really sucks
- cpio uncompressed to memory equally sucks because it consumes all the
  memory uncompressed instead of reclaimable pages
- decompressing into tmpfs will suck for Android because we don't use
  disk-based swap and we run into the same cpio issue above. We use ZRAM
  for compressed swap.
- debugfs is a non-option for Android

The tar+xz archive is trivially created without depending on
squashfs-tools. And xz provides the maximum compression ratio in our
experiments. Decompression time is a non-issue since trace tools are using
it.

The filesystem view using mount/unmount sounds like a pony to me, but it
does not meet the requirements above. Let me know if I am missing
something.

thanks,

 - Joel
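
For illustration only, here is a minimal userspace sketch of the "decompress
to a scratch area" step discussed above. It is not part of the patchset. It
assumes the key for a previously decompressed copy can be the SHA-256 of the
compressed archive, which stands in for the versioned prefix the tarball does
not have. The function names and the demo archive are hypothetical; the
location of the real archive is not named in this message, so the sketch
builds a stand-in tar+xz file instead.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path


def archive_digest(archive: Path) -> str:
    """Return the SHA-256 hex digest of the compressed archive file."""
    h = hashlib.sha256()
    with archive.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def extract_headers(archive: Path, cache_root: Path) -> Path:
    """Extract a tar+xz archive under cache_root/<digest>.

    If a directory for this exact archive already exists, reuse it
    instead of decompressing again.
    """
    cache_root.mkdir(parents=True, exist_ok=True)
    dest = cache_root / archive_digest(archive)
    if dest.is_dir():
        return dest
    # Extract into a private temporary directory first, then rename, so a
    # half-finished extraction is never mistaken for a complete one.
    tmp = Path(tempfile.mkdtemp(dir=cache_root))
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:xz") as tar:
        tar.extractall(tmp, **kwargs)
    tmp.rename(dest)
    return dest


def make_demo_archive(path: Path) -> None:
    """Create a tiny stand-in tar+xz archive (hypothetical content)."""
    data = b"#define DEMO 1\n"
    info = tarfile.TarInfo("include/demo.h")
    info.size = len(data)
    with tarfile.open(path, "w:xz") as tar:
        tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        archive = root / "demo.tar.xz"
        make_demo_archive(archive)
        first = extract_headers(archive, root / "cache")
        second = extract_headers(archive, root / "cache")  # served from cache
        print(first == second)
```

Hashing still reads the compressed archive on every call, but it skips the
xz decompression when nothing has changed. Whether that trade-off is
acceptable depends on the system, which is the kind of choice this thread
leaves to the user.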