From: Andy Lutomirski
Date: Fri, 9 Mar 2018 02:12:24 +0000
Subject: Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
To: Alexei Starovoitov
Cc: Kees Cook, Alexei Starovoitov, Djalal Harouni, Al Viro,
    "David S. Miller", Daniel Borkmann, Linus Torvalds, Greg KH,
    "Luis R. Rodriguez", Network Development, LKML, kernel-team@fb.com,
    Linux API
In-Reply-To: <20180309012046.6kcivmzzkap3a4xc@ast-mbp>
References: <20180306013457.1955486-1-ast@kernel.org>
    <20180309012046.6kcivmzzkap3a4xc@ast-mbp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 9, 2018 at 1:20 AM, Alexei Starovoitov wrote:
> On Fri, Mar 09, 2018 at 12:59:36AM +0000, Andy Lutomirski wrote:
>>
>> Alexei, can you give an example use case? I'm sure it's upthread
>> somewhere, but I'm having trouble finding it.
>
> at the time of iptable's setsockopt() the kernel will do
> err = request_module("bpfilter");
> once.
> The rough POC code:
> https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/tree/net/ipv4/bpfilter/sockopt.c?h=ipt_bpf#n25

Here's what I gather from reading that code: you have a new kernel
feature (consisting of actual kernel code) that wants to defer some of
its implementation to user mode. I like this idea a lot. But I have a
suggestion for a slightly different way of accomplishing the same
thing. Rather than extending init_module() to accept ELF input, extend
the call_umh code to be able to call blobs. You'd use it very roughly
like this:

First, compile your user code and emit a static binary. Use objcopy
fiddling or a trivial .S file to make that static binary into a
variable. Then write a tiny shim module like this:

    extern unsigned char __begin_user_code[], __end_user_code[];

    int __init init_shim_module(void)
    {
            return call_umh_blob(__begin_user_code,
                                 __end_user_code - __begin_user_code);
    }

By itself, this is clearly a worse solution than yours, but it has
three benefits, one small and two big.
The small benefit is that it is completely invisible to userspace: the
.ko file is a bona fide module. The big benefits are:

1. It works even in a non-modular kernel! (Okay, it probably only
works if you can arrange for the built-in module to be initialized
late enough, but that's straightforward.)

2. It allows future extensions to change the way the glue works. For
example, maybe you want the module to integrate properly with lsmod,
etc. Rather than adding a mechanism for general privileged programs to
register themselves with lsmod (ick!), you could do it entirely in the
kernel where lsmod would know that a particular umh task is special.

More usefully, you could extend call_umh_blob() to pass in some
pre-initialized struct files, which would give a clean way to
*synchronously* create a communication channel to user code for
whatever service the user code provides. And it would be more
straightforward to make the umh blob do what it needs to do without
relying on any particular filesystems being mounted. I think we don't
want to end up in a situation where we ship a program with a .ko
extension that opens something in /dev, for example.

call_umh_blob() would create an anon_inode or similar object backed by
the blob and exec it.
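
To make that last step concrete, here is a rough userspace analogue,
not kernel code. call_umh_blob() does not exist, and run_blob() below
is a hypothetical name made up for illustration. The sketch assumes
Linux with a glibc that provides memfd_create() (2.27 or later) and
stands in for "an anon_inode or similar object backed by the blob" with
a memfd: it copies the image into an anonymous memory-backed file and
execs it by file descriptor, so no path on a mounted filesystem is
involved.

```c
/* Userspace analogue only; run_blob() is a hypothetical helper. */
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run an in-memory ELF image with the given argv.
 * Returns the child's exit status, or -1 on any failure.
 */
static int run_blob(const unsigned char *blob, size_t len,
		    char *const argv[])
{
	assert(blob != NULL && len > 0 && argv != NULL);

	/* Anonymous, memory-backed file: nothing needs to be mounted. */
	int fd = memfd_create("user_blob", MFD_CLOEXEC);
	if (fd < 0) {
		perror("memfd_create");
		return -1;
	}

	/* Copy the image into the memfd, handling short writes. */
	size_t off = 0;
	while (off < len) {
		ssize_t n = write(fd, blob + off, len - off);
		if (n <= 0) {
			perror("write");
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}

	pid_t pid = fork();
	if (pid < 0) {
		perror("fork");
		close(fd);
		return -1;
	}
	if (pid == 0) {
		char *const envp[] = { NULL };

		/* Exec by descriptor rather than by pathname. */
		fexecve(fd, argv, envp);
		_exit(127);	/* reached only if the exec failed */
	}

	close(fd);

	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would hand run_blob() the embedded image and its length, the
same pair the shim module computes from __begin_user_code and
__end_user_code, plus an argv for the new process. The in-kernel
version would presumably not wait for the child the way this sketch
does; the waitpid() is only here so the result can be observed.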