Received: by 10.223.185.116 with SMTP id b49csp4564176wrg; Mon, 26 Feb 2018 21:34:22 -0800 (PST) X-Google-Smtp-Source: AH8x22446wNOUt/Rx9iqTBGRBMKn23pPfgOt6Yp5y0DQvfEpb83xVwHSzZGSnTYiKfr3ijXMDG5y X-Received: by 2002:a17:902:ab89:: with SMTP id f9-v6mr12832818plr.369.1519709662379; Mon, 26 Feb 2018 21:34:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519709662; cv=none; d=google.com; s=arc-20160816; b=sTq24rI89gZFksOAfurV+uLv/ZBMioz6x3PZnXvZyNPxBvwJGW/Mpm8ilFoabhKPb2 A4YZ26+XrEwGSS8buAI0DJVCxFyOnEaoEn4gx4ZHuygWeZSbDwEcRLRQsm9EWMZBTA2i p/W29i+5Mq4QhgSctvpIpqVlhh3JhfIPrJT5xn1LAwIa4sJmo44yFp/d6iJsxLdfxIHn M2Vs6F3OafSDVhbsKOHKfcyz+EHp582Rjdooc0QQUu6WnYbBa3cULoulGRPNEekew935 /ln4SVJVK2ufi7y+lrjIQd8Nog1J21As3j4ViMv1bCc1xgNupOZoOThj74fI9XZwuc7g ox4g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature :arc-authentication-results; bh=NKHqbp5cI95uSYRnQY4csOAi6MEFHuabn/IeMhzi7HY=; b=fCOI0hDlETJlwY/cVk6QhRxfIALNuKYbl1avdJWyYd8JCuiO1hP00QCim3VLNvqIfA cttfYO7t3K634upjPCBRxEqIxZFA7xMqZTDeFAyTMheGESEojbCoXW18GSSlCDGxHTB2 WX5xy6swrf0QVMszAQogPd/xz/VtssND3G120gbcIsWCY8E8hrfFEPmiPSiDGdoqZf+M 0WbEiAyNLamlNr18BTxLTYA77+g+qGcHUjk9MXs2hNcBmeOq+aRN1qvNYrGQJ3SZOkZN /defSgOhSX6F7SWpmNxMVCFXKe54s5p5OU+0MKpKza1GNC2eZVU8ZKi/zeNCJyQ5ezYA kmDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Zy1dIlbG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p11si1016118pgd.392.2018.02.26.21.34.07; Mon, 26 Feb 2018 21:34:22 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=Zy1dIlbG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751867AbeB0FdF (ORCPT + 99 others); Tue, 27 Feb 2018 00:33:05 -0500 Received: from mail-pg0-f51.google.com ([74.125.83.51]:46623 "EHLO mail-pg0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750828AbeB0FdC (ORCPT ); Tue, 27 Feb 2018 00:33:02 -0500 Received: by mail-pg0-f51.google.com with SMTP id r26so1902547pgv.13; Mon, 26 Feb 2018 21:33:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=NKHqbp5cI95uSYRnQY4csOAi6MEFHuabn/IeMhzi7HY=; b=Zy1dIlbG3rmAtleGwajAYRU6P7DUC8qXvx2Y65Ci8WYLV0s5CkVwyQ4IJ2Tr0M6uLg WlRN5jclAeb7mHlpo8Kv4/+tsR4DCHZOgAQopoQm89lPzKiwTh7/k++PPLli4FB+xRTW kqbr8v0VxELhD14OAUTfCc1JxqYeG9v6z0ctkAFOb0/2F5fL1UpulIffMbjFgMhfweS+ bqV9fe4IGH26w85eP/Sn4DR+HhK3GL1EXd9OMpWkNnN5XqQcSmYpkEmuSIcegTDr1OaM p4LP1QY1hq7H5m5CnLkeJ4QcsmI7KHvs40H2FHVLk/yQke+sSuke5PSJ0fyjCwEt45O4 8CAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=NKHqbp5cI95uSYRnQY4csOAi6MEFHuabn/IeMhzi7HY=; b=arsLyaqZC9qBAI3T7xQ1QpZzETTtp95EQsTeP5kUiZKw3wKF4mwUAcN2ROlCIqPFZ4 vsp+AIY3EApETcgEeEsAd5/Ztu4Klkxq92IUEvCj0faHmbR/A2N4U2GrKisKI//DI/+1 gnZ41i/dFHVtLW+qsD6bddFwn2aTDoWg6tQA9p7BVz8mJSIn0kEC87FVWzO78RR/r/15 FD2SY4G8kMP8cs8uwyEufZwdAKqqESUTkXZbgIxjvBKMSeEY99Cl6LJQwuiIzqUS5PwE Nx4k67HyL8zzms0fXQnXws76cxnPHL6Bj1K5h7nbhd3mGTS+W19oPJamyOkQcihtmFwP vUsg== X-Gm-Message-State: APf1xPCVMBYoEYprpAL2XsN/4FoCBp/9qfspfkmseEjKD9HwgrQAol/x FkqfpvL2msaVhVC3vCu1IHM= X-Received: by 10.98.237.12 with SMTP id u12mr1228598pfh.72.1519709581065; Mon, 26 Feb 2018 21:33:01 -0800 (PST) Received: from ast-mbp ([2620:10d:c090:180::1:c617]) by smtp.gmail.com with ESMTPSA id q6sm14678630pga.37.2018.02.26.21.32.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Feb 2018 21:33:00 -0800 (PST) Date: Mon, 26 Feb 2018 21:32:57 -0800 From: Alexei Starovoitov To: Andy Lutomirski Cc: =?iso-8859-1?Q?Micka=EBl_Sala=FCn?= , LKML , Alexei Starovoitov , Arnaldo Carvalho de Melo , Casey Schaufler , Daniel Borkmann , David Drysdale , "David S . Miller" , "Eric W . Biederman" , Jann Horn , Jonathan Corbet , Michael Kerrisk , Kees Cook , Paul Moore , Sargun Dhillon , "Serge E . Hallyn" , Shuah Khan , Tejun Heo , Thomas Graf , Tycho Andersen , Will Drewry , Kernel Hardening , Linux API , LSM List , Network Development , Andrew Morton Subject: Re: [PATCH bpf-next v8 05/11] seccomp,landlock: Enforce Landlock programs per process hierarchy Message-ID: <20180227053255.a7ua24kjd6tvei2a@ast-mbp> References: <20180227004121.3633-1-mic@digikod.net> <20180227004121.3633-6-mic@digikod.net> <20180227020856.teq4hobw3zwussu2@ast-mbp> <20180227045458.wjrbbsxf3po656du@ast-mbp> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: NeoMutt/20170421 (1.8.2) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 27, 2018 at 05:20:55AM +0000, Andy Lutomirski wrote: > On Tue, Feb 27, 2018 at 4:54 AM, Alexei Starovoitov > wrote: > > On Tue, Feb 27, 2018 at 04:40:34AM +0000, Andy Lutomirski wrote: > >> On Tue, Feb 27, 2018 at 2:08 AM, Alexei Starovoitov > >> wrote: > >> > On Tue, Feb 27, 2018 at 01:41:15AM +0100, Micka?l Sala?n wrote: > >> >> The seccomp(2) syscall can be used by a task to apply a Landlock program > >> >> to itself. As a seccomp filter, a Landlock program is enforced for the > >> >> current task and all its future children. A program is immutable and a > >> >> task can only add new restricting programs to itself, forming a list of > >> >> programss. > >> >> > >> >> A Landlock program is tied to a Landlock hook. If the action on a kernel > >> >> object is allowed by the other Linux security mechanisms (e.g. DAC, > >> >> capabilities, other LSM), then a Landlock hook related to this kind of > >> >> object is triggered. The list of programs for this hook is then > >> >> evaluated. Each program return a 32-bit value which can deny the action > >> >> on a kernel object with a non-zero value. If every programs of the list > >> >> return zero, then the action on the object is allowed. > >> >> > >> >> Multiple Landlock programs can be chained to share a 64-bits value for a > >> >> call chain (e.g. evaluating multiple elements of a file path). This > >> >> chaining is restricted when a process construct this chain by loading a > >> >> program, but additional checks are performed when it requests to apply > >> >> this chain of programs to itself. The restrictions ensure that it is > >> >> not possible to call multiple programs in a way that would imply to > >> >> handle multiple shared values (i.e. cookies) for one chain. For now, > >> >> only a fs_pick program can be chained to the same type of program, > >> >> because it may make sense if they have different triggers (cf. next > >> >> commits). This restrictions still allows to reuse Landlock programs in > >> >> a safe way (e.g. use the same loaded fs_walk program with multiple > >> >> chains of fs_pick programs). > >> >> > >> >> Signed-off-by: Micka?l Sala?n > >> > > >> > ... > >> > > >> >> +struct landlock_prog_set *landlock_prepend_prog( > >> >> + struct landlock_prog_set *current_prog_set, > >> >> + struct bpf_prog *prog) > >> >> +{ > >> >> + struct landlock_prog_set *new_prog_set = current_prog_set; > >> >> + unsigned long pages; > >> >> + int err; > >> >> + size_t i; > >> >> + struct landlock_prog_set tmp_prog_set = {}; > >> >> + > >> >> + if (prog->type != BPF_PROG_TYPE_LANDLOCK_HOOK) > >> >> + return ERR_PTR(-EINVAL); > >> >> + > >> >> + /* validate memory size allocation */ > >> >> + pages = prog->pages; > >> >> + if (current_prog_set) { > >> >> + size_t i; > >> >> + > >> >> + for (i = 0; i < ARRAY_SIZE(current_prog_set->programs); i++) { > >> >> + struct landlock_prog_list *walker_p; > >> >> + > >> >> + for (walker_p = current_prog_set->programs[i]; > >> >> + walker_p; walker_p = walker_p->prev) > >> >> + pages += walker_p->prog->pages; > >> >> + } > >> >> + /* count a struct landlock_prog_set if we need to allocate one */ > >> >> + if (refcount_read(¤t_prog_set->usage) != 1) > >> >> + pages += round_up(sizeof(*current_prog_set), PAGE_SIZE) > >> >> + / PAGE_SIZE; > >> >> + } > >> >> + if (pages > LANDLOCK_PROGRAMS_MAX_PAGES) > >> >> + return ERR_PTR(-E2BIG); > >> >> + > >> >> + /* ensure early that we can allocate enough memory for the new > >> >> + * prog_lists */ > >> >> + err = store_landlock_prog(&tmp_prog_set, current_prog_set, prog); > >> >> + if (err) > >> >> + return ERR_PTR(err); > >> >> + > >> >> + /* > >> >> + * Each task_struct points to an array of prog list pointers. These > >> >> + * tables are duplicated when additions are made (which means each > >> >> + * table needs to be refcounted for the processes using it). When a new > >> >> + * table is created, all the refcounters on the prog_list are bumped (to > >> >> + * track each table that references the prog). When a new prog is > >> >> + * added, it's just prepended to the list for the new table to point > >> >> + * at. > >> >> + * > >> >> + * Manage all the possible errors before this step to not uselessly > >> >> + * duplicate current_prog_set and avoid a rollback. > >> >> + */ > >> >> + if (!new_prog_set) { > >> >> + /* > >> >> + * If there is no Landlock program set used by the current task, > >> >> + * then create a new one. > >> >> + */ > >> >> + new_prog_set = new_landlock_prog_set(); > >> >> + if (IS_ERR(new_prog_set)) > >> >> + goto put_tmp_lists; > >> >> + } else if (refcount_read(¤t_prog_set->usage) > 1) { > >> >> + /* > >> >> + * If the current task is not the sole user of its Landlock > >> >> + * program set, then duplicate them. > >> >> + */ > >> >> + new_prog_set = new_landlock_prog_set(); > >> >> + if (IS_ERR(new_prog_set)) > >> >> + goto put_tmp_lists; > >> >> + for (i = 0; i < ARRAY_SIZE(new_prog_set->programs); i++) { > >> >> + new_prog_set->programs[i] = > >> >> + READ_ONCE(current_prog_set->programs[i]); > >> >> + if (new_prog_set->programs[i]) > >> >> + refcount_inc(&new_prog_set->programs[i]->usage); > >> >> + } > >> >> + > >> >> + /* > >> >> + * Landlock program set from the current task will not be freed > >> >> + * here because the usage is strictly greater than 1. It is > >> >> + * only prevented to be freed by another task thanks to the > >> >> + * caller of landlock_prepend_prog() which should be locked if > >> >> + * needed. > >> >> + */ > >> >> + landlock_put_prog_set(current_prog_set); > >> >> + } > >> >> + > >> >> + /* prepend tmp_prog_set to new_prog_set */ > >> >> + for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) { > >> >> + /* get the last new list */ > >> >> + struct landlock_prog_list *last_list = > >> >> + tmp_prog_set.programs[i]; > >> >> + > >> >> + if (last_list) { > >> >> + while (last_list->prev) > >> >> + last_list = last_list->prev; > >> >> + /* no need to increment usage (pointer replacement) */ > >> >> + last_list->prev = new_prog_set->programs[i]; > >> >> + new_prog_set->programs[i] = tmp_prog_set.programs[i]; > >> >> + } > >> >> + } > >> >> + new_prog_set->chain_last = tmp_prog_set.chain_last; > >> >> + return new_prog_set; > >> >> + > >> >> +put_tmp_lists: > >> >> + for (i = 0; i < ARRAY_SIZE(tmp_prog_set.programs); i++) > >> >> + put_landlock_prog_list(tmp_prog_set.programs[i]); > >> >> + return new_prog_set; > >> >> +} > >> > > >> > Nack on the chaining concept. > >> > Please do not reinvent the wheel. > >> > There is an existing mechanism for attaching/detaching/quering multiple > >> > programs attached to cgroup and tracing hooks that are also > >> > efficiently executed via BPF_PROG_RUN_ARRAY. > >> > Please use that instead. > >> > > >> > >> I don't see how that would help. Suppose you add a filter, then > >> fork(), and then the child adds another filter. Do you want to > >> duplicate the entire array? You certainly can't *modify* the array > >> because you'll affect processes that shouldn't be affected. > >> > >> In contrast, doing this through seccomp like the earlier patches > >> seemed just fine to me, and seccomp already had the right logic. > > > > it doesn't look to me that existing seccomp side of managing fork > > situation can be reused. Here there is an attempt to add 'chaining' > > concept which sort of an extension of existing seccomp style, > > but somehow heavily done on bpf side and contradicts cgroup/tracing. > > > > I don't see why the seccomp way can't be used. I agree with you that > the seccomp *style* shouldn't be used in bpf code like this, but I > think that Landlock programs can and should just live in the existing > seccomp chain. If the existing seccomp code needs some modification > to make this work, then so be it. +1 if that was the case... but that's not my reading of the patch set. > In other words, the kernel already has two kinds of chaining: > seccomp's and bpf's. bpf's doesn't work right for this type of usage > across fork(), whereas seccomp's already handles that case correctly. > (In contrast, seccomp's is totally wrong for cgroup-attached filters.) > So IMO Landlock should use the seccomp core code and call into bpf > for the actual filtering. +1 in cgroup we had to invent this new BPF_PROG_RUN_ARRAY mechanism, since cgroup hierarchy can be complicated with bpf progs attached at different levels with different override/multiprog properties, so walking link list and checking all flags at run-time would have been too slow. That's why we added compute_effective_progs(). imo doing similar stuff at fork is not a big deal either. allocating new bpf_prog_array for the task and computing array of progs is trivial. > For what it's worth, I haven't figured out that Micka?l means about > chaining restrictions involving cookies. This seems like something > wrong and the design is too complicated. the cookie part I don't like either. In networking land we have cb in skb and xdp metadata to pass data between programs. imo right now for landlock there is no need to do any of this stuff. It's purely a feature extension that is clearly controversial and blocking even basic review of the rest of the patches.