Received: by 10.192.165.148 with SMTP id m20csp3973747imm; Mon, 23 Apr 2018 16:16:32 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+EGnnfTF76ttQBaKpuXvLn8J79LBGPJ2z8/AGxPbKHfOJ6XuSRZIt1mHJJ1YRxtSYLiXCp X-Received: by 2002:a17:902:b2c7:: with SMTP id x7-v6mr22791326plw.124.1524525392274; Mon, 23 Apr 2018 16:16:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524525392; cv=none; d=google.com; s=arc-20160816; b=iFizv6bz38BCascU7J0CBzZIIORQb25NEl5gI1v5rqOs+vtQlOVIwuHeNi/Tu6rbz7 ZTPKQFUQUM3hHpCu9EHWOzyFzxdLN0eXeV2rmvSw06JPnQNCceLfmUyHpHzs562YsyYA BDl00DqAIjqcz3lawg+uGppyjPJBXfVxBcEDkzmI7tCV7/wS0nkFlp8kWgL2w41HF/j8 VA90rMISyOlza7IzstbtVwCj4MOrrniZYkF1oOGxZdmyR1l2pOgWwTabKsS95QnH/nQb mD4em6rl8tJHMRvRif5z35Qx4++hWOzyRuGKdv1HxhTFF00sSdsjL95p0GF6WLuiUhiV WIDA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=LJbbBUr6pT4Eh+/T5UesoYvqOZVE8aSb5y9nnCEZDrk=; b=PQkXkfIXZj3JIR4XqF5/lrOO4jYB2+rdjbwGqYONDmhbmGM+a8PmjPmk2eOQjjdtRY O2jyhB0wxeJ5YVtgGBlBKWH3NarQ7ZBdOMmThyQr7KGTcHW5t7YcLofpscmBVNQ8aq7j 0uggRMsTbG6YYqhRwBZco265GKte12r5dSQiXOPwZ4KyaJQrSJqKj9ebLCdrG9XZtRIO Xi6EpAnAHO24uTgYfjyUuC4ERtLbstBR7yJ+K/JRFiJLZyCc+a0qQjJUtdSqpWMpXkJs d4NJfaHlP4aqxJ9TlOHmpnvb2UNR2HO48VRhxdjMYzVxUiWC/J+sf2GqJmO5u8hdPSSJ +pig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@paul-moore-com.20150623.gappssmtp.com header.s=20150623 header.b=Noi4KujT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x11-v6si12578552plv.149.2018.04.23.16.16.17; Mon, 23 Apr 2018 16:16:32 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@paul-moore-com.20150623.gappssmtp.com header.s=20150623 header.b=Noi4KujT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932639AbeDWXPQ (ORCPT + 99 others); Mon, 23 Apr 2018 19:15:16 -0400 Received: from mail-lf0-f44.google.com ([209.85.215.44]:40878 "EHLO mail-lf0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932611AbeDWXPL (ORCPT ); Mon, 23 Apr 2018 19:15:11 -0400 Received: by mail-lf0-f44.google.com with SMTP id j16-v6so195135lfb.7 for ; Mon, 23 Apr 2018 16:15:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paul-moore-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=LJbbBUr6pT4Eh+/T5UesoYvqOZVE8aSb5y9nnCEZDrk=; b=Noi4KujTZj/CNpbCxGx16qTjGhmfjRkDacr+q/DyKK/uzfHZfI962T0oaO0pmvJOgj H/V7X95rgNmdQDnUXk4gsB2vteNUJpVN+K+zn0E0F2VlP2hugP6tfrbbiO7Ex26HEmtp hF1mC6H+gacFoNkbp3LQwzi0HZuTL+FRaUmA+hkvAG1gCCdadDt13wRXSw0d13MVxrfG 2bGKfK6FFA+5sosBk8qMqgc4tDUpMgxvj/LjyTeDdA74HK9yqSR/2JJr0HKBOmqMdInr DW0Idn9Ik4QCKrAzxxz2/d3uwIRTOdxPLTkoNZ/KEa6iyx5PpUtqwseBonYlfcilZOny awug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=LJbbBUr6pT4Eh+/T5UesoYvqOZVE8aSb5y9nnCEZDrk=; b=HqWWkPzSTF5o5iABPGRwPAD6/+KLezSXxWe9/Vv6bEtWByY2m+J1TCFNlW0lQlJhr0 KLdHeY6DPH0FpMgSvj/j8GL/gVQGgDvqttgbp+TZiAKaavzDj7y3X/QYgSRwPwNQelrK 4JqWiPDWvLaGjHcBuDFkNIoYFc/wPaG7WXSw31rpiJ7IwEgAK/7f4OOw7rPZMUS2RUf6 0iKe/Poss0gfCpqS4P4/hsZQ4Mhs/7PU9OeZy9uxEcT2hVUQMY5pyRn3/a0PGWF8c5Vi V5KZSkLyc+djnAmf/rwl8ZA0Yxjt4xv+aJrAUj9j1NtgE6ykDbpDq4W/ehQHadrLGqqY 5Xww== X-Gm-Message-State: ALQs6tCxndo906W8OPVPYrj6CDhokR6BFrXLrPT3TcOhJX/FMSZZOZ2k 0MKg9tMhPDDJv8Nbk2+0UIzfe3pMRY3lHGPWSWJH X-Received: by 2002:a19:b587:: with SMTP id g7-v6mr10348791lfk.90.1524525309862; Mon, 23 Apr 2018 16:15:09 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a19:a5c3:0:0:0:0:0 with HTTP; Mon, 23 Apr 2018 16:15:08 -0700 (PDT) X-Originating-IP: [108.20.156.165] In-Reply-To: <20180421143443.faaput5g2rn6ul7p@madcap2.tricolour.ca> References: <20180421143443.faaput5g2rn6ul7p@madcap2.tricolour.ca> From: Paul Moore Date: Mon, 23 Apr 2018 19:15:08 -0400 Message-ID: Subject: Re: [RFC PATCH ghak32 V2 01/13] audit: add container id To: Richard Guy Briggs Cc: cgroups@vger.kernel.org, containers@lists.linux-foundation.org, linux-api@vger.kernel.org, Linux-Audit Mailing List , linux-fsdevel@vger.kernel.org, LKML , netdev@vger.kernel.org, ebiederm@xmission.com, luto@kernel.org, jlayton@redhat.com, carlos@redhat.com, dhowells@redhat.com, viro@zeniv.linux.org.uk, simo@redhat.com, Eric Paris , serge@hallyn.com Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Apr 21, 2018 at 10:34 AM, Richard Guy Briggs wrote: > On 2018-04-18 19:47, Paul Moore wrote: >> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs wrote: >> > Implement the proc fs write to set the audit container ID of a process, >> > emitting an AUDIT_CONTAINER record to document the event. >> > >> > This is a write from the container orchestrator task to a proc entry of >> > the form /proc/PID/containerid where PID is the process ID of the newly >> > created task that is to become the first task in a container, or an >> > additional task added to a container. >> > >> > The write expects up to a u64 value (unset: 18446744073709551615). >> > >> > This will produce a record such as this: >> > type=CONTAINER msg=audit(1519903238.968:261): op=set pid=596 uid=0 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0 ses=1 opid=596 old-contid=18446744073709551615 contid=123455 res=0 >> > >> > The "op" field indicates an initial set. The "pid" to "ses" fields are >> > the orchestrator while the "opid" field is the object's PID, the process >> > being "contained". Old and new container ID values are given in the >> > "contid" fields, while res indicates its success. >> > >> > It is not permitted to self-set, unset or re-set the container ID. A >> > child inherits its parent's container ID, but then can be set only once >> > after. >> > >> > See: https://github.com/linux-audit/audit-kernel/issues/32 >> > >> > Signed-off-by: Richard Guy Briggs >> > --- >> > fs/proc/base.c | 37 ++++++++++++++++++++ >> > include/linux/audit.h | 16 +++++++++ >> > include/linux/init_task.h | 4 ++- >> > include/linux/sched.h | 1 + >> > include/uapi/linux/audit.h | 2 ++ >> > kernel/auditsc.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++ >> > 6 files changed, 143 insertions(+), 1 deletion(-) ... >> > diff --git a/include/linux/sched.h b/include/linux/sched.h >> > index d258826..1b82191 100644 >> > --- a/include/linux/sched.h >> > +++ b/include/linux/sched.h >> > @@ -796,6 +796,7 @@ struct task_struct { >> > #ifdef CONFIG_AUDITSYSCALL >> > kuid_t loginuid; >> > unsigned int sessionid; >> > + u64 containerid; >> >> This one line addition to the task_struct scares me the most of >> anything in this patchset. Why? It's a field named "containerid" in >> a perhaps one of the most widely used core kernel structures; the >> possibilities for abuse are endless, and it's foolish to think we >> would ever be able to adequately police this. > > Fair enough. > >> Unfortunately, we can't add the field to audit_context as things >> currently stand because we don't always allocate an audit_context, >> it's dependent on the system's configuration, and we need to track the >> audit container ID for a given process, regardless of the audit >> configuration. Pretty much the same reason why loginuid and sessionid >> are located directly in task_struct now. As I stressed during the >> design phase, I really want to keep this as an *audit* container ID >> and not a general purpose kernel wide container ID. If the kernel >> ever grows a general purpose container ID token, I'll be the first in >> line to convert the audit code, but I don't want audit to be that >> general purpose mechanism ... audit is hated enough as-is ;) > > When would we need an audit container ID when audit is not enabled > enough to have an audit_context? I'm thinking of the audit_alloc() case where audit_filter_task() returns AUDIT_DISABLED. I believe this is the same reason why loginuid and sessionid live directly in the task_struct and not in the audit_context; they need to persist for the lifetime of the task. > If it is only used for audit, and audit is the only consumer, and audit > can only use it when it is enabled, then we can just return success to > any write to the proc filehandle, or not even present it. Nothing will > be able to know that value wasn't used. > > When are loginuid and sessionid used now when audit is not enabled (or > should I say, explicitly disabled)? See above. I think that should answer these questions. >> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h >> > index 4e61a9e..921a71f 100644 >> > --- a/include/uapi/linux/audit.h >> > +++ b/include/uapi/linux/audit.h >> > @@ -71,6 +71,7 @@ >> > #define AUDIT_TTY_SET 1017 /* Set TTY auditing status */ >> > #define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */ >> > #define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */ >> > +#define AUDIT_CONTAINER 1020 /* Define the container id and information */ >> > >> > #define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */ >> > #define AUDIT_USER_AVC 1107 /* We filter this differently */ >> > @@ -465,6 +466,7 @@ struct audit_tty_status { >> > }; >> > >> > #define AUDIT_UID_UNSET (unsigned int)-1 >> > +#define AUDIT_CID_UNSET ((u64)-1) >> >> I think we need to decide if we want to distinguish between the "host" >> (e.g. init ns) and "unset". Looking at this patch (I've only quickly >> skimmed the others so far) it would appear that you don't think we >> need to worry about this distinction; that's fine, but let's make it >> explicit with a comment in the code that AUDIT_CID_UNSET means "unset" >> as well as "host". > > I don't see any reason to distinguish between "host" and "unset". Since > a container doesn't have a concrete definition based in namespaces, the > initial namespace set is meaningless here. Okay, that sounds reasonable. > Is there value in having a container orchestrator process have a > reserved container ID that has a policy distinct from any other > container? I'm open to arguments for this idea, but I don't see a point to it right now. > If so, then I could see the value in making the distinction. > For example, I've heard of interest in systemd acting as a container > orchestrator, so if it took on that role as PID 1, then every process in > the system would inherit that ID and none would be unset. > > I can't picture how having seperate "host" and "unset" values helps us. I don't have a strong feeling either way, I just wanted to ask the question. >> > /* audit_rule_data supports filter rules with both integer and string >> > * fields. It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and >> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c >> > index 4e0a4ac..29c8482 100644 >> > --- a/kernel/auditsc.c >> > +++ b/kernel/auditsc.c >> > @@ -2073,6 +2073,90 @@ int audit_set_loginuid(kuid_t loginuid) >> > return rc; >> > } >> > >> > +static int audit_set_containerid_perm(struct task_struct *task, u64 containerid) >> > +{ >> > + struct task_struct *parent; >> > + u64 pcontainerid, ccontainerid; >> > + >> > + /* Don't allow to set our own containerid */ >> > + if (current == task) >> > + return -EPERM; >> >> Why not? Is there some obvious security concern that I missing? > > We then lose the distinction in the AUDIT_CONTAINER record between the > initiating PID and the target PID. This was outlined in the proposal. I just went back and reread the v3 proposal and I still don't see a good explanation of this. Why is this bad? What's the security concern? > Having said that, I'm still not sure we have protected sufficiently from > a child turning around and setting it's parent's as yet unset or > inherited audit container ID. Yes, I believe we only want to let a task set the audit container for it's children (or itself/threads if we decide to allow that, see above). There *has* to be a function to check to see if a task if a child of a given task ... right? ... although this is likely to be a pointer traversal and locking nightmare ... hmmm. >> I ask because I suppose it might be possible for some container >> runtime to do a fork, setup some of the environment and them exec the >> container (before you answer the obvious "namespaces!" please remember >> we're not trying to define containers). > > I don't think namespaces have any bearing on this concern since none are > required. > >> > + /* Don't allow the containerid to be unset */ >> > + if (!cid_valid(containerid)) >> > + return -EINVAL; >> > + /* if we don't have caps, reject */ >> > + if (!capable(CAP_AUDIT_CONTROL)) >> > + return -EPERM; >> > + /* if containerid is unset, allow */ >> > + if (!audit_containerid_set(task)) >> > + return 0; >> > + /* it is already set, and not inherited from the parent, reject */ >> > + ccontainerid = audit_get_containerid(task); >> > + rcu_read_lock(); >> > + parent = rcu_dereference(task->real_parent); >> > + rcu_read_unlock(); >> > + task_lock(parent); >> > + pcontainerid = audit_get_containerid(parent); >> > + task_unlock(parent); >> > + if (ccontainerid != pcontainerid) >> > + return -EPERM; >> > + return 0; >> > +} >> > + >> > +static void audit_log_set_containerid(struct task_struct *task, u64 oldcontainerid, >> > + u64 containerid, int rc) >> > +{ >> > + struct audit_buffer *ab; >> > + uid_t uid; >> > + struct tty_struct *tty; >> > + >> > + if (!audit_enabled) >> > + return; >> > + >> > + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONTAINER); >> > + if (!ab) >> > + return; >> > + >> > + uid = from_kuid(&init_user_ns, task_uid(current)); >> > + tty = audit_get_tty(current); >> > + >> > + audit_log_format(ab, "op=set pid=%d uid=%u", task_tgid_nr(current), uid); >> > + audit_log_task_context(ab); >> > + audit_log_format(ab, " auid=%u tty=%s ses=%u opid=%d old-contid=%llu contid=%llu res=%d", >> > + from_kuid(&init_user_ns, audit_get_loginuid(current)), >> > + tty ? tty_name(tty) : "(none)", audit_get_sessionid(current), >> > + task_tgid_nr(task), oldcontainerid, containerid, !rc); >> > + >> > + audit_put_tty(tty); >> > + audit_log_end(ab); >> > +} >> > + >> > +/** >> > + * audit_set_containerid - set current task's audit_context containerid >> > + * @containerid: containerid value >> > + * >> > + * Returns 0 on success, -EPERM on permission failure. >> > + * >> > + * Called (set) from fs/proc/base.c::proc_containerid_write(). >> > + */ >> > +int audit_set_containerid(struct task_struct *task, u64 containerid) >> > +{ >> > + u64 oldcontainerid; >> > + int rc; >> > + >> > + oldcontainerid = audit_get_containerid(task); >> > + >> > + rc = audit_set_containerid_perm(task, containerid); >> > + if (!rc) { >> > + task_lock(task); >> > + task->containerid = containerid; >> > + task_unlock(task); >> > + } >> > + >> > + audit_log_set_containerid(task, oldcontainerid, containerid, rc); >> > + return rc; >> >> Why are audit_set_containerid_perm() and audit_log_containerid() >> separate functions? > > (I assume you mean audit_log_set_containerid()?) Yep. My fingers got tired typing in that function name and decided a shortcut was necessary. > It seemed clearer that all the permission checking was in one function > and its return code could be used to report the outcome when logging the > (attempted) action. This is the same structure as audit_set_loginuid() > and it made sense. When possible I really like it when the permission checks are in the same function as the code which does the work; it's less likely to get abused that way (you have to willfully bypass the access checks). The exceptions might be if you wanted to reuse the access control code, or insert a modular access mechanism (e.g. LSMs). I'm less concerned about audit_log_set_containerid(), but the usual idea of avoiding single-use function within the same scope applies here. > This would be the time to connect it to a syscall if that seems like a > good idea and remove pid, uid, auid, tty, ses fields. Ah yes, I missed that. You know my stance on connecting records by now (hint: yes, connect them) so I think that would be a good thing to do for the next round. -- paul moore www.paul-moore.com