From: Andy Lutomirski
Date: Sun, 4 Feb 2018 17:36:33 +0000
Subject: Re: [RFC 1/3] seccomp: add a return code to trap to userspace
To: Tycho Andersen
Cc: LKML, Linux Containers, Kees Cook, Oleg Nesterov, "Eric W. Biederman", "Serge E. Hallyn", Christian Brauner, Tyler Hicks, Akihiro Suda
In-Reply-To: <20180204104946.25559-2-tycho@tycho.ws>
References: <20180204104946.25559-1-tycho@tycho.ws> <20180204104946.25559-2-tycho@tycho.ws>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Feb 4, 2018 at 10:49 AM, Tycho Andersen wrote:
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.

Neat!

>
> The motivation for this is primarily for use with containers. For example,
> if a container does an init_module(), we obviously don't want to load this
> untrusted code, which may be compiled for the wrong version of the kernel
> anyway. Instead, we could parse the module image, figure out which module
> the container is trying to load and load it on the host.
>
> As another example, containers cannot mknod(), since this checks
> capable(CAP_SYS_ADMIN).
> However, harmless devices like /dev/null or
> /dev/zero should be ok for containers to mknod, but we'd like to avoid hard
> coding some whitelist in the kernel. Another example is mount(), which has
> many security restrictions for good reason, but configuration or runtime
> knowledge could potentially be used to relax these restrictions.
>
> This patch adds functionality that is already possible via at least two
> other means that I know about, both of which involve ptrace(): first, one
> could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
> Unfortunately this is slow, so a faster version would be to install a
> filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
> Since ptrace allows only one tracer, if the container runtime is that
> tracer, users inside the container (or outside) trying to debug it will not
> be able to use ptrace, which is annoying. It also means that older
> distributions based on Upstart cannot boot inside containers using ptrace,
> since upstart itself uses ptrace to start services.
>
> The actual implementation of this is fairly small, although getting the
> synchronization right was/is slightly complex. Also worth noting that there
> is one race still present:
>
> 1. a task does a SECCOMP_RET_USER_NOTIF
> 2. the userspace handler reads this notification
> 3. the task dies
> 4. a new task with the same pid starts
> 5. this new task does a SECCOMP_RET_USER_NOTIF, gets the same cookie id
>    that the previous one did
> 6. the userspace handler writes a response

I'm slightly confused.  I thought the id was never reused for a given
struct seccomp_filter.  (Also, shouldn't the id be u64, not u32?)

On very quick reading, I have a question.  What happens if a process
has two seccomp_filters attached, one of them returns
SECCOMP_RET_USER_NOTIF, and the *other* one has a listener?
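
To make the id question concrete, here is a minimal userspace model of
what "never reused" would look like.  This is only a sketch and not
code from the patch: struct filter_model, notif_send(),
notif_task_exit() and notif_respond() are hypothetical names invented
for illustration, and the fixed-size table stands in for whatever
list the filter really keeps.  The assumption is one monotonically
increasing 64-bit counter per filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the patch. */
#define MAX_PENDING 8

struct pending_notif {
	bool     in_use;
	uint64_t id;   /* cookie handed to the userspace handler */
	int      pid;  /* task that triggered the filter */
};

struct filter_model {
	uint64_t next_id;  /* only ever incremented, never reset */
	struct pending_notif pending[MAX_PENDING];
};

/* Queue a notification for pid.  Returns its id, or 0 if the table is full. */
uint64_t notif_send(struct filter_model *f, int pid)
{
	assert(f != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!f->pending[i].in_use) {
			f->pending[i].in_use = true;
			f->pending[i].id = ++f->next_id;  /* ids start at 1 */
			f->pending[i].pid = pid;
			return f->pending[i].id;
		}
	}
	return 0;
}

/* The task died: drop every notification it still had pending. */
void notif_task_exit(struct filter_model *f, int pid)
{
	assert(f != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (f->pending[i].in_use && f->pending[i].pid == pid)
			f->pending[i].in_use = false;
	}
}

/* The handler writes a response.  True only if id is still pending. */
bool notif_respond(struct filter_model *f, uint64_t id)
{
	assert(f != NULL);
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (f->pending[i].in_use && f->pending[i].id == id) {
			f->pending[i].in_use = false;
			return true;
		}
	}
	return false;
}
```

In this model, replaying steps 1-6 with the same pid gives the second
task a fresh id, so the late response to the first id is rejected
rather than delivered to the new task.  The counter is 64-bit here so
that wraparound is not a practical concern; with a 32-bit counter the
same scheme could eventually hand out an old id again.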