Date: Wed, 9 Mar 2016 13:25:12 -0800
Subject: Re: [PATCH v10 09/12] arch/x86: enable task isolation functionality
From: Kees Cook
To: Andy Lutomirski
Cc: Chris Metcalf, Thomas Gleixner, Christoph Lameter, Andrew Morton,
    Viresh Kumar, Ingo Molnar, Steven Rostedt, Tejun Heo, Gilad Ben Yossef,
    Will Deacon, Rik van Riel, Frederic Weisbecker, "Paul E. McKenney",
    "linux-kernel@vger.kernel.org", X86 ML, "H. Peter Anvin",
    Catalin Marinas, Peter Zijlstra
References: <1456949376-4910-1-git-send-email-cmetcalf@ezchip.com>
    <1456949376-4910-10-git-send-email-cmetcalf@ezchip.com>
    <56D895EA.1060301@mellanox.com> <56DDE9C9.5060900@mellanox.com>
    <56DF38BA.9030007@mellanox.com>

On Wed, Mar 9, 2016 at 1:18 PM, Andy Lutomirski wrote:
> On Wed, Mar 9, 2016 at 1:10 PM, Kees Cook wrote:
>> On Wed, Mar 9, 2016 at 12:58 PM, Andy Lutomirski wrote:
>>> On Tue, Mar 8, 2016 at 12:40 PM, Chris Metcalf wrote:
>>>> On 03/07/2016 03:55 PM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> Let task isolation users who want to detect when they screw up and do
>>>>>>> a syscall do it with seccomp.
>>>>>>
>>>>>> Can you give me more details on what you're imagining here?
>>>>>> Remember that a key use case is that these applications can remove the
>>>>>> syscall prohibition voluntarily; it's only there to prevent unintended
>>>>>> uses (by third party libraries or just straight-up programming bugs).
>>>>>> As far as I can tell, seccomp does not allow you to go from "less
>>>>>> permissive" to "more permissive" settings at all, which means that as
>>>>>> it exists, it's not a good solution for this use case.
>>>>>>
>>>>>> Or were you thinking about a new seccomp API that allows this?
>>>>>
>>>>> I was.  This is at least the second time I've wanted a way to ask
>>>>> seccomp to allow a layer to be removed.
>>>>
>>>> Andy,
>>>>
>>>> Please take a look at this draft patch that intends to enable seccomp
>>>> as something that task isolation can use.
>>>
>>> Kees, this sounds like it may solve your self-instrumentation problem.
>>> Want to take a look?
>>
>> Errrr... I'm pretty uncomfortable with this. I really would like to
>> keep the basic semantics of seccomp as simple as possible: filtering
>> only gets more restricted. The other problem is that this won't work
>> if the third-party code actually uses seccomp itself... this isn't
>> composable as-is.
>>
>> This doesn't really solve my self-instrumentation desires since I
>> still can't sanely deliver signals. I would need a lot more
>> convincing. :)
>>
>
> I think you could do it by adding a filter that turns all the unknown
> things into SIGSYS, allows sigreturn, and allows the seccomp syscall,
> at least in the pop-off-the-filter variant.  Then you add this
> removably.
>
> In the SIGSYS handler, you pop off the filter, do your bookkeeping,
> update the filter, and push it back on.

No, this won't let the original syscall through. I wanted to be able
to document the syscalls as they happened without needing audit or a
ptrace monitor. I am currently convinced that my desire for this is no
good, and it should just be done with a ptrace monitor...

-Kees

>
> --Andy

-- 
Kees Cook
Chrome OS & Brillo Security
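
To make the "filtering only gets more restricted" semantics from the thread concrete, here is a toy Python model of stacked seccomp filters. It is a sketch and not kernel code: the helper names (make_filter, evaluate) and the example rule sets are hypothetical, and the action list is an assumption limited to the SECCOMP_RET_* actions of that era (KILL, TRAP, ERRNO, TRACE, ALLOW). The one property it relies on is that every installed filter runs and the most restrictive result wins.

```python
# Toy model of seccomp filter stacking; this is NOT kernel code.
# Assumption: the action names and their precedence mirror the
# SECCOMP_RET_* actions available in 2016. All helper names are hypothetical.

PRECEDENCE = ["KILL", "TRAP", "ERRNO", "TRACE", "ALLOW"]  # most restrictive first


def make_filter(rules, default):
    """Return a filter: a function mapping a syscall name to an action."""
    return lambda syscall: rules.get(syscall, default)


def evaluate(filters, syscall):
    """Run every installed filter and keep the most restrictive action."""
    actions = [f(syscall) for f in filters]
    if not actions:
        return "ALLOW"  # no filters installed: nothing is restricted
    return min(actions, key=PRECEDENCE.index)


# A third-party library installs its own filter first and forbids ptrace.
library_filter = make_filter({"ptrace": "KILL"}, default="ALLOW")

# The application then adds a layer that traps (SIGSYS) anything unexpected,
# while allowing sigreturn and the seccomp syscall, as described in the thread.
app_filter = make_filter(
    {"read": "ALLOW", "write": "ALLOW", "sigreturn": "ALLOW", "seccomp": "ALLOW"},
    default="TRAP",
)

stack = [library_filter, app_filter]
```

With both layers installed, evaluate(stack, "open") yields "TRAP" and evaluate(stack, "ptrace") yields "KILL", because a later layer can never loosen an earlier one. Dropping the application layer, as a hypothetical pop operation would, makes evaluate([library_filter], "open") yield "ALLOW", which is exactly the loosening that current seccomp does not permit.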