MIME-Version: 1.0
In-Reply-To: <CALCETrUbphanjMnij887pjj+DsFNCqFD6BJUrFVp_rsd3WwzEA@mail.gmail.com>
References: <1456949376-4910-1-git-send-email-cmetcalf@ezchip.com>
	<1456949376-4910-10-git-send-email-cmetcalf@ezchip.com>
	<CALCETrX6wJHC_yBGy7H6LamHTvGf8x+Fjqi6P2jxUBZ7GBp0AQ@mail.gmail.com>
	<56D895EA.1060301@mellanox.com>
	<CALCETrUrc_LJyLJLHefSDYagCrNqqzKuknr6uLgVXnPW8PmZKw@mail.gmail.com>
	<56DDE9C9.5060900@mellanox.com>
	<CALCETrUrP+gZsDLChMi5ZbT-TkD4gXvMZQt+iun2EYHipcuxHQ@mail.gmail.com>
	<56DF38BA.9030007@mellanox.com>
	<CALCETrVfKRZKV0ZQQn_ca0T7Ts5a6h2+4GEyoEFh31JOyg4XQw@mail.gmail.com>
	<CAGXu5jLdw+nmjSSm9E=fanbe3aLjwgjw38WAWXzP6YwSc2D5+A@mail.gmail.com>
	<CALCETrUbphanjMnij887pjj+DsFNCqFD6BJUrFVp_rsd3WwzEA@mail.gmail.com>
Date: Wed, 9 Mar 2016 13:25:12 -0800
Message-ID: <CAGXu5jKgLPJ2DOFsk9Gd8iPTEUuDLRpdSS2uTCnbmBQnxO3oPQ@mail.gmail.com>
Subject: Re: [PATCH v10 09/12] arch/x86: enable task isolation functionality
From: Kees Cook <keescook@chromium.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>,
        Thomas Gleixner <tglx@linutronix.de>, Christoph Lameter <cl@linux.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Viresh Kumar <viresh.kumar@linaro.org>, Ingo Molnar <mingo@kernel.org>,
        Steven Rostedt <rostedt@goodmis.org>, Tejun Heo <tj@kernel.org>,
        Gilad Ben Yossef <giladb@ezchip.com>,
        Will Deacon <will.deacon@arm.com>, Rik van Riel <riel@redhat.com>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
        Catalin Marinas <catalin.marinas@arm.com>,
        Peter Zijlstra <peterz@infradead.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2614
Lines: 68

On Wed, Mar 9, 2016 at 1:18 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Mar 9, 2016 at 1:10 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Wed, Mar 9, 2016 at 12:58 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Tue, Mar 8, 2016 at 12:40 PM, Chris Metcalf <cmetcalf@mellanox.com> wrote:
>>>> On 03/07/2016 03:55 PM, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> Let task isolation users who want to detect when they screw up and do
>>>>>>> a syscall do it with seccomp.
>>>>>>
>>>>>>
>>>>>> Can you give me more details on what you're imagining here?  Remember
>>>>>> that a key use case is that these applications can remove the syscall
>>>>>> prohibition voluntarily; it's only there to prevent unintended uses
>>>>>> (by third party libraries or just straight-up programming bugs).
>>>>>> As far as I can tell, seccomp does not allow you to go from "less
>>>>>> permissive" to "more permissive" settings at all, which means that as
>>>>>> it exists, it's not a good solution for this use case.
>>>>>>
>>>>>> Or were you thinking about a new seccomp API that allows this?
>>>>>
>>>>> I was.  This is at least the second time I've wanted a way to ask
>>>>> seccomp to allow a layer to be removed.
>>>>
>>>>
>>>> Andy,
>>>>
>>>> Please take a look at this draft patch that intends to enable seccomp
>>>> as something that task isolation can use.
>>>
>>> Kees, this sounds like it may solve your self-instrumentation problem.
>>> Want to take a look?
>>
>> Errrr... I'm pretty uncomfortable with this. I really would like to
>> keep the basic semantics of seccomp as simple as possible: filtering
>> only gets more restricted.

The other problem is that this won't work if the third-party code
actually uses seccomp itself... this isn't composable as-is.
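
A minimal sketch of the "filtering only gets more restricted" point, assuming x86-64 or arm64 Linux with seccomp filter support. In a forked child, two filters are stacked: the first makes getppid() fail with EPERM (standing in for a third-party library's own policy), the second tries to allow it again. The kernel runs every installed filter and acts on the highest-precedence result, so the call stays blocked; a layer that expects to loosen things cannot compose with someone else's filter. The helper names (install_filter, stacked_filters_only_tighten) are illustrative only, not from the patch under discussion.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86-64 and arm64"
#endif

/* Install one filter layer: return `action` for syscall `nr`, allow the rest. */
static int install_filter(long nr, unsigned int action)
{
    struct sock_filter insns[] = {
        /* Kill on an unexpected ABI so syscall numbers can't be confused. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, action),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if getppid() stays blocked after an "allow" layer is stacked on
 * top of a "deny" layer, 0 if it was let through, -1 on setup failure. */
static int stacked_filters_only_tighten(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Unprivileged filter installation requires no_new_privs. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(2);
        /* Layer 1, e.g. a library's own policy: getppid fails with EPERM. */
        if (install_filter(SYS_getppid, SECCOMP_RET_ERRNO | EPERM) != 0)
            _exit(2);
        /* Layer 2 tries to re-allow getppid; it cannot loosen layer 1. */
        if (install_filter(SYS_getppid, SECCOMP_RET_ALLOW) != 0)
            _exit(2);
        long r = syscall(SYS_getppid);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The work happens in a child so the filters never touch the calling process; seccomp filters are inherited and cannot be removed once installed.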

>>
>> This doesn't really solve my self-instrumentation desires since I
>> still can't sanely deliver signals. I would need a lot more
>> convincing. :)
>>
>
> I think you could do it by adding a filter that turns all the unknown
> syscalls into SIGSYS, allows sigreturn, and allows the seccomp syscall,
> at least in the pop-off-the-filter variant.  Then you add this
> removably.
>
> In the SIGSYS handler, you pop off the filter, do your bookkeeping,
> update the filter, and push it back on.
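
A rough sketch of the trap filter described above, assuming x86-64 or arm64 and mainline seccomp. It allows rt_sigreturn (so the handler can return), seccomp (for the re-push), and exit_group (only so the demo child can exit); every other syscall gets SECCOMP_RET_TRAP, and the SIGSYS handler records si_syscall. Mainline seccomp has no operation to pop a filter, so the pop/update/push steps appear only as a comment; that part depends on the hypothetical removable-filter API from the draft patch. The names on_sigsys, install_trap_filter, and sigsys_reports_getppid are illustrative.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86-64 and arm64"
#endif

static volatile sig_atomic_t trapped_nr = -1;

/* SIGSYS handler: record which syscall the filter trapped.  In the
 * hypothetical removable variant, this is where the filter would be popped,
 * the bookkeeping done, and the updated filter pushed back.  Mainline
 * seccomp has no "pop" operation, so this sketch only records. */
static void on_sigsys(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    trapped_nr = info->si_syscall;
}

/* Allow rt_sigreturn, seccomp and exit_group; trap everything else. */
static int install_trap_filter(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_rt_sigreturn, 3, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_seccomp, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if, in a forked child, getppid() was trapped and the SIGSYS
 * handler saw its number; 0 otherwise; -1 if fork/wait failed. */
static int sigsys_reports_getppid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGSYS, &sa, NULL) != 0 || install_trap_filter() != 0)
            _exit(2);
        syscall(SYS_getppid); /* skipped by the kernel; SIGSYS delivered */
        _exit(trapped_nr == SYS_getppid ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Note that the trapped getppid() never executes: with SECCOMP_RET_TRAP the kernel skips the syscall and delivers SIGSYS instead, which is the limitation for self-instrumentation.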

No, this won't let the original syscall through. I wanted to be able
to document the syscalls as they happened without needing audit or a
ptrace monitor. I am currently convinced that my desire for this is no
good, and it should just be done with a ptrace monitor...
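
For comparison, a minimal ptrace monitor sketch, assuming x86-64 (it reads orig_rax from struct user_regs_struct). The parent traces a forked child with PTRACE_SYSCALL and PTRACE_O_TRACESYSGOOD and counts entries to one syscall; unlike the SIGSYS approach, the traced syscall still runs. A real monitor would log the number and arguments at each entry stop. count_syscall_entries is an illustrative name, not an existing tool.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch reads orig_rax, so it is x86-64 only"
#endif

/* Trace a forked child and count how many times it enters syscall `want`.
 * Returns the count, or -1 on error. */
static int count_syscall_entries(long want)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);       /* wait for the monitor to set options */
        syscall(SYS_getppid); /* the monitored workload; it really runs */
        _exit(0);
    }

    int status, count = 0, in_syscall = 0, sig = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        goto fail;
    /* Report syscall stops as SIGTRAP|0x80 to tell them apart from signals. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) != 0)
        goto fail;
    for (;;) {
        /* Resume until the next syscall entry/exit, re-injecting any signal. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) != 0)
            goto fail;
        sig = 0;
        if (waitpid(pid, &status, 0) != pid)
            goto fail;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (!WIFSTOPPED(status))
            continue;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            in_syscall = !in_syscall; /* stops alternate: entry, exit */
            if (in_syscall) {
                struct user_regs_struct regs;
                if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == 0 &&
                    (long)regs.orig_rax == want)
                    count++; /* a real monitor would log nr and args here */
            }
        } else {
            sig = WSTOPSIG(status); /* ordinary signal-delivery stop */
        }
    }
    return count;

fail:
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}
```

The initial SIGSTOP is suppressed rather than re-injected, so the first syscall stop seen after resuming is an entry stop and the entry/exit toggle stays in sync.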

-Kees

>
> --Andy


-- 
Kees Cook
Chrome OS & Brillo Security