Date: Fri, 16 Dec 2016 17:00:25 -0800
From: yunhong jiang
To: Chris Metcalf
Cc: "linux-kernel@vger.kernel.org", Paolo Bonzini
Subject: Re: Questions on the task isolation patches
Message-ID: <20161216170025.47317723@jnakajim-build>
References: <20161201142812.369f23f8@jnakajim-build>
 <5dd4cbf7-d0c0-074a-c5bc-e2e09ec3dc75@mellanox.com>
 <20161206134355.193c752b@jnakajim-build>

On Fri, 16 Dec 2016 16:00:48 -0500
Chris Metcalf wrote:

> Sorry for the slow response - I have been busy with some other things.

Thanks for the reply.

>
> On 12/6/2016 4:43 PM, yunhong jiang wrote:
> > On Fri, 2 Dec 2016 13:58:08 -0500
> > Chris Metcalf wrote:
> >
> >> On 12/1/2016 5:28 PM, yunhong jiang wrote:
> >>> a) If task isolation needs prctl to mark itself as isolated,
> >>> possibly the vCPU thread can't achieve it. First, the vCPU thread
> >>> may need system services during OS boot time; also, it's the
> >>> application, instead of the vCPU thread, that decides if the vCPU
> >>> thread should be isolated. So possibly we need a mechanism so that
> >>> another process can set the vCPU thread's task isolation?
> >> These are good questions.
> >> I think that we would probably want
> >> to add a KVM mode that did the prctl() before transitioning back
> >> to the
> > Would prctl() when going back to the guest be too heavy?
>
> It's a good question; it can be heavy. But the design for task
> isolation is that the task isolated process is always running in
> userspace anyway. If you are transitioning in and out of the guest
> or host kernels frequently, you probably should not be using task
> isolation, but just regular NOHZ_FULL.

As you pointed out later, guest task isolation does not guarantee that
there are no VM exits from the guest to the host, although we hope to
achieve a vmexit-free situation in the future.

>
> >> guest. But then, in the same way that we currently allow another
> >> prctl() from a task-isolated userspace process, we'd probably need
> >> to
> > You mean currently in your patch we already can do the prctl from
> > a 3rd party process to task-isolate a userspace process? Sorry that I
> > didn't notice that part.
>
> Sorry, I think I wasn't clear. Normally when you are running task
> isolated and you enter the kernel, you will get a fatal signal. The
> exception is if you call prctl itself (or exit), the kernel tolerates
> it without a signal, since obviously that's how you need to cleanly
> tell the kernel you are done with task isolation.
>
> My point in the previous email was that we might need to similarly
> tolerate a guest exit without causing a fatal signal to the userspace
> process. But as I think about it, that's probably not true; we
> probably would want to notify the guest kernel of the task isolation
> violation and have it kill the userspace process just as if it had
> entered the guest kernel.

Thanks for the clarification. It's clear now.
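To check my reading of the rule, here is a toy model of it. This is only
a sketch, not kernel code: the names KernelEntry, TOLERATED and
gets_fatal_signal are made up for illustration, and the list of entry
kinds is my assumption, not something taken from the patches.

```python
from enum import Enum, auto


class KernelEntry(Enum):
    """Ways a task can end up in the kernel (illustrative, not exhaustive)."""
    PRCTL = auto()          # prctl() syscall, e.g. to turn isolation off
    EXIT = auto()           # exit syscall
    OTHER_SYSCALL = auto()  # any other syscall
    INTERRUPT = auto()      # assumed: interrupts/faults also count as entries


# Entries tolerated without a signal, per the description quoted above.
TOLERATED = {KernelEntry.PRCTL, KernelEntry.EXIT}


def gets_fatal_signal(isolated: bool, entry: KernelEntry) -> bool:
    """Return True if this kernel entry would kill the task under the rule."""
    return isolated and entry not in TOLERATED


if __name__ == "__main__":
    # Print the outcome of each entry kind for an isolated and a normal task.
    for entry in KernelEntry:
        print(f"{entry.name:14} isolated={gets_fatal_signal(True, entry)} "
              f"normal={gets_fatal_signal(False, entry)}")
```

In this model a guest exit would simply be one more entry kind, and the
open question in the thread is whether it belongs in TOLERATED or not.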
>
> Perhaps the way to drive this is to have task isolation be triggered
> from the guest's prctl up to the host, so there's some kind of KVM
> exit to the host that indicates that the guest has a userspace
> process that wants to run task isolated, at which point qemu invokes
> task isolation on behalf of the guest then returns to the guest to
> set up its own virtualized task isolation. It does get confusing!

Hmm, a PV solution is always an option in the virtualization world.

BTW, currently both isolated CPUs and task isolation require a kernel
parameter to reserve CPUs in advance. Possibly we can extend this to be
dynamic in the future, for example through sysfs, to avoid wasting
resources.

--jyh
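P.S. For reference, a small sketch of how the boot-time reservation can
be inspected today. It assumes the CPUs were reserved with an
isolcpus=-style list and that the kernel exposes the read-only file
/sys/devices/system/cpu/isolated; the helper names are hypothetical, and
only the simple "N" and "N-M" list forms are handled.

```python
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a kernel-style CPU list such as "1-3,8" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_isolated_cpus(path: str = "/sys/devices/system/cpu/isolated") -> List[int]:
    """Return the isolated CPUs, or an empty list if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    print("isolated CPUs:", read_isolated_cpus())
```

The file is read-only, which is the limitation mentioned above: changing
the set currently means editing the kernel command line and rebooting.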