Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754248Ab0GMLPz (ORCPT ); Tue, 13 Jul 2010 07:15:55 -0400
Received: from mx1.redhat.com ([209.132.183.28]:9481 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750942Ab0GMLPx (ORCPT ); Tue, 13 Jul 2010 07:15:53 -0400
Date: Tue, 13 Jul 2010 14:09:39 +0300
From: "Michael S. Tsirkin"
To: Sridhar Samudrala
Cc: Oleg Nesterov, Peter Zijlstra, Tejun Heo, Ingo Molnar, netdev, lkml,
	"kvm@vger.kernel.org", Andrew Morton, Dmitri Vorobiev, Jiri Kosina,
	Thomas Gleixner, Andi Kleen
Subject: Re: [PATCH repost] sched: export sched_set/getaffinity to modules
Message-ID: <20100713110939.GA3446@redhat.com>
References: <20100701133956.GD32223@redhat.com> <4C2CA5C5.4040402@kernel.org>
	<20100701144624.GA11171@redhat.com> <4C2CABF2.2020801@kernel.org>
	<1277996135.1917.198.camel@laptop> <4C2E2987.9040702@us.ibm.com>
	<1278094270.1917.288.camel@laptop> <20100702210637.GA12433@redhat.com>
	<20100704090005.GA8078@redhat.com> <4C3C0EBC.40305@us.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4C3C0EBC.40305@us.ibm.com>
User-Agent: Mutt/1.5.20 (2009-12-10)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 12, 2010 at 11:59:08PM -0700, Sridhar Samudrala wrote:
> On 7/4/2010 2:00 AM, Michael S. Tsirkin wrote:
> >On Fri, Jul 02, 2010 at 11:06:37PM +0200, Oleg Nesterov wrote:
> >>On 07/02, Peter Zijlstra wrote:
> >>>On Fri, 2010-07-02 at 11:01 -0700, Sridhar Samudrala wrote:
> >>>> Does it (Tejun's kthread_clone() patch) also inherit the
> >>>>cgroup of the caller?
> >>>Of course, it's a simple do_fork() which inherits everything just as you
> >>>would expect from a similar sys_clone()/sys_fork() call.
> >>Yes. And I'm afraid it can inherit more than we want. IIUC, this is called
> >>from ioctl(), right?
> >>
> >>Then the new thread becomes the natural child of the caller, and it shares
> >>->mm with the parent. And files, dup_fd() without CLONE_FS.
> >>
> >>Signals. Say, if you send SIGKILL to this new thread, it can't sleep in
> >>TASK_INTERRUPTIBLE or KILLABLE after that. And this SIGKILL can be sent
> >>just because the parent gets SIGQUIT or another coredumpable signal.
> >>Or the new thread can receive SIGSTOP via ^Z.
> >>
> >>Perhaps this is OK, I do not know. Just to remind that kernel_thread()
> >>is merely clone(CLONE_VM).
> >>
> >>Oleg.
> >
> >Right. Doing this might break things like flush. The signal and exit
> >behaviour needs to be examined carefully. I am also unsure whether
> >using such threads might be more expensive than inheriting kthreadd.
> >
> Should we just leave it to the userspace to set the cgroup/cpumask
> after qemu starts the guest and
> the vhost threads?
>
> Thanks
> Sridhar

Yes, but we can't trust userspace to do this.
It's important to do it on thread creation: if we don't,
malicious userspace can create a large amount of work
exceeding the cgroup limits.

And the same applies to the affinity: if the qemu process
is limited to a set of CPUs, it's important to make
the kernel thread that does work on our behalf limited to the same
set of CPUs.

This is not unique to vhost, it's just that virt scenarios
are affected by this more: people seem to run untrusted applications
and expect the damage to be contained.

-- 
MST
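
To make the userspace alternative discussed above concrete, here is a
minimal Python sketch, assuming Linux and a hypothetical helper name
copy_affinity() (neither the helper nor the script comes from the thread).
It reads one task's allowed-CPU set and applies it to another task, which
is roughly what a management tool would have to do after qemu has started
the vhost threads.

```python
import os


def copy_affinity(source_pid: int, target_pid: int) -> set:
    """Apply source_pid's allowed-CPU set to target_pid and return it.

    Hypothetical helper for illustration only. On Linux the "pid"
    arguments are really thread IDs, and 0 means the calling thread.
    """
    # Ask the scheduler which CPUs the source task may run on.
    allowed = os.sched_getaffinity(source_pid)
    # Restrict the target task to exactly that set of CPUs.
    os.sched_setaffinity(target_pid, allowed)
    return set(allowed)
```

Between the moment the target thread is created and the moment this call
runs, the thread is not constrained by the caller's CPU set. That window
is the reason given above for applying the limits at thread creation
inside the kernel rather than relying on userspace to apply them later.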