Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161153AbWBQVkb (ORCPT ); Fri, 17 Feb 2006 16:40:31 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1161155AbWBQVka (ORCPT ); Fri, 17 Feb 2006 16:40:30 -0500 Received: from MAIL.13thfloor.at ([212.16.62.50]:26522 "EHLO mail.13thfloor.at") by vger.kernel.org with ESMTP id S1161153AbWBQVk3 (ORCPT ); Fri, 17 Feb 2006 16:40:29 -0500 Date: Fri, 17 Feb 2006 22:40:27 +0100 From: Herbert Poetzl To: Hubertus Franke Cc: "Eric W. Biederman" , Dave Hansen , "Serge E. Hallyn" , Kirill Korotaev , linux-kernel@vger.kernel.org, vserver@list.linux-vserver.org, Alan Cox , Arjan van de Ven , Suleiman Souhlal , Cedric Le Goater , Kyle Moffett , Greg , Linus Torvalds , Andrew Morton , Greg KH , Rik van Riel , Alexey Kuznetsov , Andrey Savochkin , Kirill Korotaev , Andi Kleen , Benjamin Herrenschmidt , Jeff Garzik , Trond Myklebust , Jes Sorensen Subject: Re: (pspace,pid) vs true pid virtualization Message-ID: <20060217214027.GA30682@MAIL.13thfloor.at> Mail-Followup-To: Hubertus Franke , "Eric W. Biederman" , Dave Hansen , "Serge E. Hallyn" , Kirill Korotaev , linux-kernel@vger.kernel.org, vserver@list.linux-vserver.org, Alan Cox , Arjan van de Ven , Suleiman Souhlal , Cedric Le Goater , Kyle Moffett , Greg , Linus Torvalds , Andrew Morton , Greg KH , Rik van Riel , Alexey Kuznetsov , Andrey Savochkin , Kirill Korotaev , Andi Kleen , Benjamin Herrenschmidt , Jeff Garzik , Trond Myklebust , Jes Sorensen References: <20060216175326.GA11974@sergelap.austin.ibm.com> <20060216184407.GC11974@sergelap.austin.ibm.com> <1140115979.21383.11.camel@localhost.localdomain> <20060217114441.GA17940@MAIL.13thfloor.at> <20060217124411.GB17940@MAIL.13thfloor.at> <43F5D227.8020105@watson.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <43F5D227.8020105@watson.ibm.com> User-Agent: Mutt/1.5.6i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3521 Lines: 98 On Fri, Feb 17, 2006 at 08:39:51AM -0500, Hubertus Franke wrote: > Herbert Poetzl wrote: > >On Fri, Feb 17, 2006 at 05:16:06AM -0700, Eric W. Biederman wrote: > > > >>Herbert Poetzl writes: > >> > >> > >>>On Fri, Feb 17, 2006 at 03:57:26AM -0700, Eric W. Biederman wrote: > >>> > >>>>As for that. When I mad that suggestion to Herbert Poetzl > >>>>his only concern was that a smart init might be too heavy weight > >>>>for lightweight vserver. Generally I like the idea. > >>> > >>>well, may I remind that this solution would require _two_ > >>>init processes for each guest, which could easily make up > >>>300-400 unnecessary processes in a lightweight server > >>>setup? > >> > >>I take it seriously enough that I remembered the concern, > >>and I think it is legitimate. Figuring out how to safely > >>set the policy is a challenge. That is something a > >>user space daemon trivially gets right. > >> > >>The kernel side of a process is about 10K if the user space > >>side was also lightweight we could have the entire > >>per process cost in the 30K range. 30K*400 = 12000K = 12M. > > > > > >that's something I'm not so worried about, but a statically > >compiled userspace process with 20K sounds unusual in the > >time of 2M *libcs :) > > > > > >>That is significant but we are still cheap enough that it > >>isn't necessarily a show stopper. > >> > >>I think the cost was only one extra process, for the case where you > >>have fakeinit now it would be init, for other cases it would be a > >>daemon that gets setup when you initialize the vserver. > > > > Eric, Herbert.. why do we need an extra process in each and every > pspace. > > Why not have single global pspace-init daemon that acts as the reaper > for all pspace-top processes. Its only at the boundaries of pspaces > and with signals were we seem to have trouble. that would probably work, but I think it adds some complications and might require certain design changes just to give some ideas: - how to reach the guest space if there is no 'handle'? - how to handle hierarchical contexts? > The "pspace-init" reaps the signal of all its sub-pspace's top > processes and then "forwards" the signal to processes actually > waiting. Kind of an interposer. Same way from the other side. > > You allocate a pid on behalf of the process you spawn in your > pidspace. You mark in the pid hash of the lookup that this is merely > a proxy and you forward that to the pspace-init where you have a > separate lookup with . > > Same with signals, once the signal is reaped by pspace-init and its looked > up who is the parent pspace and the pid in there, we forward it.. yup, could work ... best, Herbert > Is something like that workable, idiotic (be kind), too intrusive ? > > -- Hubertus > > > > > >well, depends, currently we do not need a parent to handle > >the guest, so there is _no_ waiting process in the light- > >weight case either, which makes that two processes for each > >guest, no? > > > >anyway, I'm not strictly against having an init process > >inside a guest, as long as it is not an essential part > >of the overall design, because that would make it much > >harder to rip it out later :) > > > >best, > >Herbert > > > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/