Return-Path: Received: from mx2.parallels.com ([199.115.105.18]:48808 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751003AbcBMQJH (ORCPT ); Sat, 13 Feb 2016 11:09:07 -0500 Reply-To: Subject: Re: call_usermodehelper in containers References: <20131111071825.62da01d1@tlielax.poochiereds.net> <20131112004703.GB15377@kroah.com> <20131112061201.04cf25ab@tlielax.poochiereds.net> <528226EC.4050701@parallels.com> <20131112083043.0ab78e67@tlielax.poochiereds.net> <5285FA0A.2080802@parallels.com> <871u2incyo.fsf@xmission.com> <52860B6D.4090208@parallels.com> <1455320373.2890.5.camel@themaw.net> To: Ian Kent , Stanislav Kinsbursky , "Eric W. Biederman" CC: Jeff Layton , Greg KH , , , , , , , From: Stanislav Kinsburskiy Message-ID: <56BF5511.6050606@virtuozzo.com> Date: Sat, 13 Feb 2016 17:08:49 +0100 MIME-Version: 1.0 In-Reply-To: <1455320373.2890.5.camel@themaw.net> Content-Type: text/plain; charset="utf-8"; format=flowed Sender: linux-nfs-owner@vger.kernel.org List-ID: 13.02.2016 00:39, Ian Kent пишет: > On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote: >> 15.11.2013 15:03, Eric W. Biederman пишет: >>> Stanislav Kinsbursky writes: >>> >>>> 12.11.2013 17:30, Jeff Layton пишет: >>>>> On Tue, 12 Nov 2013 17:02:36 +0400 >>>>> Stanislav Kinsbursky wrote: >>>>> >>>>>> 12.11.2013 15:12, Jeff Layton пишет: >>>>>>> On Mon, 11 Nov 2013 16:47:03 -0800 >>>>>>> Greg KH wrote: >>>>>>> >>>>>>>> On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton >>>>>>>> wrote: >>>>>>>>> We have a bit of a problem wrt to upcalls that use >>>>>>>>> call_usermodehelper >>>>>>>>> with containers and I'd like to bring this to some sort >>>>>>>>> of resolution... >>>>>>>>> >>>>>>>>> A particularly problematic case (though there are >>>>>>>>> others) is the >>>>>>>>> nfsdcltrack upcall. It basically uses >>>>>>>>> call_usermodehelper to run a >>>>>>>>> program in userland to track some information on stable >>>>>>>>> storage for >>>>>>>>> nfsd. >>>>>>>> I thought the discussion at the kernel summit about this >>>>>>>> issue was: >>>>>>>> - don't do this. >>>>>>>> - don't do it. >>>>>>>> - if you really need to do this, fix nfsd >>>>>>>> >>>>>>> Sorry, I couldn't make the kernel summit so I missed that >>>>>>> discussion. I >>>>>>> guess LWN didn't cover it? >>>>>>> >>>>>>> In any case, I guess then that we'll either have to come up >>>>>>> with some >>>>>>> way to fix nfsd here, or simply ensure that nfsd can never >>>>>>> be started >>>>>>> unless root in the container has a full set of a full set of >>>>>>> capabilities. >>>>>>> >>>>>>> One sort of Rube Goldberg possibility to fix nfsd is: >>>>>>> >>>>>>> - when we start nfsd in a container, fork off an extra >>>>>>> kernel thread >>>>>>> that just sits idle. That thread would need to be a >>>>>>> descendant of the >>>>>>> userland process that started nfsd, so we'd need to >>>>>>> create it with >>>>>>> kernel_thread(). >>>>>>> >>>>>>> - Have the kernel just start up the UMH program in the >>>>>>> init_ns mount >>>>>>> namespace as it currently does, but also pass the pid >>>>>>> of the idle >>>>>>> kernel thread to the UMH upcall. >>>>>>> >>>>>>> - The program will then use /proc//root and >>>>>>> /proc//ns/* to set >>>>>>> itself up for doing things properly. >>>>>>> >>>>>>> Note that with this mechanism we can't actually run a >>>>>>> different binary >>>>>>> per container, but that's probably fine for most purposes. >>>>>>> >>>>>> Hmmm... Why we can't? We can go a bit further with userspace >>>>>> idea. >>>>>> >>>>>> We use UMH some very limited number of user programs. For 2, >>>>>> actually: >>>>>> 1) /sbin/nfs_cache_getent >>>>>> 2) /sbin/nfsdcltrack >>>>>> >>>>> No, the kernel uses them for a lot more than that. Pretty much >>>>> all of >>>>> the keys API upcalls use it. See all of the callers of >>>>> call_usermodehelper. All of them are running user binaries out >>>>> of the >>>>> kernel, and almost all of them are certainly broken wrt >>>>> containers. >>>>> >>>>>> If we convert them into proxies, which use /proc//root >>>>>> and /proc//ns/*, this will allow us to lookup the right >>>>>> binary. >>>>>> The only limitation here is presence of this "proxy" binaries >>>>>> on "host". >>>>>> >>>>> Suppose I spawn my own container as a user, using all of this >>>>> spiffy >>>>> new user namespace stuff. Then I make the kernel use >>>>> call_usermodehelper to call the upcall in the init_ns, and then >>>>> trick >>>>> it into running my new "escape_from_namespace" program with >>>>> "real" root >>>>> privileges. >>>>> >>>>> I don't think we can reasonably assume that having the kernel >>>>> exec an >>>>> arbitrary binary inside of a container is safe. Doing so inside >>>>> of the >>>>> init_ns is marginally more safe, but only marginally so... >>>>> >>>>>> And we don't need any significant changes in kernel. >>>>>> >>>>>> BTW, Jeff, could you remind me, please, why exactly we need to >>>>>> use UMH to run the binary? >>>>>> What are this capabilities, which force us to do so? >>>>>> >>>>> Nothing _forces_ us to do so, but upcalls are very difficult to >>>>> handle, >>>>> and UMH has a lot of advantages over a long-running daemon >>>>> launched by >>>>> userland. >>>>> >>>>> Originally, I created the nfsdcltrack upcall as a running daemon >>>>> called >>>>> nfsdcld, and the kernel used rpc_pipefs to communicate with it. >>>>> >>>>> Everyone hated it because no one likes to have to run daemons >>>>> for >>>>> infrequently used upcalls. It's a pain for users to ensure that >>>>> it's >>>>> running and it's a pain to handle when it isn't. So, I was >>>>> encouraged >>>>> to turn that instead into a UMH upcall. >>>>> >>>>> But leaving that aside, this problem is a lot larger than just >>>>> nfsd. We >>>>> have a *lot* of UMH upcalls in the kernel, so this problem is >>>>> more >>>>> general than just "fixing" nfsd's. >>>>> >>>> Ok. So we are talking about generic approach to UMH support in a >>>> container (and/or namespace). >>>> >>>> Actually, as far as I can see, there are more that one aspect, >>>> which is not supported. >>>> One one them is executing of the right binary. Another one is >>>> capabilities (and maybe there are more, like user namespaces), but >>>> I >>>> don't really care about them for now. >>>> Executing the right binary, actually, is not about namespaces at >>>> all. This is about lookup implementation in VFS >>>> (do_execve_common). >>> >>> >>>> Would be great to unshare FS for forked UHM kthread and swap it to >>>> desired root. This will solve the problem with proper lookup. >>>> However, >>>> as far as I understand, this approach is not welcome by the >>>> community. >>> I don't understand that one. Having a preforked thread with the >>> proper >>> environment that can act like kthreadd in terms of spawning user >>> mode >>> helpers works and is simple. The only downside I can see is that >>> there >>> is extra overhead. >>> >> What do you mean by "simple" here? Simple to implement? >> We already have a preforked thread, called "UMH", used exactly for >> this purpose. > Is there? > > Can you explain how the pre-forking happens please? > > AFAICS a workqueue is used to run UMH helpers, I can't see any pre > -forking going on there and it doesn't appear to be possible to do > either. Hi Ian, I'm not sure, I understand your question. But there is a generic "khelper" thread, which is responsible for spawning new kthreads to execute some binary, requested by user. IOW, when you want to use UMH, you add a request to "khelper" workqueue, which, in turn, creates another thread. The new one call init callback and does the actual execve call. > >> And, if I'm not mistaken, we are trying to discuss, how to adapt >> existent infrastructure for namespaces, don't we? >> >>> Beyond that though for the user mode helpers spawned to populate >>> security keys it is not clear which context they should be run in, >>> even if we do have kernel threads. >>> >> Regardless of the context itself, we need a way to pass it to kernel >> thread and to put kernel thread in this context. Or I'm missing >> something? >> >>>> This problem, probably, can be solved by constructing full binary >>>> path >>>> (i.e. not in a container, but in kernel thread root context) in >>>> UMH >>>> "init" callack. However, this will help only is the dentry is >>>> accessible from "init" root. Which is usually no true in case on >>>> mount >>>> namespaces, if I understand them right. >>> You are correct it can not be assumed that what is visible in one >>> mount >>> namespace is visible in another. And of course in addition to >>> picking >>> the correct binary to run you have to set up a proper environment >>> for >>> that binary to run in. It may be that it's configuration file is >>> only >>> avaiable at the expected location in the proper mount namespace, >>> even >>> if the binary is available in all of the mount namespaces. >>> >> Yes, you are right. So, this solution can help only in case of very >> specific and simple "environment-less" programs. >> So, I believe, that we should modify UMH itself to support our needs. >> But I don't see, how to make the idea more pleasant for the community. >> IOW, when I was talking about UMH in NFS implementation on Ksummit, >> Linus's answer was something like "fix NFS". >> And I can't object it, actually, because for now NFS is the only >> corner case. >> >> Jeff said, that there are a bunch of UMH calls in kernel, but this is >> not solid enough to prove UHM changes, since nobody is trying to use >> them in containers. >> >> So, I doubt, that we can change UMH generically without additional use >> -cases for 'containerized" UMH. >> >>> Eric >>> >>