From: ebiederm@xmission.com (Eric W. Biederman)
To: Oren Laadan
Cc: Dave Hansen, linux-api@vger.kernel.org,
    containers@lists.linux-foundation.org, hpa@zytor.com,
    linux-kernel@vger.kernel.org, Alexey Dobriyan, linux-mm@kvack.org,
    viro@zeniv.linux.org.uk, mingo@elte.hu, mpm@selenic.com,
    Andrew Morton, Sukadev Bhattiprolu, Linus Torvalds,
    tglx@linutronix.de, xemul@openvz.org
Subject: Re: How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?
Date: Fri, 13 Mar 2009 17:27:02 -0700
In-Reply-To: <49BADAE5.8070900@cs.columbia.edu> (Oren Laadan's message of
    "Fri, 13 Mar 2009 18:15:01 -0400")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
List-ID: linux-kernel@vger.kernel.org

Oren Laadan writes:

> Dave Hansen wrote:
>> On Fri, 2009-03-13 at 14:01 -0700, Linus Torvalds wrote:
>>> On Fri, 13 Mar 2009, Alexey Dobriyan wrote:
>>>>> Let's face it, we're not going to _ever_ checkpoint any kind of general
>>>>> case process. Just TCP makes that fundamentally impossible in the general
>>>>> case, and there are lots and lots of other cases too (just something as
>>>>> totally _trivial_ as all the files in the filesystem that don't get rolled
>>>>> back).
>>>> What do you mean here? Unlinked files?
>>> Or modified files, or anything else. "External state" is a pretty damn
>>> wide net. It's not just TCP sequence numbers and another machine.
>>
>> This is precisely the reason that we've focused so hard on containers,
>> and *didn't* just jump right into checkpoint/restart; we're trying
>> really hard to constrain the _truly_ external things that a process can
>> interact with.
>>
>> The approach so far has largely been to make things that are external to a
>> process at least *internal* to a container. Network, pid, ipc, and uts
>> namespaces, for example. An ipc/sem.c semaphore may be external to a
>> process, so we'll just pick the whole namespace up and checkpoint it
>> along with the process.
>>
>> In the OpenVZ case, they've at least demonstrated that the filesystem
>> can be moved largely with rsync. Unlinked files need some in-kernel TLC
>> (or /proc mangling) but it isn't *that* bad.
>
> And in the Zap we have successfully used a log-based filesystem
> (specifically NILFS) to continuously snapshot the file-system atomically
> with taking a checkpoint, so it can easily branch off past checkpoints,
> including the file system.
>
> And unlinked files can be (inefficiently) handled by saving their full
> contents with the checkpoint image - it's not a big toll on many apps
> (if you exclude Wine and UML...). At least that's a start.

Oren, we might want to do a proof-of-concept implementation like the one
I did for network namespaces: one that is done in the community and goes
far enough to show we don't have horribly nasty code. The patches and
individual changes don't need to be quite perfect, but they should be
close enough that they can be considered for merging.

For the network namespace that seems to have made a big difference.

I'm afraid that in our clean start we may have focused a little too much
on merging something simple and not gone far enough in showing that
things will work.

Once I had that proof of concept for the network namespace and we had a
clear vision of the direction, we started merging the individual patches
and things went well.

Eric