Subject: Re: [RFC][PATCH 00/11] track files for checkpointability
From: Dave Hansen <dave@linux.vnet.ibm.com>
To: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
       Christoph Hellwig <hch@infradead.org>,
       containers <containers@lists.linux-foundation.org>,
       Ingo Molnar <mingo@elte.hu>,
       "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
In-Reply-To: <20090306162337.GA3040@us.ibm.com>
References: <20090305163857.0C18F3FD@kernel>
	 <20090305174037.GA2274@x200.localdomain> <1236280567.22399.99.camel@nimitz>
	 <20090305210840.GA2499@x200.localdomain>
	 <1236288427.22399.122.camel@nimitz>
	 <20090305220044.GA2819@x200.localdomain>
	 <1236291865.22399.139.camel@nimitz> <20090306143425.GA31250@us.ibm.com>
	 <1236354509.10626.29.camel@nimitz>  <20090306162337.GA3040@us.ibm.com>
Content-Type: text/plain
Date: Fri, 06 Mar 2009 08:46:05 -0800
Message-Id: <1236357965.10626.51.camel@nimitz>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1946
Lines: 46

On Fri, 2009-03-06 at 10:23 -0600, Serge E. Hallyn wrote:
> Which imo is fine, but my question is whether that leaves any actual
> value in the persistent per-resource uncheckpointable flag.

OK, let's take a look back at this discussion a little bit and how we
got here.

Ingo quotes:
> Yeah, per resource it should be. That's per task in the normal 
> case - except for threaded workloads where it's shared by 
> threads.

> Uncheckpointable should be a one-way flag anyway. We want this 
> to become usable, so uncheckpointable functionality should be as 
> painful as possible, to make sure it's getting fixed ...

> Is there any automated test that could discover C/R breakage via 
> brute force? All that matters in such cases is to get the "you 
> broke stuff" information as soon as possible. If it comes at an 
> early stage developers can generally just fix stuff.

You add these things together and you get what I posted.  My patch:
1. is per resource
2. has a one-way flag
3. gives messages to developers at an early stage (dmesg) and lets them
   explore it more thoroughly (/proc)
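
To make those three properties concrete, here is a rough userspace
model.  It is only a sketch: struct resource, resource_deny_checkpoint()
and resource_may_checkpoint() are names made up for illustration, not
code from the patch series, and fprintf() stands in for a dmesg message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-resource state; not taken from the actual patches. */
struct resource {
	const char *name;
	bool uncheckpointable;	/* one-way: set once, never cleared */
	const char *reason;	/* first reason seen, for /proc-style reporting */
};

/*
 * Mark a resource uncheckpointable.  Returns true only on the first
 * call, which is also the only time the warning is printed.
 */
static bool resource_deny_checkpoint(struct resource *r, const char *reason)
{
	assert(r && reason);
	if (r->uncheckpointable)
		return false;	/* already flagged; keep the first reason */
	r->uncheckpointable = true;
	r->reason = reason;
	fprintf(stderr, "%s: now uncheckpointable: %s\n", r->name, reason);
	return true;
}

/* Query the flag without changing it. */
static bool resource_may_checkpoint(const struct resource *r)
{
	return !r->uncheckpointable;
}
```

The point of the one-way flag is that the warning fires when the
resource first does something unsupported, not when somebody later
tries to checkpoint it.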

But these "early stage" messages are completely opposed to an approach
that uses sys_checkpoint() in some form (like with a -1 fd as an
argument), since that only reports a problem when something calls it.

Think of it like lockdep.  We *could* have designed lockdep to simply
give us a nice message whenever we actually hit an a/b b/a deadlock.
That would be helpful.  Or, we could design it to record all lock
acquisitions, including the ones that didn't deadlock, to see if they
could ever possibly deadlock.  (We did the second one, btw).  That gives
an early, useful warning that developers can fix before we encounter an
actual problem.  I'm advocating such a mechanism for c/r.
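
A toy illustration of the "record the order, warn before it breaks"
idea.  This is a sketch under heavy assumptions and not how lockdep is
implemented: locks are small integers, record_order() and MAX_LOCKS are
invented names, and only a direct a/b b/a inversion is detected, not
longer chains.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8	/* arbitrary limit for the sketch */

/* order[a][b] is true once lock b has been taken while holding lock a */
static bool order[MAX_LOCKS][MAX_LOCKS];

/*
 * Record "b acquired while a is held".  Returns false if the inverse
 * order (a acquired while b is held) was recorded earlier.  That is a
 * potential a/b b/a deadlock, reported even though nothing deadlocked.
 */
static bool record_order(int a, int b)
{
	assert(a >= 0 && a < MAX_LOCKS);
	assert(b >= 0 && b < MAX_LOCKS);
	assert(a != b);
	order[a][b] = true;
	return !order[b][a];
}
```

Taking 0 then 1 is fine, and so is 1 then 2.  Taking 1 then 0 later gets
flagged on the spot, whether or not the two paths ever race.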

-- Dave
