Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754134AbYBDPFs (ORCPT );
	Mon, 4 Feb 2008 10:05:48 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751965AbYBDPFl (ORCPT );
	Mon, 4 Feb 2008 10:05:41 -0500
Received: from mtagate4.uk.ibm.com ([195.212.29.137]:31927 "EHLO
	mtagate4.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751799AbYBDPFl (ORCPT );
	Mon, 4 Feb 2008 10:05:41 -0500
Message-ID: <47A72891.4000404@fr.ibm.com>
Date: Mon, 04 Feb 2008 16:00:33 +0100
From: Daniel Lezcano
User-Agent: Thunderbird 2.0.0.9 (X11/20071115)
MIME-Version: 1.0
To: Pavel Emelyanov
CC: Kirill Korotaev , containers@lists.linux-foundation.org,
	Cedric Le Goater , linux-kernel@vger.kernel.org,
	Alexey Dobriyan
Subject: Re: [Devel] Re: [PATCH 2.6.24-rc8-mm1 09/15] (RFC) IPC: new kernel API to change an ID
References: <20080129160229.612172683@bull.net> <20080129162000.454857358@bull.net>
	<20080129210656.GB1990@martell.zuzino.mipt.ru> <47A18E47.5050206@bull.net>
	<47A19AC2.7040709@sw.ru> <47A1B78C.7050405@bull.net> <47A1C8FE.9010700@sw.ru>
	<47A1F2DB.7080600@fr.ibm.com> <47A71606.5030201@sw.ru> <47A71BDF.5000801@openvz.org>
In-Reply-To: <47A71BDF.5000801@openvz.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2414
Lines: 53

Pavel Emelyanov wrote:
> Kirill Korotaev wrote:
>> Cedric Le Goater wrote:
>>> Hello Kirill !
>>>
>>> Kirill Korotaev wrote:
>>>> Pierre,
>>>>
>>>> my point is that after you've added interface "set IPCID", you'll need
>>>> more and more for checkpointing:
>>>> - "create/setup conntrack" (otherwise connections get dropped),
>>>> - "set task start time" (needed for Oracle checkpointing BTW),
>>>> - "set some statistics counters (e.g. networking or taskstats)"
>>>> - "restore inotify"
>>>> and so on and so forth.
>>> right.
>>> we know that we will have to handle a lot of these
>>> and more and we will need an API for it :) so how should we handle it ?
>>> through a dedicated syscall that would be able to checkpoint and/or
>>> restart a process, an ipc object, an ipc namespace, a full container ?
>>> will it take a fd or a big binary blob ?
>>> I personally really liked Pavel's idea of a filesystem. but we dropped the
>>> thread.
>> Imho having a file system interface means having all its problems.
>> Imagine you have some information about tasks exported with a file system interface.
>> Obviously to collect the information you have to hold some spinlock like tasklist_lock or similar.
>> Obviously, you have to drop the lock between sys_read() syscalls.
>> So the interface gets much more complicated - you have to rescan the objects and somehow find the place where
>> you stopped the previous read. Or you have to force the reader to read everything at once.
>
> To remember the place where we stopped the previous read we have a "pos" counter
> on the struct file.
>
> Actually, the tar utility, which I propose to use for the simplest migration,
> reads the directory contents with a 4Kb buffer - that's enough for ~500 tasks.
>
> Besides, is this a real problem for a frozen container?

I like the idea of a C/R filesystem. Does it imply a specific user space
program to orchestrate the checkpoint/restart of the different
subsystems ? I mean the checkpoint is easy but what about the restart ?
We must ensure, for example, that a process is restored before the fd
associated with it, or that a deleted file is restored before the fd
opened on it, no ?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
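
The restart-ordering question raised in the reply (restore a process
before the fd associated with it, restore a deleted file before the fd
opened on it) can be read as a dependency-ordering problem. What follows
is only a minimal user-space sketch of that reading, not anything
proposed in the thread: the object names ("task:1", "file:deleted",
"fd:1:3"), the `deps` table and the `restore_order()` helper are all
hypothetical, and a real orchestrator would derive the dependencies from
the checkpoint image itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical checkpoint contents: each object maps to the objects that
# must already be restored before it can be restored itself.
deps = {
    "task:1": [],                          # the process
    "file:deleted": [],                    # an unlinked file that was still open
    "fd:1:3": ["task:1", "file:deleted"],  # fd 3 of task 1, opened on that file
}


def restore_order(deps):
    """Return one valid restore order: prerequisites before dependents."""
    return list(TopologicalSorter(deps).static_order())


order = restore_order(deps)
for obj in order:
    print("restore", obj)
```

Under these assumptions the fd is always emitted after both the task
and the file it refers to; a cyclic dependency would make
`static_order()` raise `graphlib.CycleError` rather than return an
order.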