Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758448AbYLPWp2 (ORCPT ); Tue, 16 Dec 2008 17:45:28 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753567AbYLPWpN (ORCPT ); Tue, 16 Dec 2008 17:45:13 -0500 Received: from smtp-out.google.com ([216.239.45.13]:2228 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752970AbYLPWpL (ORCPT ); Tue, 16 Dec 2008 17:45:11 -0500 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=message-id:date:from:user-agent:mime-version:to:cc:subject: references:in-reply-to:content-type: content-transfer-encoding:x-gmailtapped-by:x-gmailtapped; b=eGLKsthOy5hIMDBDfwP5DI7jSj2UQSov8ymPTHcubmzOvh2q+BfZHlcCqgzuxXLhb sYRQ0A02Ny3caRyQ4IMog== Message-ID: <49482F14.1040407@google.com> Date: Tue, 16 Dec 2008 14:43:32 -0800 From: Mike Waychison User-Agent: Thunderbird 2.0.0.17 (X11/20080925) MIME-Version: 1.0 To: Dave Hansen CC: Oren Laadan , jeremy@goop.org, arnd@arndb.de, linux-api@vger.kernel.org, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linux Torvalds , Alexander Viro , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar Subject: Re: [RFC v11][PATCH 03/13] General infrastructure for checkpoint restart References: <1228498282-11804-1-git-send-email-orenl@cs.columbia.edu> <1228498282-11804-4-git-send-email-orenl@cs.columbia.edu> <49482394.10006@google.com> <1229465641.17206.350.camel@nimitz> In-Reply-To: <1229465641.17206.350.camel@nimitz> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-GMailtapped-By: 172.24.198.81 X-GMailtapped: mikew Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dave Hansen wrote: > On Tue, 2008-12-16 at 13:54 -0800, Mike Waychison wrote: >> Oren Laadan wrote: >>> diff --git a/checkpoint/sys.c b/checkpoint/sys.c >>> index 375129c..bd14ef9 100644 >>> --- a/checkpoint/sys.c >>> +++ b/checkpoint/sys.c >>> +/* >>> + * During checkpoint and restart the code writes outs/reads in data >>> + * to/from the checkpoint image from/to a temporary buffer (ctx->hbuf). >>> + * Because operations can be nested, use cr_hbuf_get() to reserve space >>> + * in the buffer, then cr_hbuf_put() when you no longer need that space. >>> + */ >> This seems a bit over-kill for buffer management no? The only large >> header seems to be cr_hdr_head and the blowup comes from utsinfo string >> data (which could easily be moved out to be in it's own CR_HDR_STRING >> blocks). >> >> Wouldn't it be easier to use stack-local storage than balancing the >> cr_hbuf_get/put routines? > > I've asked the same question, so I'll give you Oren's response that I > remember: > > cr_hbuf_get/put() are more of an API that we can use later. For now, > those buffers really are temporary. But, in a case where we want to do > a really fast checkpoint (to reduce "downtime" during the checkpoint) we > store the image entirely in kernel memory to be written out later. > Hmm, if I'm understanding you correctly, adding ref counts explicitly (like you suggest below) would be used to let a lower layer defer writes. Seems like this could be just as easily done with explicits kmallocs and transferring ownership of the allocated memory to the in-kernel representation handling layer below (which in turn queues the data structures for writes). Any such layer would probably need to hold references to objects enqueued for write-out, so they will still a full cleanup path in case of success/error/abort (which means that any advantage of creating a pool of allocations for O(1) cleanup disappears). Reference counting these guys doesn't have a clear advantage to me. They seem to have a pretty linear lifetime. > In that case, cr_hbuf_put() stops doing anything at all because we keep > the memory around. > > cr_hbuf_get() becomes, "I need some memory to write some checkpointy > things into". > > cr_hbuf_put() becomes, "I'm done with this for now, only keep it if > someone else needs it." > > This might all be a lot clearer if we just kept some more explicit > accounting around about who is using the objects. Something like: > > struct cr_buf { > struct kref ref; > int size; > char buf[0]; > }; > > /* replaces cr_hbuf_get() */ > struct cr_buf *alloc_cr_buf(int size, gfp_t flags) > { > struct cr_buf *buf; > > buf = kmalloc(sizeof(cr_buf) + size, flags); > if (!buf) > return NULL; > buf->ref = 1; /* or whatever */ > buf->size = size; > return buf; > } > > int cr_kwrite(struct cr_buf *buf) > { > if (writing_checkpoint_now) { > // or whatever this write call was... > vfs_write(&buf->buf[0], buf->size); > } else if (deferring_write) { > kref_get(buf->kref); > } > } > > -- Dave > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/