Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753551AbdFMP0D (ORCPT );
        Tue, 13 Jun 2017 11:26:03 -0400
Received: from kanga.kvack.org ([205.233.56.17]:55905 "EHLO kanga.kvack.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752987AbdFMP0C (ORCPT );
        Tue, 13 Jun 2017 11:26:02 -0400
Date: Tue, 13 Jun 2017 11:26:01 -0400
From: Benjamin LaHaise
To: Kirill Tkhai
Cc: avagin@openvz.org, linux-kernel@vger.kernel.org,
        viro@zeniv.linux.org.uk, gorcunov@openvz.org,
        akpm@linux-foundation.org, xemul@virtuozzo.com
Subject: Re: [PATCH] aio: Add command to wait completion of all requests
Message-ID: <20170613152601.GM7230@kvack.org>
References: <149700173837.15252.8419518498235874341.stgit@localhost.localdomain>
        <20170613144248.GK7230@kvack.org>
        <4637a41e-b9af-a685-f4af-f3ca2a287a1e@virtuozzo.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4637a41e-b9af-a685-f4af-f3ca2a287a1e@virtuozzo.com>
User-Agent: Mutt/1.4.2.2i
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2903
Lines: 57

On Tue, Jun 13, 2017 at 06:11:03PM +0300, Kirill Tkhai wrote:
...
> The functionality, I did, grew from real need and experience. We try to
> avoid kernel modification, where it's possible, but the in-flight aio
> requests is not a case suitable for that.

What you've done only works for *your* use-case, but not in general.  Like
in other subsystems, you need to provide hooks on a per file descriptor
basis for quiescing different kinds of file descriptors.  Your current
patch set completely ignores things like usb gadget.  You need to create
infrastructure for restarting i/os after your checkpointing occurs, which
you haven't put any thought into in this patchset.  If you want to discuss
how to do that, fine, but the approach in this patchset simply does not
work in general.
What happens when an aio doesn't complete or takes hours to complete?

> The checkpointing of live system is not easy as it seems. The way,
> you suggested, makes impossible the whole class of doing snapshots
> of the life system. You can't just kill a process, wait for zombie,
> and then restart the process again: processes are connected in difficult
> topologies of essences. You need to restore pgid and sid of the process,
> the namespaces, shared files (CLONE_FILES and CLONE_FS). Everything
> of this requires to be created in the certain order, and there is a
> lot of rules and limitations. You can't just create the same process
> in the same place: it's not easy, and it's just impossible. Your
> suggestion kills the big class of use cases, and it's not suitable
> in any way. You may refer to criu project site if you are interested
> (criu.org).

Point.

> Benjamin, please, could you check this once again? We really need
> this functionality, it's not empty desire. Lets speak about the
> way we should implement it, if you don't like the patch.
>
> There are many functionality in kernel to support the concept
> I described. Check out MSG_PEEK flag for receiving from socket
> (see unix_dgram_recvmsg()), for example. AIO now is one of the
> last barriers of full support of snapshots in criu.
...

Then please start looking at the big picture and think about things other
than short lived disk i/o.  Without some design in the infrastructure to
handle those cases, your solution is incomplete and will potentially leave
us with complex and unsupportable semantics that don't actually solve the
problem you're trying to solve.

Some of the things to think about: you need infrastructure to restart an
aio, which means you need some way of dumping aios that remain in flight,
as otherwise your application will see aios cancelled during checkpointing
that should not have been.  You need to actually cancel aios.
These details need to be addressed if checkpointing is going to be a
robust feature that works for other than toy use-cases.

		-ben
-- 
"Thought is the essence of where you are now."
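
Note on the MSG_PEEK analogy in the quoted message: the flag lets a receiver read queued socket data without consuming it, which is the kind of non-destructive inspection the quoted text cites as existing kernel support for snapshots. Below is a minimal user-space sketch of that behaviour. It is not code from either author or from criu; the helper name peek_then_read is hypothetical, and the Unix datagram socketpair is only an assumption chosen to mirror the unix_dgram_recvmsg() path mentioned above.

```python
import socket


def peek_then_read(payload):
    """Send one datagram over a Unix socketpair, peek at it, then read it.

    Hypothetical helper for illustration only.
    """
    # AF_UNIX + SOCK_DGRAM is an assumption made to match the
    # unix_dgram_recvmsg() path named in the quoted message.
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.send(payload)
        # MSG_PEEK returns the datagram but leaves it in the receive queue.
        peeked = rx.recv(4096, socket.MSG_PEEK)
        # A plain recv() afterwards consumes that same datagram.
        consumed = rx.recv(4096)
        return peeked, consumed
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    peeked, consumed = peek_then_read(b"queued data")
    print("peek left the data queued:", peeked == consumed)
```

The sketch only shows why sockets are comparatively easy to dump: queued data can be copied out and the queue is left intact. It does not address the points raised in the reply above, namely cancelling in-flight aios, dumping them, and restarting them after the checkpoint.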