From: ebiederm@xmission.com (Eric W. Biederman)
To: Claudio Imbrenda
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, thuth@redhat.com,
    frankja@linux.ibm.com, borntraeger@de.ibm.com, Ulrich.Weigand@de.ibm.com,
    heiko.carstens@de.ibm.com, david@redhat.com, ultrachin@163.com,
    akpm@linux-foundation.org, vbabka@suse.cz, brookxu.cn@gmail.com,
    xiaoggchen@tencent.com, linuszeng@tencent.com, yihuilu@tencent.com,
    mhocko@suse.com, daniel.m.jordan@oracle.com, axboe@kernel.dk,
    legion@kernel.org, peterz@infradead.org, aarcange@redhat.com,
    christian@brauner.io, tglx@linutronix.de
Subject: Re: [RFC v1 2/4] kernel/fork.c: implement new process_mmput_async syscall
Date: Thu, 11 Nov 2021 13:20:11 -0600
Message-ID: <874k8ixzx0.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20211111095008.264412-4-imbrenda@linux.ibm.com> (Claudio Imbrenda's message of "Thu, 11 Nov 2021 10:50:06 +0100")
References: <20211111095008.264412-1-imbrenda@linux.ibm.com> <20211111095008.264412-4-imbrenda@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Claudio Imbrenda writes:

> The goal of this new syscall is to be able to asynchronously free the
> mm of a dying process. This is especially useful for processes that use
> huge amounts of memory (e.g. databases or KVM guests). The process is
> allowed to terminate immediately, while its mm is cleaned/reclaimed
> asynchronously.
>
> A separate process needs to use the process_mmput_async syscall to attach
> itself to the mm of a running target process. The process will then
> sleep until the last user of the target mm has gone.
>
> When the last user of the mm has gone, instead of synchronously freeing
> the mm, the attached process is awoken. The syscall will then continue
> and clean up the target mm.
>
> This solution has the advantage that the cleanup of the target mm can
> be both asynchronous and properly accounted for (e.g. cgroups).
>
> Tested on s390x.
>
> A separate patch will actually wire up the syscall.

I am a bit confused.

You want the process to report that it has finished immediately,
and you want the cleanup work to continue on in the background.

Why do you need a separate process?

Why not just modify the process cleanup code to keep the task_struct
running while allowing waitpid to reap the process (aka allowing
release_task to run)? All tasks can already be reaped after
exit_notify in do_exit.

I can see some reasons for wanting an opt-in. It is nice to know that all
of a process's resources have been freed when waitpid succeeds.

Still, I don't see why this whole thing isn't exit_mm returning
the mm_struct when a flag is set, and then having an exit_mm_late
called and passed the returned mm after exit_notify.

Or maybe something with schedule_work or task_work, instead of an
exit_mm_late. I don't see any practical difference.

I really don't see why this needs a whole other process to connect to
the process you care about asynchronously.

This whole thing seems an exercise in spending lots of resources to free
resources much later.

Eric