From: Suren Baghdasaryan
Date: Wed, 7 Jul 2021 23:39:34 -0700
Subject: Re: [PATCH 1/1] mm: introduce process_reap system call
To: Florian Weimer
Cc: Andrew Morton, Michal Hocko, Michal Hocko, David Rientjes,
    Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel,
    Minchan Kim, Christian Brauner, Christoph Hellwig, Oleg Nesterov,
    David Hildenbrand, Jann Horn, Shakeel Butt, Tim Murray, Linux API,
    linux-mm, LKML, kernel-team
References: <20210623192822.3072029-1-surenb@google.com>
    <87sg0qa22l.fsf@oldenburg.str.redhat.com>
    <87wnq1z7kl.fsf@oldenburg.str.redhat.com>
    <87zguxxrfl.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87zguxxrfl.fsf@oldenburg.str.redhat.com>

On Wed, Jul 7, 2021 at 11:15 PM Florian Weimer wrote:
>
> * Suren Baghdasaryan:
>
> > On Wed, Jul 7, 2021 at 10:41 PM Florian Weimer wrote:
> >>
> >> * Suren Baghdasaryan:
> >>
> >> > On Wed, Jul 7, 2021 at 2:47 AM Florian Weimer wrote:
> >> >>
> >> >> * Suren Baghdasaryan:
> >> >>
> >> >> > The API is as follows,
> >> >> >
> >> >> >           int process_reap(int pidfd, unsigned int flags);
> >> >> >
> >> >> >         DESCRIPTION
> >> >> >           The process_reap() system call is used to free the memory of a
> >> >> >           dying process.
> >> >> >
> >> >> >           The pidfd selects the process referred to by the PID file
> >> >> >           descriptor.
> >> >> >           (See pidfd_open(2) for further information)
> >> >> >
> >> >> >           The flags argument is reserved for future use; currently, this
> >> >> >           argument must be specified as 0.
> >> >> >
> >> >> >         RETURN VALUE
> >> >> >           On success, process_reap() returns 0. On error, -1 is returned
> >> >> >           and errno is set to indicate the error.
> >> >>
> >> >> I think the manual page should mention what it means for a process to be
> >> >> “dying”, and how to move a process to this state.
> >> >
> >> > Thanks for the suggestion, Florian! Would replacing "dying process"
> >> > with "process which was sent a SIGKILL signal" be sufficient?
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> > I think you are suggesting to use sigqueue() to deliver the signal and
> > perform the reaping when a special value accompanies it. This would be
> > somewhat similar to my early suggestion to use a flag in
> > pidfd_send_signal() (see:
> > https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> > reaping which has another advantage of operation on PIDFDs instead of
> > PIDs which can be recycled.
> > kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> > signal and return without blocking. Changing that behavior was
> > considered unacceptable in these discussions.
>
> Does this mean that you need two threads, one that sends SIGKILL, and
> one that calls process_reap?  Given that sending SIGKILL is blocking
> with the existing interfaces?

Sending SIGKILL blocks only until the signal is delivered; it does not
wait for the recipient to process the SIGKILL or for its memory to be
released. That is what I meant by "blocking": the current kill() and
friends do not wait for SIGKILL to be processed, whereas process_reap()
blocks until the memory is released. Whether the userspace caller
invokes it right after sending SIGKILL to reclaim the memory
synchronously, or spawns a separate thread to reclaim it
asynchronously, is up to the user. Both patterns are supported.

> Please also note that asynchronous deallocation of resources leads to
> bugs and can cause unrelated workloads to fail.  For example, in some
> configurations, clone can fail with EAGAIN even in cases where the total
> number of tasks is clearly bounded because the kernel signals task exit
> to applications before all resources are deallocated.  I'm worried that
> the new interface makes things quite a bit worse in this regard.

process_reap() releases memory synchronously; no kthreads are used. If
asynchronous release is required, userspace needs to spawn its own
thread and issue this syscall from it. I hope this addresses your
concerns, which I think are about asynchronous deallocation within the
kernel.
Thanks!

>
> Thanks,
> Florian
>
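
For illustration, the two usage patterns described above might look
roughly like this from userspace. This is only a sketch under stated
assumptions: the thread does not give process_reap() a syscall number,
so NR_PROCESS_REAP is a hypothetical build-time macro, and the wrapper
reports ENOSYS when it is not defined. The helper names kill_and_reap()
and reap_thread_main() are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for older headers: 424 is the pidfd_send_signal number on
 * x86-64 and in the generic syscall table (alpha uses a different one). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* ASSUMPTION: NR_PROCESS_REAP is hypothetical. Build with
 * -DNR_PROCESS_REAP=<n> on a kernel that carries the patch;
 * without it this wrapper simply fails with ENOSYS. */
int process_reap(int pidfd, unsigned int flags)
{
#ifdef NR_PROCESS_REAP
    return (int)syscall(NR_PROCESS_REAP, pidfd, flags);
#else
    (void)pidfd;
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Pattern 1: send SIGKILL, then reclaim the memory synchronously.
 * pidfd_send_signal() returns once the signal is queued;
 * process_reap() is the call that blocks until memory is released. */
int kill_and_reap(int pidfd)
{
    assert(pidfd >= 0);
    if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
        return -1;
    return process_reap(pidfd, 0);
}

/* Pattern 2: thread entry point for asynchronous reclaim. The caller
 * has already sent SIGKILL; arg points to the pidfd. Returns NULL on
 * success, or the errno value cast to a pointer on failure. */
void *reap_thread_main(void *arg)
{
    int pidfd = *(const int *)arg;

    if (process_reap(pidfd, 0) < 0)
        return (void *)(intptr_t)errno;
    return NULL;
}
```

With pattern 2, the killing thread would send SIGKILL itself and then
hand reap_thread_main() to pthread_create(), so that only the helper
thread blocks while the memory is released.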