In-Reply-To: <20190315124348.528ecd87@gandalf.local.home>
From: Daniel Colascione
Date: Fri, 15 Mar 2019 10:17:49 -0700
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
To: Steven Rostedt
Cc: Sultan Alsawaf, Joel Fernandes, Tim Murray, Michal Hocko, Suren Baghdasaryan, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Christian Brauner, Ingo Molnar, Peter Zijlstra, LKML, "open list:ANDROID DRIVERS", linux-mm, kernel-team
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 15, 2019 at 9:43 AM Steven Rostedt wrote:
>
> On Thu, 14 Mar 2019 21:36:43 -0700
> Daniel Colascione wrote:
>
> > On Thu, Mar 14, 2019 at 8:16 PM Steven Rostedt wrote:
> > >
> > > On Thu, 14 Mar 2019 13:49:11 -0700
> > > Sultan Alsawaf wrote:
> > >
> > > > Perhaps I'm missing something, but if you want to know when a process has died
> > > > after sending a
> > > > SIGKILL to it, then why not just make the SIGKILL optionally
> > > > block until the process has died completely? It'd be rather trivial to just
> > > > store a pointer to an onstack completion inside the victim process' task_struct,
> > > > and then complete it in free_task().
> > >
> > > How would you implement such a method in userspace? kill() doesn't take
> > > any parameters but the pid of the process you want to send a signal to,
> > > and the signal to send. This would require a new system call, and be
> > > quite a bit of work.
> >
> > That's what the pidfd work is for. Please read the original threads
> > about the motivation and design of that facility.
>
> I wasn't Cc'd on the original work, so I haven't read them.
>
> > > If you can solve this with an ebpf program, I
> > > strongly suggest you do that instead.
> >
> > We do want killed processes to die promptly. That's why I support
> > boosting a process's priority somehow when lmkd is about to kill it.
> > The precise way in which we do that --- involving not only actual
> > priority, but scheduler knobs, cgroup assignment, core affinity, and
> > so on --- is a complex topic best left to userspace. lmkd already has
> > all the knobs it needs to implement whatever priority boosting policy
> > it wants.
> >
> > Hell, once we add a pidfd_wait --- which I plan to work on, assuming
> > nobody beats me to it, after pidfd_send_signal lands --- you can
> > imagine a general-purpose priority inheritance mechanism expediting
> > process death when a high-priority process waits on a pidfd_wait
> > handle for a condemned process. You know you're on the right track
> > design-wise when you start seeing this kind of elegant constructive
> > interference between seemingly-unrelated features. What we don't need
> > is some kind of blocking SIGKILL alternative or backdoor event
> > delivery system.
> >
> > We definitely don't want to have to wait for a process's parent to
> > reap it.
> > Instead, we want to wait for it to become a zombie. That's
> > why I designed my original exithand patch to fire death notification
> > upon transition to the zombie state, not upon process table removal,
> > and I expect pidfd_wait (or whatever we call it) to act the same way.
> >
> > In any case, there's a clear path forward here --- general-purpose,
> > cheap, and elegant --- and we should just focus on doing that instead
> > of more complex proposals with few advantages.
>
> If you add new pidfd system calls then making a new way to send a signal
> and block till it does die or whatever is

Right. And we shouldn't couple the killing and the waiting: while we
now have a good race-free way to kill processes using
pidfd_send_signal, we still have no good facility for waiting for the
death of a process that isn't a child of the waiter. Any kind of
unified "kill and wait for death" primitive precludes the killing
thread from waiting for things other than death at the same time!
Instead, if we allow waiting for an arbitrary process's death using
general-purpose wait primitives like select/poll/epoll/io_submit/etc.,
then synchronous killing becomes just another sleep that composes in
useful and predictable ways.

> more acceptable than adding a
> new signal that changes the semantics of sending signals, which is what
> I was against.

Agreed. Even if it were possible to easily add signals without
breaking everyone, a special kind of signal with delivery semantics
different from those of existing signals is a bad idea, and not really
a signal at all, but just a new system call in disguise.

> I do agree with Joel about bloating task_struct too. If anything, have
> a wait queue you add, where you can allocate a descriptor with the task
> dying and task killing, and just search this queue on dying. We could
> add a TIF flag to the task as well to let the exiting of this task know
> it should do such an operation.

That's my basic plan.
I think we need one link from struct signal or something so we don't
end up doing some kind of *global* search on process death, but let's
see how it goes.
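The blocking-SIGKILL idea quoted above (store a pointer to an on-stack completion in the victim's task_struct and complete it in free_task()) can be modelled in userspace. This is a sketch under loud assumptions: struct completion, init_completion(), complete() and wait_for_completion() here are pthread re-implementations that only mimic the kernel API of the same names; struct victim, victim_main(), kill_and_wait() and demo_blocking_kill() are hypothetical; and a thread stands in for the victim process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* pthread model of the kernel's struct completion (not the real API). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical victim: `killed` models the pending SIGKILL, `death`
 * models the pointer the proposal would add to task_struct. */
struct victim {
    struct completion killed;
    struct completion *death;
    bool freed;
};

static void *victim_main(void *arg)
{
    struct victim *v = arg;
    wait_for_completion(&v->killed);  /* runs until "SIGKILL" arrives */
    v->freed = true;                  /* teardown, standing in for free_task() */
    if (v->death)
        complete(v->death);           /* wake the blocked killer */
    return NULL;
}

/* The proposed blocking SIGKILL: returns only once the victim is gone. */
static void kill_and_wait(struct victim *v)
{
    struct completion death;          /* lives on the killer's stack */
    init_completion(&death);
    v->death = &death;                /* published before the kill is sent */
    complete(&v->killed);             /* deliver the "SIGKILL" */
    wait_for_completion(&death);      /* can sleep on nothing else */
    v->death = NULL;                  /* don't leave a pointer into our stack */
}

/* Usage: true if the victim was torn down before kill_and_wait() returned. */
static bool demo_blocking_kill(void)
{
    struct victim v = { .death = NULL, .freed = false };
    init_completion(&v.killed);
    pthread_t th;
    int rc = pthread_create(&th, NULL, victim_main, &v);
    assert(rc == 0);
    kill_and_wait(&v);
    bool freed = v.freed;
    pthread_join(th, NULL);
    return freed;
}
```

The model makes the objection in the reply concrete: while the killer sits in kill_and_wait(), it is asleep on the victim's death alone and cannot also react to any other event.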
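pidfd_wait did not exist when this was written. To illustrate the composition argued for in the reply (killing, then sleeping on death alongside unrelated events), here is a sketch using interfaces that later reached mainline: pidfd_send_signal(2) (Linux 5.1) and pidfd_open(2) (Linux 5.3), whose file descriptor polls readable once the process exits. It assumes the generic syscall numbers 424 and 434 used by most architectures, and the helper names (sys_pidfd_open, sys_pidfd_send_signal, wait_death_or_event, demo_kill_and_wait) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed generic syscall numbers; older libcs have no wrappers. */
#ifndef __NR_pidfd_send_signal
#define __NR_pidfd_send_signal 424   /* Linux >= 5.1 */
#endif
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434          /* Linux >= 5.3 */
#endif

/* Hypothetical thin wrappers around the raw system calls. */
static int sys_pidfd_open(pid_t pid)
{
    return (int)syscall(__NR_pidfd_open, pid, 0);
}

static int sys_pidfd_send_signal(int pidfd, int sig)
{
    return (int)syscall(__NR_pidfd_send_signal, pidfd, sig, NULL, 0);
}

/* Sleep until the process behind pidfd exits OR other_fd becomes
 * readable. Returns the fd that fired, or -1 on error or timeout. */
static int wait_death_or_event(int pidfd, int other_fd, int timeout_ms)
{
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pidfd };
    struct epoll_event out;
    int fired = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pidfd, &ev) == 0) {
        ev.data.fd = other_fd;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, other_fd, &ev) == 0 &&
            epoll_wait(ep, &out, 1, timeout_ms) == 1)
            fired = out.data.fd;
    }
    close(ep);
    return fired;
}

/* Usage: returns 0 on success, 1 if pidfds are unavailable here. */
static int demo_kill_and_wait(void)
{
    pid_t child = fork();
    assert(child >= 0);
    if (child == 0) {
        pause();                          /* idle until killed */
        _exit(0);
    }

    int pidfd = sys_pidfd_open(child);
    if (pidfd < 0) {                      /* e.g. ENOSYS before Linux 5.3 */
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return 1;
    }
    int efd = eventfd(1, EFD_CLOEXEC);    /* unrelated event, already pending */
    assert(efd >= 0);

    /* The other event wins while the child is still alive. */
    int fired = wait_death_or_event(pidfd, efd, 1000);
    assert(fired == efd);
    uint64_t val;
    ssize_t n = read(efd, &val, sizeof val);  /* consume it */
    assert(n == (ssize_t)sizeof val);

    /* Kill through the pidfd, then sleep in the same kind of wait. */
    int rc = sys_pidfd_send_signal(pidfd, SIGKILL);
    assert(rc == 0);
    fired = wait_death_or_event(pidfd, efd, 5000);
    assert(fired == pidfd);

    /* Readiness came at the zombie stage, so the parent can still reap. */
    int status;
    pid_t reaped = waitpid(child, &status, 0);
    assert(reaped == child);
    assert(WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL);

    close(efd);
    close(pidfd);
    return 0;
}
```

The killer never blocks on death specifically: death is just one more readable descriptor in an ordinary epoll set. Because readiness is reported when the victim becomes a zombie rather than when it is reaped, the final waitpid() still succeeds; a waiter that is not the parent would simply close the pidfd.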
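The wait-queue design discussed at the end of the thread (a descriptor pairing the dying and killing tasks, a TIF flag telling the exit path to look, and one link from the dying process so exit never performs a global search) can be compared with a toy userspace model. Every name here is hypothetical: struct task stands in for task_struct/signal_struct, death_watched for the proposed TIF flag, and real kernel code would need a wait queue and locking, both omitted from this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor pairing the dying task with its killer. */
struct death_waiter {
    int dying_pid;
    int killer_pid;
    bool woken;
    struct death_waiter *next;
};

/* Toy per-process state; stands in for task_struct/signal_struct. */
struct task {
    int pid;
    bool death_watched;            /* plays the proposed TIF flag */
    struct death_waiter *waiters;  /* the "one link" from the dying process */
};

/* Global-queue variant: every watched death scans every descriptor. */
static size_t global_exit_notify(struct death_waiter *queue, int pid,
                                 size_t *scanned)
{
    size_t woken = 0;
    *scanned = 0;
    for (struct death_waiter *w = queue; w; w = w->next) {
        ++*scanned;
        if (w->dying_pid == pid) {
            w->woken = true;
            ++woken;
        }
    }
    return woken;
}

/* Per-process variant: register a waiter directly on the victim. */
static void watch_death(struct task *t, struct death_waiter *w, int killer)
{
    w->dying_pid = t->pid;
    w->killer_pid = killer;
    w->woken = false;
    w->next = t->waiters;
    t->waiters = w;
    t->death_watched = true;
}

/* Exit path: the flag makes the unwatched case free, and the walk is
 * bounded by this process's own waiters. */
static size_t task_exit_notify(struct task *t, size_t *scanned)
{
    size_t woken = 0;
    *scanned = 0;
    if (!t->death_watched)
        return 0;
    for (struct death_waiter *w = t->waiters; w; w = w->next) {
        ++*scanned;
        w->woken = true;
        ++woken;
    }
    t->waiters = NULL;
    t->death_watched = false;
    return woken;
}

/* Usage: compare how many descriptors each variant visits. */
static void demo_death_waiters(void)
{
    struct task a = { .pid = 100 }, b = { .pid = 200 };
    struct death_waiter wa, wb;
    size_t scanned, n;

    watch_death(&a, &wa, 1);
    watch_death(&b, &wb, 1);
    n = task_exit_notify(&a, &scanned);
    assert(n == 1 && scanned == 1);      /* only a's waiter was visited */
    assert(wa.woken && !wb.woken);

    /* The same waiters in one global queue: b's exit scans both. */
    struct death_waiter g2 = { .dying_pid = 200, .killer_pid = 1 };
    struct death_waiter g1 = { .dying_pid = 100, .killer_pid = 1, .next = &g2 };
    n = global_exit_notify(&g1, 200, &scanned);
    assert(n == 1 && scanned == 2);
    assert(g2.woken && !g1.woken);
}
```

The scanned counters show the tradeoff: with one global queue, every watched death walks every outstanding descriptor in the system, while a per-process link bounds the walk by that process's own waiters, and the flag lets the overwhelmingly common unwatched exit skip the work entirely.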