Date: Mon, 29 Oct 2018 17:53:22 +0000
Message-Id: <20181029175322.189042-1-dancol@google.com>
Subject: [RFC PATCH] Minimal non-child process exit notification support
From: Daniel Colascione
To: linux-kernel@vger.kernel.org
Cc: timmurray@google.com, joelaf@google.com, Daniel Colascione

This patch adds a new file under /proc/pid, /proc/pid/exithand.
Attempting to read from an exithand file will block until the
corresponding process exits, at which point the read will successfully
complete with EOF. The file descriptor supports both blocking
operations and poll(2). It's intended to be a minimal interface for
allowing a program to wait for the exit of a process that is not one
of its children.

Why might we want this interface? Android's lmkd kills processes in
order to free memory in response to various memory pressure
signals. It's desirable to wait until a killed process actually exits
before moving on (if needed) to killing the next process.
Since the processes that lmkd kills are not lmkd's children, lmkd
currently lacks a way to wait for a process to actually die after being
sent SIGKILL; today, lmkd resorts to polling the proc filesystem pid
entry. This interface allows lmkd to give up polling and instead block
and wait for process death.

Signed-off-by: Daniel Colascione
---
 fs/proc/Makefile             |   1 +
 fs/proc/base.c               |   1 +
 fs/proc/exithand.c           | 117 +++++++++++++++++++++++++++++++++++
 fs/proc/internal.h           |   4 ++
 include/linux/sched/signal.h |   7 +++
 kernel/exit.c                |   2 +
 kernel/signal.c              |   3 +
 7 files changed, 135 insertions(+)
 create mode 100644 fs/proc/exithand.c

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index ead487e80510..21322280a2c1 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -27,6 +27,7 @@ proc-y += softirqs.o
 proc-y += namespaces.o
 proc-y += self.o
 proc-y += thread_self.o
+proc-y += exithand.o
 proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
 proc-$(CONFIG_NET) += proc_net.o
 proc-$(CONFIG_PROC_KCORE) += kcore.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7e9f07bf260d..31bc6bbb6dc4 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3006,6 +3006,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_LIVEPATCH
 	ONE("patch_state", S_IRUSR, proc_pid_patch_state),
 #endif
+	REG("exithand", S_IRUGO, proc_tgid_exithand_operations),
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
diff --git a/fs/proc/exithand.c b/fs/proc/exithand.c
new file mode 100644
index 000000000000..358b08da6a08
--- /dev/null
+++ b/fs/proc/exithand.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Synchronous exit notification of non-child processes
+ *
+ * Simple file descriptor /proc/pid/exithand. Read blocks (and poll
+ * reports non-readable) until process either dies or becomes
+ * a zombie.
+ */
+#include
+#include
+#include
+#include "internal.h"
+
+static int proc_tgid_exithand_open(struct inode *inode, struct file *file)
+{
+	struct task_struct* task = get_proc_task(inode);
+	/* If get_proc_task failed, it means the task is dead, which
+	 * is fine, since a subsequent read will return
+	 * immediately. */
+	if (task && !thread_group_leader(task))
+		return -EINVAL;
+	return 0;
+}
+
+static ssize_t proc_tgid_exithand_read(struct file * file,
+				       char __user * buf,
+				       size_t count, loff_t *ppos)
+{
+	struct task_struct* task = NULL;
+	wait_queue_entry_t wait;
+	ssize_t res = 0;
+	bool locked = false;
+
+	for (;;) {
+		/* Retrieve the task from the struct pid each time
+		 * through the loop in case the exact struct task
+		 * changes underneath us (e.g., if in exec.c, the
+		 * execing process kills the group leader and starts
+		 * using its PID). The struct signal should be the
+		 * same though even in this case.
+		 */
+		task = get_proc_task(file_inode(file));
+		res = 0;
+		if (!task)
+			goto out; /* No task? Must have died. */
+
+		BUG_ON(!thread_group_leader(task));
+
+		/* Synchronizes with exit.c machinery. */
+		read_lock(&tasklist_lock);
+		locked = true;
+
+		res = 0;
+		if (task->exit_state)
+			goto out;
+
+		res = -EAGAIN;
+		if (file->f_flags & O_NONBLOCK)
+			goto out;
+
+		/* Tell exit.c to go to the trouble of waking our
+		 * runqueue when this process gets around to
+		 * exiting. */
+		task->signal->exithand_is_interested = true;
+
+		/* Even if the task identity changes, task->signal
+		 * should be invariant across the wait, making it safe
+		 * to go remove our wait record from the wait queue
+		 * after we come back from schedule. */
+
+		init_waitqueue_entry(&wait, current);
+		add_wait_queue(&wait_exithand, &wait);
+
+		read_unlock(&tasklist_lock);
+		locked = false;
+
+		put_task_struct(task);
+		task = NULL;
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&wait_exithand, &wait);
+
+		res = -ERESTARTSYS;
+		if (signal_pending(current))
+			goto out;
+	}
+out:
+	if (locked)
+		read_unlock(&tasklist_lock);
+	if (task)
+		put_task_struct(task);
+	return res;
+}
+
+static __poll_t proc_tgid_exithand_poll(struct file *file, poll_table *wait)
+{
+	__poll_t mask = 0;
+	struct task_struct* task = get_proc_task(file_inode(file));
+	if (!task) {
+		mask |= POLLIN;
+	} else if (READ_ONCE(task->exit_state)) {
+		mask |= POLLIN;
+	} else {
+		read_lock(&tasklist_lock);
+		task->signal->exithand_is_interested = true;
+		read_unlock(&tasklist_lock);
+		poll_wait(file, &wait_exithand, wait);
+	}
+	return mask;
+}
+
+const struct file_operations proc_tgid_exithand_operations = {
+	.open = proc_tgid_exithand_open,
+	.read = proc_tgid_exithand_read,
+	.poll = proc_tgid_exithand_poll,
+};
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 5185d7f6a51e..1009d20475bc 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -304,3 +304,7 @@ extern unsigned long task_statm(struct mm_struct *,
 				unsigned long *, unsigned long *,
 				unsigned long *, unsigned long *);
 extern void task_mem(struct seq_file *, struct mm_struct *);
+
+/* exithand.c */
+
+extern const struct file_operations proc_tgid_exithand_operations;
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 13789d10a50e..44131cb6c7f4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -74,6 +74,10 @@ struct multiprocess_signals {
 	struct hlist_node node;
 };
 
+/* Need to stick the waitq for exithand outside process structures in
+ * case a process disappears across a poll. */
+extern wait_queue_head_t wait_exithand;
+
 /*
  * NOTE! "signal_struct" does not have its own
  * locking, because a shared signal_struct always
@@ -87,6 +91,9 @@ struct signal_struct {
 	int nr_threads;
 	struct list_head thread_head;
 
+	/* Protected with tasklist_lock. */
+	bool exithand_is_interested;
+
 	wait_queue_head_t wait_chldexit; /* for wait4() */
 
 	/* current thread group signal load-balancing target: */
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d21f35..44a4e3796f8b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1485,6 +1485,8 @@ void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
 {
 	__wake_up_sync_key(&parent->signal->wait_chldexit,
 				TASK_INTERRUPTIBLE, 1, p);
+	if (p->signal->exithand_is_interested)
+		__wake_up_sync(&wait_exithand, TASK_INTERRUPTIBLE, 0);
 }
 
 static long do_wait(struct wait_opts *wo)
diff --git a/kernel/signal.c b/kernel/signal.c
index 17565240b1c6..e156d48da70a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -454,6 +454,9 @@ void flush_sigqueue(struct sigpending *queue)
 	}
 }
 
+wait_queue_head_t wait_exithand =
+	__WAIT_QUEUE_HEAD_INITIALIZER(wait_exithand);
+
 /*
  * Flush all pending signals for this kthread.
  */
-- 
2.19.1.568.g152ad8e336-goog