From: Daniel Colascione
Date: Fri, 12 Apr 2019 17:56:00 -0700
Subject: Re: [PATCH RFC 1/2] Add polling support to pidfd
To: Joel Fernandes
Cc: Andy Lutomirski, LKML, Steven Rostedt, Christian Brauner, Jann Horn,
    Suren Baghdasaryan, Linus Torvalds, Alexey Dobriyan, Al Viro,
    Andrei Vagin, Andrew Morton, Arnd Bergmann, "Eric W. Biederman",
    Kees Cook, Linux FS Devel, "open list:KERNEL SELFTEST FRAMEWORK",
    Michal Hocko, Nadav Amit, Oleg Nesterov, Serge Hallyn, Shuah Khan,
    Stephen Rothwell, Taehee Yoo, Tejun Heo, Thomas Gleixner, Tycho Andersen
X-Mailing-List: linux-kernel@vger.kernel.org

[Resending due to accidental HTML. I need to take Joel's advice and switch
to a real email client]

On Fri, Apr 12, 2019 at 5:54 PM Daniel Colascione wrote:
>
> On Fri, Apr 12, 2019 at 5:09 PM Joel Fernandes wrote:
>>
>> Hi Andy!
>>
>> On Fri, Apr 12, 2019 at 02:32:53PM -0700, Andy Lutomirski wrote:
>> > On Thu, Apr 11, 2019 at 10:51 AM Joel Fernandes (Google)
>> > wrote:
>> > >
>> > > pidfd are /proc/pid directory file descriptors referring to a task group
>> > > leader. Android low memory killer (LMK) needs pidfd polling support to
>> > > replace code that currently checks for existence of /proc/pid for
>> > > knowing a process that is signalled to be killed has died, which is both
>> > > racy and slow. The pidfd poll approach is race-free, and also allows the
>> > > LMK to do other things (such as by polling on other fds) while awaiting
>> > > the process being killed to die.
>> > >
>> > > It prevents a situation where a PID is reused between when LMK sends a
>> > > kill signal and checks for existence of the PID, since the wrong PID is
>> > > now possibly checked for existence.
>> > >
>> > > In this patch, we follow the same mechanism used when the parent of the
>> > > task group is to be notified, that is when the tasks waiting on a poll
>> > > of pidfd are also awakened.
>> > >
>> > > We have decided to include the waitqueue in struct pid for the following
>> > > reasons:
>> > > 1. The wait queue has to survive for the lifetime of the poll. Including
>> > > it in task_struct would not be an option in this case because the task
>> > > can be reaped and destroyed before the poll returns.
>> >
>> > Are you sure? I admit I'm not all that familiar with the innards of
>> > poll() on Linux, but I thought that the waitqueue only had to survive
>> > long enough to kick the polling thread and did *not* have to survive
>> > until poll() actually returned.
>>
>> I am not sure now. I thought epoll(2) was based on the wait_event APIs,
>> however more closely looking at the eventpoll code, it looks like there are 2
>> waitqueues involved, one that we pass and the other that is a part of the
>> eventpoll session itself, so you could be right about that. Daniel Colascione
>> may have some more thoughts about it since he brought up the possibility of a
>> wq life-time issue. Daniel? We were just playing it safe.

I think you (Joel) and Andy are talking about different meanings of
poll(). Joel is talking about the VFS method; Andy is talking about the
system call. ISTM that the wait queue head we give to poll_wait needs to
stay alive for the whole duration of the poll. Normally the wait queue is
pinned by the struct file that we give to poll_wait (which takes a
reference on the struct file), but the pidfd struct file doesn't pin the
struct task, so we can't put the wait queue in struct task.
(remove_wait_queue, which poll implementations call to undo wait queue
additions, takes the wait queue head we passed to poll_wait, and we don't
want to pass a dangling pointer to remove_wait_queue.) If the lifetime
requirements for the queue aren't this strict, I don't see that
documented anywhere. Besides: if we don't actually need to pin the wait
queue for the duration of the poll, why bother taking a reference on the
polled struct file?
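A small userspace model may make the lifetime argument above easier to
follow. It is a sketch, not kernel code: every model_* name is
hypothetical, and the functions only loosely mirror __pollwait(),
free_poll_entry(), remove_wait_queue(), put_pid() and fput(). It assumes,
as the patch does, that the pidfd file holds a reference on struct pid and
that the wait queue head lives in struct pid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct model_wait_entry { struct model_wait_entry *next; }; /* ~wait_queue_entry_t */
struct model_wait_head { struct model_wait_entry *first; }; /* ~wait_queue_head_t */

/* ~struct pid: refcounted, and it owns the pidfd wait queue head. */
struct model_pid {
    int refcount;
    struct model_wait_head wait_pidfd;
};

/* ~the pidfd struct file: it pins the pid, not the task. */
struct model_file {
    int refcount;
    struct model_pid *pid;
};

/* ~poll_table_entry: remembers the file and the head it was queued on. */
struct model_poll_entry {
    struct model_file *filp;
    struct model_wait_head *wait_address;
    struct model_wait_entry wait;
};

static int model_pids_freed; /* lets callers observe when a pid is freed */

static void model_put_pid(struct model_pid *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        model_pids_freed++;
    }
}

static void model_fput(struct model_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        model_put_pid(f->pid); /* last file reference drops the pid pin */
        free(f);
    }
}

/* Dereferences h, so h must still point at live memory. */
static bool model_remove_wait_queue(struct model_wait_head *h,
                                    struct model_wait_entry *e)
{
    for (struct model_wait_entry **pp = &h->first; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            return true;
        }
    }
    return false;
}

/* Loosely like __pollwait(): pin the file, then queue on the given head. */
static void model_poll_wait(struct model_poll_entry *pe,
                            struct model_file *f, struct model_wait_head *h)
{
    f->refcount++;
    pe->filp = f;
    pe->wait_address = h;
    pe->wait.next = h->first;
    h->first = &pe->wait;
}

/* Loosely like free_poll_entry(): unqueue first, then drop the file. */
static bool model_free_poll_entry(struct model_poll_entry *pe)
{
    bool removed = model_remove_wait_queue(pe->wait_address, &pe->wait);
    model_fput(pe->filp);
    return removed;
}

/* The task is reaped while a poller is still queued on the pidfd. */
static bool model_reap_during_poll(void)
{
    struct model_pid *pid = calloc(1, sizeof(*pid));
    struct model_file *f = calloc(1, sizeof(*f));
    if (!pid || !f) {
        free(pid);
        free(f);
        return false;
    }
    pid->refcount = 2;  /* one ref held by the task, one by the file */
    f->refcount = 1;    /* the open fd */
    f->pid = pid;
    struct model_poll_entry pe;
    model_poll_wait(&pe, f, &pid->wait_pidfd); /* file refcount: 2 */
    model_fput(f);      /* userspace closes the fd: file refcount 1 */
    model_put_pid(pid); /* task reaped: pid refcount 1, head still valid */
    bool head_live = pid->refcount == 1;
    bool removed = model_free_poll_entry(&pe); /* frees file, then pid */
    return head_live && removed;
}
```

In this model the head survives the reap because the poll entry pins the
file and the file still holds a pid reference. If wait_pidfd instead lived
in a task structure freed at reap time, model_free_poll_entry() would hand
the remove step a dangling head, which is the case the mail warns about.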