References: <20190411175043.31207-1-joel@joelfernandes.org>
 <20190416120430.GA15437@redhat.com> <20190416192051.GA184889@google.com>
 <20190417130940.GC32622@redhat.com> <20190419190247.GB251571@google.com>
 <20190419191858.iwcvqm6fihbkaata@brauner.io>
 <20190419194902.GE251571@google.com>
In-Reply-To: <20190419194902.GE251571@google.com>
From: Daniel Colascione
Date: Fri, 19 Apr 2019 13:34:22 -0700
Subject: Re: [PATCH RFC 1/2] Add polling support to pidfd
To: Joel Fernandes
Cc: Christian Brauner, Jann Horn, Oleg Nesterov, Florian Weimer,
 kernel list, Andy Lutomirski, Steven Rostedt, Suren Baghdasaryan,
 Linus Torvalds, Alexey Dobriyan, Al Viro, Andrei Vagin, Andrew Morton,
 Arnd Bergmann, "Eric W. Biederman", Kees Cook, linux-fsdevel,
 "open list:KERNEL SELFTEST FRAMEWORK", Michal Hocko, Nadav Amit,
 Serge Hallyn, Shuah Khan, Stephen Rothwell, Taehee Yoo, Tejun Heo,
 Thomas Gleixner, kernel-team, Tycho Andersen
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 19, 2019 at 12:49 PM Joel Fernandes wrote:
>
> On Fri, Apr 19, 2019 at 09:18:59PM +0200, Christian Brauner wrote:
> > On Fri, Apr 19, 2019 at 03:02:47PM -0400, Joel Fernandes wrote:
> > > On Thu, Apr 18, 2019 at 07:26:44PM +0200, Christian Brauner wrote:
> > > > On April 18, 2019 7:23:38 PM GMT+02:00, Jann Horn wrote:
> > > > >On Wed, Apr 17, 2019 at 3:09 PM Oleg Nesterov wrote:
> > > > >> On 04/16, Joel Fernandes wrote:
> > > > >> > On Tue, Apr 16, 2019 at 02:04:31PM +0200, Oleg Nesterov wrote:
> > > > >> > >
> > > > >> > > Could you explain when it should return POLLIN? When the whole
> > > > >> > > process exits?
> > > > >> >
> > > > >> > It returns POLLIN when the task is dead or doesn't exist anymore,
> > > > >> > or when it is in a zombie state and there's no other thread in the
> > > > >> > thread group.
> > > > >>
> > > > >> IOW, when the whole thread group exits, so it can't be used to
> > > > >> monitor sub-threads.
> > > > >>
> > > > >> just in case... speaking of this patch it doesn't modify
> > > > >> proc_tid_base_operations, so you can't poll("/proc/sub-thread-tid")
> > > > >> anyway, but iiuc you are going to use the anonymous file returned
> > > > >> by CLONE_PIDFD ?
> > > > >
> > > > >I don't think procfs works that way. /proc/sub-thread-tid has
> > > > >proc_tgid_base_operations despite not being a thread group leader.
> > > > >(Yes, that's kinda weird.) AFAICS the WARN_ON_ONCE() in this code can
> > > > >be hit trivially, and then the code will misbehave.
> > > > >
> > > > >@Joel: I think you'll have to either rewrite this to explicitly bail
> > > > >out if you're dealing with a thread group leader, or make the code
> > > > >work for threads, too.
> > > >
> > > > The latter case probably being preferred if this API is supposed to be
> > > > useable for thread management in userspace.
> > >
> > > At the moment, we are not planning to use this for sub-thread management. I
> > > am reworking this patch to only work on clone(2) pidfds which makes the above
> >
> > Indeed and agreed.
> >
> > > discussion about /proc a bit unnecessary I think. Per the latest CLONE_PIDFD
> > > patches, CLONE_THREAD with pidfd is not supported.
> >
> > Yes. We have no one asking for it right now and we can easily add this
> > later.
> >
> > Admittedly I haven't gotten around to reviewing the patches here yet
> > completely. But one thing about using POLLIN. FreeBSD is using POLLHUP
> > on process exit which I think is nice as well. How about returning
> > POLLIN | POLLHUP on process exit?
> > We already do things like this. For example, when you proxy between
> > ttys. If the process that you're reading data from has exited and closed
> > it's end you still can't usually simply exit because it might have still
> > buffered data that you want to read. The way one can deal with this
> > from userspace is that you can observe a (POLLHUP | POLLIN) event and
> > you keep on reading until you only observe a POLLHUP without a POLLIN
> > event at which point you know you have read all data.
> > I like the semantics for pidfds as well as it would indicate:
> > - POLLHUP -> process has exited
> > - POLLIN -> information can be read
>
> Actually I think a bit different about this, in my opinion the pidfd should
> always be readable (we would store the exit status somewhere in the future
> which would be readable, even after task_struct is dead). So I was thinking
> we always return EPOLLIN. If process has not exited, then it blocks.

ITYM that a pidfd polls as readable *once a task exits* and stays
readable forever. Before a task exits, a poll on a pidfd should *not*
yield POLLIN and reading that pidfd should *not* complete immediately.
Having observed POLLIN on a pidfd, you should never then *not* see
POLLIN on that pidfd in the future: it's a one-way transition from
not-ready-to-get-exit-status to ready-to-get-exit-status.

Besides, didn't Linus say that he wanted waitpid(2) to be the function
for exit status on a pidfd, not read(2)? It doesn't really matter:
POLLIN on a pidfd would just mean "I, the kernel, say that waitpid on
this FD won't block", whereas for something like a socket, it would
mean "I, the kernel, say read(2) on this FD won't block".

I don't see the need for POLLHUP in pidfds at all. IMHO, we shouldn't
include it. "Hangup" doesn't have an obvious meaning distinct from
exit, and we might want the POLLHUP bit for something else in the
future. What would a caller do with POLLHUP? If the answer is
"something to do with ptrace", let's defer that to some future work,
since ptrace has a ton of moving parts I don't want to consider right
now.
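
For illustration, the one-way readiness argued for in this reply can be
observed from userspace. This is only a minimal sketch, not the RFC patch
under discussion: it assumes a kernel that provides pidfd_open(2) (Linux
5.3 or later) and Python 3.9+, whose os.pidfd_open wrapper postdates this
thread. The name observe_pidfd_readiness, the gate pipe, and the exit
code 7 are made up for the example.

```python
import os
import select


def observe_pidfd_readiness():
    """Poll a child's pidfd before and after it exits.

    Returns (ready_before_exit, ready_after_exit, ready_again, exit_code),
    or None when pidfd_open is unavailable (assumption: Linux 5.3+ and
    Python 3.9+ are needed for it to work).
    """
    if not hasattr(os, "pidfd_open"):
        return None

    # Hypothetical "gate" pipe: it only keeps the child alive until the
    # parent has polled the pidfd once.
    gate_r, gate_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(gate_w)
        os.read(gate_r, 1)  # returns at EOF, once the parent closes gate_w
        os._exit(7)         # arbitrary exit code for the demo

    os.close(gate_r)
    try:
        pidfd = os.pidfd_open(pid)
    except OSError:
        # Kernel or sandbox refused pidfd_open: release and reap the child.
        os.close(gate_w)
        os.waitpid(pid, 0)
        return None

    poller = select.poll()
    poller.register(pidfd, select.POLLIN)

    def readable(timeout_ms):
        return any(ev & select.POLLIN for _, ev in poller.poll(timeout_ms))

    before = readable(0)    # child still running: should not be ready
    os.close(gate_w)        # let the child exit
    after = readable(5000)  # wait (up to 5 s) for the exit notification
    again = readable(0)     # readiness should persist on a second poll

    # POLLIN was reported, so a non-blocking wait should reap the child.
    _, status = os.waitpid(pid, os.WNOHANG)
    os.close(pidfd)
    return before, after, again, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(observe_pidfd_readiness())
```

Where pidfd_open is supported, the sketch should report the pidfd as not
ready while the child runs, ready once it has exited, and still ready on
the following poll, with the exit status then collected through a
non-blocking waitpid rather than read(2). It returns None on systems
without pidfd_open.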