Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp504645imu; Thu, 22 Nov 2018 00:42:57 -0800 (PST) X-Google-Smtp-Source: AFSGD/VZVFUVU9FCbHMBUGJOPiVxQHKROZl1cChpj2iL8sJI1EOPc44xJv+a7d13UN4/qHXgT7zI X-Received: by 2002:a63:b649:: with SMTP id v9mr9305172pgt.436.1542876177429; Thu, 22 Nov 2018 00:42:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542876177; cv=none; d=google.com; s=arc-20160816; b=nOo/nAhksv87IyQ3vRGG0+mGR9YMR12rF+b/K1hk8MolvY4eXo/lemu2MnYCX39SPn PWpRRHPnssMn1cGKGAXkTlsuA7zxjvAoSn+DXmn4HcDwuDk1RYNZcCFg3Vl1K2xbnYjc 29679URBm4Q1pGYBJOG9WXXzN0PS9pPo020sgGNWTWEL1Y5Qrs9d8N7xLHuDqkZxKEpu ypL8D/stsOkz2J1DvGwZuCMqLg6aSU1tNsrTw/aaA2Ysm7pE+pflwB3gvgZsX8WEyPMW ffobEzHC7DkysEhEHOghAdORG7UyGBW3hLKfPoNF1NlP4RmeC9KWp6xa5Ami5aqCGi3E 5ykw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=SenTUbOMDudhG+CyNQ1mhqSdumEIXhs0Ulmp868ubVo=; b=rrHRiSXRp17CxZDPD+Qp30cjqb+ttYo9yi3KJ7JSgEMrb5cT8E0sF52MKQmoqxqvZR /EBVdOEmTduNOvpWZvNOeJB2dxkQpXbqszKjwwU2cu9nNu4sC0yVpzpFoHMaYHUn1Lew advpBlJ4giLmvhtpUFESGsj/3a9zh9cnH9Cdk6gyFu9goSIv9n93gG6XVWbB3WpqjqGD A2/NcHGjX6zR4vXd2zD5UuWvVTUn8BQK/dTcLRjLxz1CquRkSfGajUvLYRJXJnX699cr Wtf4A4JHs9X7QNTPRf+UGN+lNr74bU+AQMNikyEb5A/5hNObWqerbwTEZFelkt/Vujk3 3+Vw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n18si26179016pfj.30.2018.11.22.00.42.42; Thu, 22 Nov 2018 00:42:57 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390022AbeKVIsi (ORCPT + 99 others); Thu, 22 Nov 2018 03:48:38 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:35170 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387621AbeKVIsh (ORCPT ); Thu, 22 Nov 2018 03:48:37 -0500 Received: from localhost.localdomain (c-24-6-170-16.hsd1.ca.comcast.net [24.6.170.16]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 2264C7AA; Wed, 21 Nov 2018 22:12:22 +0000 (UTC) Date: Wed, 21 Nov 2018 14:12:20 -0800 From: Andrew Morton To: Daniel Colascione Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, timmurray@google.com, primiano@google.com, joelaf@google.com, Jonathan Corbet , Mike Rapoport , Vlastimil Babka , Roman Gushchin , Prashant Dhamdhere , "Dennis Zhou (Facebook)" , "Eric W. Biederman" , "Steven Rostedt (VMware)" , Thomas Gleixner , Ingo Molnar , Dominik Brodowski , Josh Poimboeuf , Ard Biesheuvel , Michal Hocko , Stephen Rothwell , KJ Tsanaktsidis , David Howells , "open list:DOCUMENTATION" Subject: Re: [PATCH v2] Add /proc/pid_gen Message-Id: <20181121141220.0e533c1dcb4792480efbf3ff@linux-foundation.org> In-Reply-To: <20181121205428.165205-1-dancol@google.com> References: <20181121201452.77173-1-dancol@google.com> <20181121205428.165205-1-dancol@google.com> X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 21 Nov 2018 12:54:20 -0800 Daniel Colascione wrote: > Trace analysis code needs a coherent picture of the set of processes > and threads running on a system. While it's possible to enumerate all > tasks via /proc, this enumeration is not atomic. If PID numbering > rolls over during snapshot collection, the resulting snapshot of the > process and thread state of the system may be incoherent, confusing > trace analysis tools. The fundamental problem is that if a PID is > reused during a userspace scan of /proc, it's impossible to tell, in > post-processing, whether a fact that the userspace /proc scanner > reports regarding a given PID refers to the old or new task named by > that PID, as the scan of that PID may or may not have occurred before > the PID reuse, and there's no way to "stamp" a fact read from the > kernel with a trace timestamp. > > This change adds a per-pid-namespace 64-bit generation number, > incremented on PID rollover, and exposes it via a new proc file > /proc/pid_gen. By examining this file before and after /proc > enumeration, user code can detect the potential reuse of a PID and > restart the task enumeration process, repeating until it gets a > coherent snapshot. > > PID rollover ought to be rare, so in practice, scan repetitions will > be rare. In general, tracing is a rather specialized thing. Why is this very occasional confusion a sufficiently serious problem to warrant addition of this code? Which userspace tools will be using pid_gen? Are the developers of those tools signed up to use pid_gen? > --- a/include/linux/pid.h > +++ b/include/linux/pid.h > @@ -112,6 +112,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *); > int next_pidmap(struct pid_namespace *pid_ns, unsigned int last); > > extern struct pid *alloc_pid(struct pid_namespace *ns); > +extern u64 read_pid_generation(struct pid_namespace *ns); pig_generation_read() would be a better (and more idiomatic) name. > extern void free_pid(struct pid *pid); > extern void disable_pid_allocation(struct pid_namespace *ns); > > ... > > +u64 read_pid_generation(struct pid_namespace *ns) > +{ > + u64 generation; > + > + > + spin_lock_irq(&pidmap_lock); > + generation = ns->generation; > + spin_unlock_irq(&pidmap_lock); > + return generation; > +} What is the spinlocking in here for? afaict the only purpose it serves is to make the 64-bit read atomic, so it isn't needed on 32-bit? > void disable_pid_allocation(struct pid_namespace *ns) > { > spin_lock_irq(&pidmap_lock); > @@ -449,6 +463,17 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) > return idr_get_next(&ns->idr, &nr); > } > > +#ifdef CONFIG_PROC_FS > +static int pid_generation_show(struct seq_file *m, void *v) > +{ > + u64 generation = > + read_pid_generation(proc_pid_ns(file_inode(m->file))); u64 generation; generation = read_pid_generation(proc_pid_ns(file_inode(m->file))); is a nicer way of avoiding column wrap. > + seq_printf(m, "%llu\n", generation); > + return 0; > + > +}; > +#endif > + > void __init pid_idr_init(void) > { > /* Verify no one has done anything silly: */ > @@ -465,4 +490,13 @@ void __init pid_idr_init(void) > > init_pid_ns.pid_cachep = KMEM_CACHE(pid, > SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT); > + > +} > + > +void __init pid_proc_init(void) > +{ > + /* pid_idr_init is too early, so get a separate init function. */ s/get a/use a/ > +#ifdef CONFIG_PROC_FS > + WARN_ON(!proc_create_single("pid_gen", 0, NULL, pid_generation_show)); > +#endif > } This whole function could vanish if !CONFIG_PROC_FS. Doesn't matter much with __init code though.