Date: Wed, 21 Nov 2018 12:54:20 -0800
In-Reply-To: <20181121201452.77173-1-dancol@google.com>
Message-Id: <20181121205428.165205-1-dancol@google.com>
References: <20181121201452.77173-1-dancol@google.com>
Subject: [PATCH v2] Add /proc/pid_gen
From: Daniel Colascione
To: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Cc: timmurray@google.com, primiano@google.com, joelaf@google.com,
    Daniel Colascione, Jonathan Corbet, Andrew Morton, Mike Rapoport,
    Vlastimil Babka, Roman Gushchin, Prashant Dhamdhere,
    "Dennis Zhou (Facebook)", "Eric W. Biederman",
    "Steven Rostedt (VMware)", Thomas Gleixner, Ingo Molnar,
    Dominik Brodowski, Josh Poimboeuf, Ard Biesheuvel, Michal Hocko,
    Stephen Rothwell, KJ Tsanaktsidis, David Howells,
    "open list:DOCUMENTATION"

Trace analysis code needs a coherent picture of the set of processes
and threads running on a system. While it's possible to enumerate all
tasks via /proc, this enumeration is not atomic. If PID numbering
rolls over during snapshot collection, the resulting snapshot of the
process and thread state of the system may be incoherent, confusing
trace analysis tools.
The fundamental problem is that if a PID is reused during a userspace
scan of /proc, it's impossible to tell, in post-processing, whether a
fact that the userspace /proc scanner reports regarding a given PID
refers to the old or new task named by that PID, as the scan of that
PID may or may not have occurred before the PID reuse, and there's no
way to "stamp" a fact read from the kernel with a trace timestamp.

This change adds a per-pid-namespace 64-bit generation number,
incremented on PID rollover, and exposes it via a new proc file
/proc/pid_gen. By examining this file before and after /proc
enumeration, user code can detect the potential reuse of a PID and
restart the task enumeration process, repeating until it gets a
coherent snapshot.

PID rollover ought to be rare, so in practice, scan repetitions will
be rare.

Signed-off-by: Daniel Colascione
---

Make commit message match the code.

 Documentation/filesystems/proc.txt |  1 +
 include/linux/pid.h                |  1 +
 include/linux/pid_namespace.h      |  2 ++
 init/main.c                        |  1 +
 kernel/pid.c                       | 36 +++++++++++++++++++++++++++++-
 5 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..f58a359f9a2c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -615,6 +615,7 @@ Table 1-5: Kernel info in /proc
  partitions  Table of partitions known to the system
  pci         Deprecated info of PCI bus (new way -> /proc/bus/pci/,
              decoupled by lspci (2.4)
+ pid_gen     PID rollover count
  rtc         Real time clock
  scsi        SCSI info (see text)
  slabinfo    Slab pool info
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 14a9a39da9c7..2e4b41a32e86 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -112,6 +112,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
 int next_pidmap(struct pid_namespace *pid_ns, unsigned int last);

 extern struct pid *alloc_pid(struct pid_namespace *ns);
+extern u64 read_pid_generation(struct pid_namespace *ns);
 extern void free_pid(struct pid *pid);
 extern void disable_pid_allocation(struct pid_namespace *ns);

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 49538b172483..fa92ae66fb98 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -44,6 +44,7 @@ struct pid_namespace {
 	kgid_t pid_gid;
 	int hide_pid;
 	int reboot;	/* group exit code if this pidns was rebooted */
+	u64 generation;	/* incremented on wraparound */
 	struct ns_common ns;
 } __randomize_layout;

@@ -99,5 +100,6 @@ static inline int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
 extern struct pid_namespace *task_active_pid_ns(struct task_struct *tsk);
 void pidhash_init(void);
 void pid_idr_init(void);
+void pid_proc_init(void);

 #endif /* _LINUX_PID_NS_H */
diff --git a/init/main.c b/init/main.c
index ee147103ba1b..20c595e852c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,6 +730,7 @@ asmlinkage __visible void __init start_kernel(void)
 	cgroup_init();
 	taskstats_init_early();
 	delayacct_init();
+	pid_proc_init();

 	check_bugs();

diff --git a/kernel/pid.c b/kernel/pid.c
index b2f6c506035d..cd5f4aa8eb55 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -174,6 +174,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)

 	for (i = ns->level; i >= 0; i--) {
 		int pid_min = 1;
+		unsigned int old_cursor;

 		idr_preload(GFP_KERNEL);
 		spin_lock_irq(&pidmap_lock);
@@ -182,7 +183,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 		 * init really needs pid 1, but after reaching the maximum
 		 * wrap back to RESERVED_PIDS
 		 */
-		if (idr_get_cursor(&tmp->idr) > RESERVED_PIDS)
+		old_cursor = idr_get_cursor(&tmp->idr);
+		if (old_cursor > RESERVED_PIDS)
 			pid_min = RESERVED_PIDS;

 		/*
@@ -191,6 +193,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 		 */
 		nr = idr_alloc_cyclic(&tmp->idr, NULL, pid_min,
 				      pid_max, GFP_ATOMIC);
+		if (unlikely(idr_get_cursor(&tmp->idr) <= old_cursor))
+			tmp->generation += 1;
 		spin_unlock_irq(&pidmap_lock);
 		idr_preload_end();

@@ -246,6 +250,16 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 	return ERR_PTR(retval);
 }

+u64 read_pid_generation(struct pid_namespace *ns)
+{
+	u64 generation;
+
+	spin_lock_irq(&pidmap_lock);
+	generation = ns->generation;
+	spin_unlock_irq(&pidmap_lock);
+	return generation;
+}
+
 void disable_pid_allocation(struct pid_namespace *ns)
 {
 	spin_lock_irq(&pidmap_lock);
@@ -449,6 +463,17 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
 	return idr_get_next(&ns->idr, &nr);
 }

+#ifdef CONFIG_PROC_FS
+static int pid_generation_show(struct seq_file *m, void *v)
+{
+	u64 generation =
+		read_pid_generation(proc_pid_ns(file_inode(m->file)));
+	seq_printf(m, "%llu\n", generation);
+	return 0;
+
+};
+#endif
+
 void __init pid_idr_init(void)
 {
 	/* Verify no one has done anything silly: */
@@ -465,4 +490,13 @@ void __init pid_idr_init(void)

 	init_pid_ns.pid_cachep = KMEM_CACHE(pid,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
+
+}
+
+void __init pid_proc_init(void)
+{
+	/* pid_idr_init is too early, so get a separate init function. */
+#ifdef CONFIG_PROC_FS
+	WARN_ON(!proc_create_single("pid_gen", 0, NULL, pid_generation_show));
+#endif
 }
-- 
2.19.1.1215.g8438c0b245-goog
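
The before/after check described in the commit message could look
roughly like this on the userspace side. This is a minimal sketch and
not part of the patch: read_generation(), coherent_scan(), struct
scan_ctx and count_numeric_entries() are hypothetical names, the
counter path is a parameter because /proc/pid_gen exists only on a
kernel carrying this change, and counting all-digit directory entries
merely stands in for whatever per-task reads a real scanner performs.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Read one decimal counter from path (e.g. the proposed /proc/pid_gen).
 * Returns 0 on success, -1 if the file is missing or malformed. */
static int read_generation(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Hypothetical scan state: the directory to walk and the number of
 * all-digit (task-like) entries found by the most recent walk. */
struct scan_ctx {
	const char *dir;
	unsigned long ntasks;
};

/* Example scan callback: count entries whose names are all digits. */
static int count_numeric_entries(void *arg)
{
	struct scan_ctx *ctx = arg;
	DIR *d = opendir(ctx->dir);
	struct dirent *de;

	if (!d)
		return -1;
	ctx->ntasks = 0;
	while ((de = readdir(d)) != NULL) {
		const char *p = de->d_name;

		while (isdigit((unsigned char)*p))
			p++;
		if (p != de->d_name && *p == '\0')
			ctx->ntasks++;
	}
	closedir(d);
	return 0;
}

/* Repeat scan() until the generation is unchanged across one scan.
 * Returns the number of attempts used, or -1 on a read/scan error or
 * if all max_tries scans saw the counter change. */
static int coherent_scan(const char *gen_path, int (*scan)(void *),
			 void *arg, int max_tries)
{
	int attempt;

	assert(max_tries > 0);
	for (attempt = 1; attempt <= max_tries; attempt++) {
		unsigned long long before, after;

		if (read_generation(gen_path, &before) ||
		    scan(arg) ||
		    read_generation(gen_path, &after))
			return -1;
		if (before == after)
			return attempt;
	}
	return -1;
}
```

On a patched kernel, a caller would presumably set up
struct scan_ctx ctx = { "/proc", 0 } and call
coherent_scan("/proc/pid_gen", count_numeric_entries, &ctx, 8); the
retry bound of 8 is arbitrary. Bounding the retries keeps a scanner
from spinning forever on a system that is forking unusually fast.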