Date: Thu, 22 Nov 2018 10:27:20 -0500 (EST)
From: Mathieu Desnoyers
To: Daniel Colascione
Cc: Andrew Morton, linux-kernel, linux-api, Tim Murray, Primiano Tucci, Joel Fernandes, Jonathan Corbet, Mike Rapoport, Vlastimil Babka, Roman Gushchin, Prashant Dhamdhere, "Dennis Zhou (Facebook)", "Eric W. Biederman", rostedt, Thomas Gleixner, Ingo Molnar, linux@dominikbrodowski.net, Josh Poimboeuf, Ard Biesheuvel, Michal Hocko, Stephen Rothwell, ktsanaktsidis@zendesk.com, David Howells, "open list:DOCUMENTATION"
Message-ID: <1320611605.10033.1542900440206.JavaMail.zimbra@efficios.com>
References: <20181121201452.77173-1-dancol@google.com> <20181121141220.0e533c1dcb4792480efbf3ff@linux-foundation.org> <20181121145043.fa029f4f91afddc2a10bb81e@linux-foundation.org> <20181121162247.467fcab6c0aca0819a822286@linux-foundation.org>
Subject: Re: [PATCH v2] Add /proc/pid_gen
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Nov 21, 2018, at 7:30 PM, Daniel Colascione dancol@google.com wrote:

[...]

>> > >
>> > > The problem here is the possibility of confusion, even if it's rare.
>> > > Does the naive approach of just walking /proc and ignoring the
>> > > possibility of PID reuse races work most of the time? Sure. But "most
>> > > of the time" isn't good enough. It's not that there are tons of sob
>> > > stories: it's that without completely robust reporting, we can't rule
>> > > out of the possibility that weirdness we observe in a given trace is
>> > > actually just an artifact from a kinda-sort-working best-effort trace
>> > > collection system instead of a real anomaly in behavior. Tracing,
>> > > essentially, gives us deltas for system state, and without an accurate
>> > > baseline, collected via some kind of scan on trace startup, it's
>> > > impossible to use these deltas to robustly reconstruct total system
>> > > state at a given time. And this matters, because errors in
>> > > reconstruction (e.g., assigning a thread to the wrong process because
>> > > the IDs happen to be reused) can affect processing of the whole trace.
>> > > If it's 3am and I'm analyzing the lone trace from a dogfooder
>> > > demonstrating a particularly nasty problem, I don't want to find out
>> > > that the trace I'm analyzing ended up being useless because the
>> > > kernel's trace system is merely best effort. It's very cheap to be
>> > > 100% reliable here, so let's be reliable and rule out sources of
>> > > error.
>> > [...]

I've just been CC'd on this thread for some reason, so I'll add my 2 cents.

FWIW, I think using /proc to add stateful information to a time-based
trace is the wrong way to do things. Here, the fact that you need to add
a generation counter to struct pid_namespace and expose it via /proc just
highlights the limitations of /proc when it comes to dealing with state
that changes over time.

Your current issue is with PID re-use, but you will eventually face the
same issue for re-use of all other resources you are trying to model.
For instance, a file descriptor may be associated with a path at some
point in time, but that is no longer true after a sequence of close/open
that re-uses that file descriptor. Does that mean we will eventually end
up needing per-file-descriptor generation counters as well?

LTTng solves this by dumping the system state as events within the trace
[1], which associates time-stamps with the state being dumped. The
statedump is recorded while the rest of the system is being traced, so
tools can reconstruct the full system state by combining it with the rest
of the events recording state transitions.

So while I agree that it's important to have a way to reconstruct system
state that is aware of PID re-use, I think trying to extend /proc for
this is the wrong approach. It adds extra fields to struct pid_namespace
that seem to be useful only for tracing, whereas using the time-stamp at
which the thread/process was first seen in the trace (either fork or
statedump) as a secondary key should suffice to uniquely identify a
thread/process. I would recommend extending tracing facilities to dump
the data you need rather than extending /proc.

Thanks,

Mathieu

[1] http://git.lttng.org/?p=lttng-modules.git;a=blob;f=lttng-statedump-impl.c;h=dc037508c055b7f61b8c758d581bd0178e26552a;hb=HEAD

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
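To make the secondary-key suggestion concrete, here is a minimal C sketch of how a trace analyzer might track tasks keyed by (pid, first-seen timestamp) instead of by pid alone. It assumes a hypothetical, time-ordered event stream that delivers statedump/fork entries and exit events with nanosecond timestamps. `struct task_table`, `task_seen()`, `task_exit()` and `task_lookup()` are illustrative names only, not LTTng or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analyzer-side structures; not LTTng or kernel APIs. */
#define MAX_TASKS 64
#define STILL_ALIVE UINT64_MAX /* no exit event seen (yet) */

struct task {
	int pid;
	uint64_t first_seen_ns; /* fork or statedump time: the secondary key */
	uint64_t exit_ns;       /* STILL_ALIVE until an exit event is seen */
	char comm[16];
};

struct task_table {
	struct task tasks[MAX_TASKS];
	size_t len;
};

/* Find the entry for pid that has not exited yet, if any. */
static struct task *find_live(struct task_table *t, int pid)
{
	for (size_t i = 0; i < t->len; i++)
		if (t->tasks[i].pid == pid && t->tasks[i].exit_ns == STILL_ALIVE)
			return &t->tasks[i];
	return NULL;
}

/* A fork event or statedump entry: the task is first seen at ts. */
static struct task *task_seen(struct task_table *t, int pid, uint64_t ts,
			      const char *comm)
{
	struct task *task = find_live(t, pid);

	assert(comm != NULL);
	/* A fork recorded while the statedump was still running may report
	 * the same task twice; keep the earlier (first-seen) entry. */
	if (task)
		return task;
	if (t->len >= MAX_TASKS)
		return NULL; /* fixed-size table is full */
	task = &t->tasks[t->len++];
	task->pid = pid;
	task->first_seen_ns = ts;
	task->exit_ns = STILL_ALIVE;
	snprintf(task->comm, sizeof(task->comm), "%s", comm);
	return task;
}

/* An exit event closes the lifetime of the live task with this pid. */
static int task_exit(struct task_table *t, int pid, uint64_t ts)
{
	struct task *task = find_live(t, pid);

	if (!task)
		return -1; /* exit for a task we never saw */
	task->exit_ns = ts;
	return 0;
}

/* Resolve a pid seen in an event at time ts to the unique
 * (pid, first_seen_ns) task whose lifetime covers ts. */
static const struct task *task_lookup(const struct task_table *t, int pid,
				      uint64_t ts)
{
	for (size_t i = 0; i < t->len; i++) {
		const struct task *task = &t->tasks[i];

		if (task->pid == pid && task->first_seen_ns <= ts &&
		    ts < task->exit_ns)
			return task;
	}
	return NULL;
}
```

With pid 42 forked at t=100, exiting at t=200, and reused by a new fork at t=300, an event at t=150 resolves to the first task and an event at t=350 to the second, so the two lifetimes cannot be conflated. The linear scans keep the sketch short; a real analyzer would presumably index entries by pid.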