Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757396AbcJXLYI (ORCPT ); Mon, 24 Oct 2016 07:24:08 -0400 Received: from merlin.infradead.org ([205.233.59.134]:55548 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753962AbcJXLYG (ORCPT ); Mon, 24 Oct 2016 07:24:06 -0400 Date: Mon, 24 Oct 2016 13:24:02 +0200 From: Peter Zijlstra To: Oleg Nesterov Cc: "Ni, BaoleX" , "mingo@redhat.com" , "acme@kernel.org" , "linux-kernel@vger.kernel.org" , "alexander.shishkin@linux.intel.com" , "Liu, Chuansheng" Subject: Re: hit a KASan bug related to Perf during stress test Message-ID: <20161024112402.GI3102@twins.programming.kicks-ass.net> References: <318B87A793BE164187D8851D6CE09D64371C8811@shsmsx102.ccr.corp.intel.com> <20161024095341.GF3102@twins.programming.kicks-ass.net> <20161024111526.GA13509@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161024111526.GA13509@redhat.com> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1366 Lines: 44 On Mon, Oct 24, 2016 at 01:15:27PM +0200, Oleg Nesterov wrote: > On 10/24, Peter Zijlstra wrote: > > > > > [32738.867020] [] task_tgid_nr_ns+0x35/0xb0 > > > > So here we did: perf_event_[pt]id(event, current); > > > > How can _current_ not be valid anymore? > > ... > > > > [32739.040207] [] __call_rcu+0x12c/0x450 > > > > And while we just called release_task(), that call_rcu() should still be > > pending at this point, > > Yes, current is still valid. > > But nothing protects current->group_leader or parent/real_parent, they > can point to the exited/freed task. We really need to nullify them in > __unhash_process() to catch the problems like this, I wanted to do this > many times... > > So you simply can't know your tgid or even tid after release_task() calls > __unhash_process(). Actually after exit_notify() unless the exiting task > autoreaps itself. > > How about the trivial fix below? > > Oleg. > > --- x/kernel/events/core.c > +++ x/kernel/events/core.c > @@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev > if (event->parent) > event = event->parent; > > - return task_tgid_nr_ns(p, event->ns); > + return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0; > } Hurm.. should we not push this into task_tgid_nr_ns() ? I mean, now the user needs to be aware of this dinky detail.