Date: Mon, 24 Oct 2016 13:15:27 +0200
From: Oleg Nesterov
To: Peter Zijlstra
Cc: "Ni, BaoleX", mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org, alexander.shishkin@linux.intel.com, "Liu, Chuansheng"
Subject: Re: hit a KASan bug related to Perf during stress test
Message-ID: <20161024111526.GA13509@redhat.com>
In-Reply-To: <20161024095341.GF3102@twins.programming.kicks-ass.net>

On 10/24, Peter Zijlstra wrote:
>
> > [32738.867020] [] task_tgid_nr_ns+0x35/0xb0
>
> So here we did: perf_event_[pt]id(event, current);
>
> How can _current_ not be valid anymore?
...
> > [32739.040207] [] __call_rcu+0x12c/0x450
>
> And while we just called release_task(), that call_rcu() should still be
> pending at this point,

Yes, current is still valid. But nothing protects current->group_leader
or parent/real_parent; they can point to an exited/freed task.

We really need to nullify them in __unhash_process() to catch problems
like this. I have wanted to do this many times...

So you simply can't know your tgid or even tid after release_task()
calls __unhash_process().
Actually, after exit_notify(), unless the exiting task autoreaps itself.

How about the trivial fix below?

Oleg.

--- x/kernel/events/core.c
+++ x/kernel/events/core.c
@@ -1257,7 +1257,7 @@ static u32 perf_event_pid(struct perf_ev
 	if (event->parent)
 		event = event->parent;
 
-	return task_tgid_nr_ns(p, event->ns);
+	return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : 0;
 }
 
 static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
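The failure mode and the guard in the patch can be illustrated outside the kernel with a small userspace model. This is a sketch only. The types and functions `struct model_task`, `struct model_pid`, `model_pid_alive()`, `model_unhash_process()`, `model_tgid_nr()` and `model_event_pid()` are hypothetical stand-ins, not kernel APIs. NULLing `group_leader` at unhash time reflects the suggestion made in this mail, not what `__unhash_process()` is known to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's struct pid / task_struct. */
struct model_pid {
	int nr;
};

struct model_task {
	struct model_pid *pid;           /* cleared when the task is unhashed */
	struct model_task *group_leader; /* may dangle once the task is released */
};

/* Same idea as pid_alive(): is a pid still attached to this task? */
static int model_pid_alive(const struct model_task *p)
{
	return p->pid != NULL;
}

/*
 * Models __unhash_process(). It also NULLs group_leader, as the mail
 * suggests, so a stale dereference faults instead of reading freed memory.
 */
static void model_unhash_process(struct model_task *p)
{
	assert(p != NULL);
	p->pid = NULL;
	p->group_leader = NULL;
}

/* Unguarded lookup: follows group_leader unconditionally. */
static int model_tgid_nr(const struct model_task *p)
{
	return p->group_leader->pid->nr;
}

/* Guarded lookup in the spirit of the patched perf_event_pid(): 0 once unhashed. */
static unsigned int model_event_pid(const struct model_task *p)
{
	return model_pid_alive(p) ? (unsigned int)model_tgid_nr(p) : 0;
}
```

For a single-threaded task whose `group_leader` points to itself, `model_event_pid()` returns the pid number before `model_unhash_process()` and 0 afterwards. Calling `model_tgid_nr()` directly after unhashing would dereference a NULL `group_leader`, which is the model's analogue of the use-after-free KASan reported.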