From: "Ni, BaoleX"
To: Oleg Nesterov, Peter Zijlstra
CC: "mingo@redhat.com", "acme@kernel.org", "linux-kernel@vger.kernel.org", "alexander.shishkin@linux.intel.com", "Liu, Chuansheng"
Subject: RE: hit a KASan bug related to Perf during stress test
Date: Tue, 25 Oct 2016 06:55:15 +0000

Thanks a lot, guys. I will take Peter's patch and run the stress test with it.

-----Original Message-----
From: Oleg Nesterov [mailto:oleg@redhat.com]
Sent: Monday, October 24, 2016 11:53 PM
To: Peter Zijlstra
Cc: Ni, BaoleX; mingo@redhat.com; acme@kernel.org; linux-kernel@vger.kernel.org; alexander.shishkin@linux.intel.com; Liu, Chuansheng
Subject: Re: hit a KASan bug related to Perf during stress test

On 10/24, Oleg Nesterov wrote:
>
> On 10/24, Peter Zijlstra wrote:
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -1257,7 +1257,14 @@ static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
> >  	if (event->parent)
> >  		event = event->parent;
> >  
> > -	return task_tgid_nr_ns(p, event->ns);
> > +	/*
> > +	 * It is possible the task already got unhashed, in which case we
> > +	 * cannot determine the current->group_leader/real_parent.
> > +	 *
> > +	 * Also, report -1 to indicate unhashed, so as not to be confused with
> > +	 * 0 for the idle task.
> > +	 */
> > +	return pid_alive(p) ? task_tgid_nr_ns(p, event->ns) : ~0;
> >  }
>
> Yes, but this _looks_ racy unless p == current. I mean, pid_alive()
> makes task_tgid_nr_ns() safe, but task_tgid_nr_ns() still can return
> zero _if_ it can race with the exiting task.
>
> >  
> >  static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
> > @@ -1268,7 +1275,7 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
> >  	if (event->parent)
> >  		event = event->parent;
> >  
> > -	return task_pid_nr_ns(p, event->ns);
> > +	return pid_alive(p) ? task_pid_nr_ns(p, event->ns) : ~0;
>
> The same.
>
> However. At first glance the only case when p != current is
> copy_process(), right? And in this case the new child can't go away.
> So I think this patch is fine.

Actually there is another case, comm_write() -> perf_event_comm_output().
It checks same_thread_group(current, p), so we can only race with the
exiting sub-thread. perf_event_pid() can't return zero, perf_event_tid()
can.

And I personally think we do not care and your patch is fine anyway ;)

Oleg.