Date: Mon, 15 Jun 2015 19:14:58 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Alexei Starovoitov
Cc: Daniel Wagner, LKML
Subject: Re: call_rcu from trace_preempt
Message-ID: <20150616021458.GE3913@linux.vnet.ibm.com>
References: <557F509D.2000509@plumgrid.com> <20150615230702.GB3913@linux.vnet.ibm.com> <557F7764.5060707@plumgrid.com>
In-Reply-To: <557F7764.5060707@plumgrid.com>

On Mon, Jun 15, 2015 at 06:09:56PM -0700, Alexei Starovoitov wrote:
> On 6/15/15 4:07 PM, Paul E. McKenney wrote:
> >
> >Oh...  One important thing is that both call_rcu() and kfree_rcu()
> >use per-CPU variables, managing a per-CPU linked list.  This is why
> >they disable interrupts.  If you do another call_rcu() in the middle
> >of the first one in just the wrong place, you will have two entities
> >concurrently manipulating the same linked list, which will not go well.
>
> yes. I'm trying to find that 'wrong place'.
> The trace.patch is doing kmalloc/kfree_rcu for every preempt_enable.
> So any spin_unlock called by the first call_rcu will trigger a 2nd,
> recursive call_rcu.
> But as far as I could understand the rcu code, that looks ok everywhere:
> call_rcu
>   debug_rcu_head_[un]queue
>     debug_object_activate
>       spin_unlock
>
> and debug_rcu_head* seems to be called from safe places
> where local_irq is enabled.

I do sympathize, but your own testing does demonstrate that it is very
much not OK.  ;-)

> >Maybe mark call_rcu() and the things it calls as notrace?  Or you
> >could maintain a separate per-CPU linked list that gathered up the
> >stuff to be kfree()ed after a grace period, and some time later
> >feed them to kfree_rcu()?
>
> yeah, I can think of this or 10 other ways to fix it within the
> kprobe+bpf area, but I think something like call_rcu_notrace()
> may be a better solution.
> Or maybe a single generic 'fix' for call_rcu will be enough if
> it doesn't affect all other users.

Why do you believe that it is better to fix it within call_rcu()?
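To make the per-CPU staging-list option a bit more concrete, it could be
structured along the following lines.  This is only a rough sketch, not
even compiler-tested, and every name in it is invented for the example;
it uses an llist so that the enqueue side is safe to call from the
tracing hook even with interrupts disabled:

#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Invented element type: whatever the tracer allocates and later frees. */
struct stash_elem {
	struct llist_node llnode;
	struct rcu_head rh;
	/* payload ... */
};

/* One lock-free staging list per CPU. */
static DEFINE_PER_CPU(struct llist_head, free_stash);

/* Tracer side: just push onto the per-CPU list, never touch RCU here. */
static void stash_for_kfree_rcu(struct stash_elem *e)
{
	llist_add(&e->llnode, this_cpu_ptr(&free_stash));
}

/*
 * Later, from a safe context (workqueue, timer, ...): hand the whole
 * batch to kfree_rcu(), which is now free to use the per-CPU callback
 * lists without risk of recursing back into itself.
 */
static void drain_free_stash(void)
{
	struct llist_node *head, *node, *next;
	int cpu;

	for_each_possible_cpu(cpu) {
		head = llist_del_all(per_cpu_ptr(&free_stash, cpu));
		llist_for_each_safe(node, next, head) {
			struct stash_elem *e;

			e = llist_entry(node, struct stash_elem, llnode);
			kfree_rcu(e, rh);
		}
	}
}

The point is that the tracing path never manipulates the RCU callback
lists at all; only the drain side does, and it runs somewhere that
call_rcu()/kfree_rcu() is known to be safe.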
> >The usual consequence of racing a pair of callback insertions on the
> >same CPU would be that one of them gets leaked, and possibly all
> >subsequent callbacks.  So the lockup is no surprise.  And there are a
> >lot of other assumptions in nearby code paths about only one execution
> >at a time from a given CPU.
>
> yes, I don't think calling a 2nd call_rcu from preempt_enable violates
> these assumptions. local_irq does its job. No extra stuff is called when
> interrupts are disabled.

Perhaps you are self-deadlocking within __call_rcu_core().  If you have
not already done so, please try running with CONFIG_PROVE_LOCKING=y.

(Yes, I see your point about not calling extra stuff when interrupts
are disabled, and I remember that __call_rcu() avoids that path when
interrupts are disabled, but -something- is clearly going wrong!
Or maybe something momentarily enables interrupts somewhere, and
RCU has just been getting lucky.  Or...)

> >>Any advice on where to look is greatly appreciated.
> >
> >What I don't understand is exactly what you are trying to do.  Have more
> >complex tracers that dynamically allocate memory?  If so, having a per-CPU
> >list that stages memory to be freed so that it can be passed to call_rcu()
> >in a safe environment might make sense.  Of course, that list would need
> >to be managed carefully!
>
> yes. We tried to compute the time the kernel spends between
> preempt_disable->preempt_enable and plot a histogram of latencies.
>
> >Or am I missing the point of the code below?
>
> this trace.patch is a reproducer of the call_rcu crashes; it does:
> preempt_enable
>   trace_preempt_on
>     kfree_call_rcu
>
> The real call stack is:
> preempt_enable
>   trace_preempt_on
>     kprobe_int3_handler
>       trace_call_bpf
>         bpf_map_update_elem
>           htab_map_update_elem
>             kfree_call_rcu

I suspect that your problem may range quite a bit further than just
call_rcu().  For example, in your stack trace, you have a recursive
call to debug_object_activate(), which might not be such a good thing.

							Thanx, Paul
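------------------------------------------------------------------------

And in case it is useful, here is an equally rough illustration of a
caller-side guard, which is the sort of thing I would reach for instead
of changing call_rcu() itself.  Again, this is completely untested, the
names are made up, and where exactly it would live in the kprobe+bpf
path is up to you (wrapping the body of trace_call_bpf() is one
possibility).  The idea is just a per-CPU nesting count around whatever
ends up allocating and freeing, so that a nested invocation (for
example via kfree_rcu()) quietly does nothing; this assumes the hook
runs with preemption disabled, as it does in the preempt_enable() path
in question:

/* Per-CPU count of nesting within the tracing/bpf path. */
static DEFINE_PER_CPU(int, bpf_trace_nesting);

/* Made-up wrapper around whatever ends up doing the allocation. */
static void guarded_trace_work(unsigned long a0, unsigned long a1)
{
	/* Only the outermost invocation does real work. */
	if (this_cpu_inc_return(bpf_trace_nesting) == 1)
		do_trace_work(a0, a1);	/* made-up: may kmalloc()/kfree_rcu() */
	this_cpu_dec(bpf_trace_nesting);
}

Losing the occasional nested event is the price, but that seems a lot
cheaper than teaching call_rcu() to be reentrant.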