From: Alexei Starovoitov
Date: Mon, 15 Jun 2015 18:09:56 -0700
To: paulmck@linux.vnet.ibm.com
Cc: Daniel Wagner, LKML
Subject: Re: call_rcu from trace_preempt
In-Reply-To: <20150615230702.GB3913@linux.vnet.ibm.com>

On 6/15/15 4:07 PM, Paul E. McKenney wrote:
> Oh... One important thing is that both call_rcu() and kfree_rcu()
> use per-CPU variables, managing a per-CPU linked list. This is why
> they disable interrupts. If you do another call_rcu() in the middle
> of the first one in just the wrong place, you will have two entities
> concurrently manipulating the same linked list, which will not go well.

Yes. I'm trying to find that 'wrong place'. The trace.patch does a
kmalloc/kfree_rcu pair on every preempt_enable, so any spin_unlock
called inside the first call_rcu() will trigger a second, recursive
call_rcu(). But as far as I can tell from the RCU code, that looks
safe everywhere:

  call_rcu
    debug_rcu_head_[un]queue
      debug_object_activate
        spin_unlock

and the debug_rcu_head* helpers seem to be called only from safe
places, where local interrupts are enabled.

> Maybe mark call_rcu() and the things it calls as notrace?
> Or you could maintain a separate per-CPU linked list that gathered up
> the stuff to be kfree()ed after a grace period, and some time later
> feed them to kfree_rcu()?

Yeah, I can think of this or ten other ways to fix it within the
kprobe+bpf area, but I think something like call_rcu_notrace() may be
a better solution. Or maybe a single generic fix for call_rcu() will
be enough, if it doesn't affect all the other users.

> The usual consequence of racing a pair of callback insertions on the
> same CPU would be that one of them gets leaked, and possibly all
> subsequent callbacks. So the lockup is no surprise. And there are a
> lot of other assumptions in nearby code paths about only one execution
> at a time from a given CPU.

Yes, but I don't think calling a second call_rcu() from preempt_enable
violates those assumptions. local_irq does its job: no extra stuff is
called while interrupts are disabled.

>> Any advice on where to look is greatly appreciated.
>
> What I don't understand is exactly what you are trying to do. Have more
> complex tracers that dynamically allocate memory? If so, having a per-CPU
> list that stages memory to be freed so that it can be passed to call_rcu()
> in a safe environment might make sense. Of course, that list would need
> to be managed carefully!

Yes. We are trying to measure the time the kernel spends between
preempt_disable and preempt_enable, and plot a histogram of the
latencies.

> Or am I missing the point of the code below?

The trace.patch is a reproducer of the call_rcu crashes. It does:

  preempt_enable
    trace_preempt_on
      kfree_call_rcu

The real call stack is:

  preempt_enable
    trace_preempt_on
      kprobe_int3_handler
        trace_call_bpf
          bpf_map_update_elem
            htab_map_update_elem
              kfree_call_rcu