Date: Tue, 16 Jun 2015 07:16:26 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Daniel Wagner
Cc: Alexei Starovoitov, Daniel Wagner, LKML, rostedt@goodmis.org
Subject: Re: call_rcu from trace_preempt
Message-ID: <20150616141626.GI3913@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <557F509D.2000509@plumgrid.com>
 <20150615230702.GB3913@linux.vnet.ibm.com>
 <557F7764.5060707@plumgrid.com>
 <20150616021458.GE3913@linux.vnet.ibm.com>
 <557FB7E1.6080004@plumgrid.com>
 <20150616122733.GG3913@linux.vnet.ibm.com>
 <558018DD.1080701@monom.org>
In-Reply-To: <558018DD.1080701@monom.org>

On Tue, Jun 16, 2015 at 02:38:53PM +0200, Daniel Wagner wrote:
> On 06/16/2015 02:27 PM, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> >> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> >>>
> >>> Why do you believe that it is better to fix it within call_rcu()?
> >>
> >> found it:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 8cf7304b2867..a3be09d482ae 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> >>  {
> >>  	bool ret;
> >>
> >> -	preempt_disable();
> >> +	preempt_disable_notrace();
> >>  	ret = __rcu_is_watching();
> >> -	preempt_enable();
> >> +	preempt_enable_notrace();
> >>  	return ret;
> >>  }
> >>
> >> the rcu_is_watching() and __rcu_is_watching() are already marked
> >> notrace, so imo it's a good 'fix'.
> >> What was happening is that the above preempt_enable was triggering
> >> recursive call_rcu that was indeed messing 'rdp' that was
> >> prepared by __call_rcu and before __call_rcu_core could use that.
> >
> >> btw, also noticed that local_irq_save done by note_gp_changes
> >> is partially redundant. In __call_rcu_core path the irqs are
> >> already disabled.
> >
> > But you said earlier that nothing happened when interrupts were
> > disabled.  And interrupts are disabled across the call to
> > rcu_is_watching() in __call_rcu_core().  So why did those calls
> > to preempt_disable() and preempt_enable() cause trouble?
>
> Just for the record: Using a thread for freeing the memory is curing the
> problem without the need to modify rcu_is_watching.

I must confess to liking this approach better than guaranteeing
full-up reentrancy in call_rcu() and kfree_rcu().  ;-)

							Thanx, Paul
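For readers following the thread, the failure mode Alexei describes boils
down to the preempt tracepoint re-entering the BPF program while call_rcu()
is still queuing its callback. A simplified sketch of that call chain and
of the fixed function (illustrative only, not the exact kernel code paths):

/*
 * Simplified call chain behind the recursion (illustrative only):
 *
 *   bpf_prog()                      attached to trace_preempt_on/off
 *     kfree_rcu() / call_rcu()
 *       __call_rcu()                prepares this CPU's rdp
 *         __call_rcu_core()
 *           rcu_is_watching()
 *             preempt_enable()      fires trace_preempt_on()
 *               bpf_prog()          nested entry
 *                 call_rcu()        modifies the same rdp
 *         ...                       outer caller now sees clobbered rdp
 *
 * The _notrace variants skip the preempt tracepoints, so the nested
 * entry never happens:
 */
bool notrace rcu_is_watching(void)
{
	bool ret;

	preempt_disable_notrace();	/* no trace_preempt_off() event */
	ret = __rcu_is_watching();
	preempt_enable_notrace();	/* no trace_preempt_on() event */
	return ret;
}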
> Here is the hack:
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 83c209d..8d73be3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -13,6 +13,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  struct bpf_htab {
>  	struct bpf_map map;
> @@ -27,10 +29,14 @@ struct bpf_htab {
>  struct htab_elem {
>  	struct hlist_node hash_node;
>  	struct rcu_head rcu;
> +	struct list_head list;
>  	u32 hash;
>  	char key[0] __aligned(8);
>  };
>
> +static LIST_HEAD(elem_freelist);
> +static DEFINE_SPINLOCK(elem_freelist_lock);
> +
>  /* Called from syscall */
>  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  {
> @@ -228,6 +234,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
>
>  	l_new->hash = htab_map_hash(l_new->key, key_size);
> +	INIT_LIST_HEAD(&l_new->list);
>
>  	/* bpf_map_update_elem() can be called in_irq() */
>  	spin_lock_irqsave(&htab->lock, flags);
> @@ -300,11 +307,17 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
>  	if (l) {
>  		hlist_del_rcu(&l->hash_node);
>  		htab->count--;
> -		kfree_rcu(l, rcu);
> +		/* kfree_rcu(l, rcu); */
>  		ret = 0;
>  	}
>
>  	spin_unlock_irqrestore(&htab->lock, flags);
> +
> +	if (l) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		list_add(&l->list, &elem_freelist);
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
>  	return ret;
>  }
>
> @@ -359,9 +372,31 @@ static struct bpf_map_type_list htab_type __read_mostly = {
>  	.type = BPF_MAP_TYPE_HASH,
>  };
>
> +static int free_thread(void *arg)
> +{
> +	unsigned long flags;
> +	struct htab_elem *l;
> +
> +	while (!kthread_should_stop()) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		while (!list_empty(&elem_freelist)) {
> +			l = list_entry(elem_freelist.next,
> +				       struct htab_elem, list);
> +			list_del(&l->list);
> +			kfree(l);
> +		}
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init register_htab_map(void)
>  {
>  	bpf_register_map_type(&htab_type);
> +
> +	kthread_run(free_thread, NULL, "free_thread");
> +
>  	return 0;
>  }
>  late_initcall(register_htab_map);
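One rough refinement of the hack, sketched here under the assumption that a
wait queue (elem_free_wq, a name invented for this sketch) is added next to
elem_freelist: let the freeing thread sleep until htab_map_delete_elem() has
queued something instead of spinning, and splice the pending elements onto a
private list so kfree() runs with the lock dropped. Like the hack itself,
this still frees elements without waiting for an RCU grace period.

/* Not from the posted patch: elem_free_wq and this variant of
 * free_thread() only sketch how the freeing thread could block
 * instead of busy-waiting. */
static DECLARE_WAIT_QUEUE_HEAD(elem_free_wq);

static int free_thread(void *arg)
{
	unsigned long flags;
	struct htab_elem *l, *tmp;

	while (!kthread_should_stop()) {
		LIST_HEAD(todo);

		/* Sleep until a deletion queues an element; the delete
		 * path would call wake_up(&elem_free_wq) after list_add(). */
		wait_event_interruptible(elem_free_wq,
					 !list_empty(&elem_freelist) ||
					 kthread_should_stop());

		/* Grab the whole pending list, then free with the lock
		 * dropped and interrupts enabled. */
		spin_lock_irqsave(&elem_freelist_lock, flags);
		list_splice_init(&elem_freelist, &todo);
		spin_unlock_irqrestore(&elem_freelist_lock, flags);

		list_for_each_entry_safe(l, tmp, &todo, list)
			kfree(l);
	}

	return 0;
}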