Date: Tue, 3 Nov 2015 09:39:13 -0500
From: Steven Rostedt
To: Alexey Kardashevskiy
Cc: "Paul E. McKenney", Paul Mackerras, David Gibson, linux-kernel@vger.kernel.org
Subject: Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
Message-ID: <20151103093913.346374e2@gandalf.local.home>
In-Reply-To: <1446533825-30160-1-git-send-email-aik@ozlabs.ru>
References: <1446533825-30160-1-git-send-email-aik@ozlabs.ru>

On Tue, 3 Nov 2015 17:57:05 +1100
Alexey Kardashevskiy wrote:

> This defines list_for_each_entry_lockless. This allows safe list
> traversing in cases when lockdep() invocation is unwanted like
> real mode (MMU is off).
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>
> This is for VFIO acceleration in POWERKVM for pSeries guests.
> There is a KVM instance. There also can be some VFIO (PCI passthru)
> devices attached to a KVM guest.
>
> To perform DMA, a pSeries guest registers DMA memory by calling some
> hypercalls explicitly at the rate close to one-two hcalls per
> a network packet, i.e. very often. When a guest does a hypercall
> (which is just an assembly instruction), the host kernel receives it
> in the real mode (MMU is off). When real mode fails to handle it,
> it enables MMU and tries handling a hcall in virtual mode.
>
> A logical bus ID (LIOBN) is a target id for these hypercalls.
>
> Each VFIO device belongs to an IOMMU group. Each group has an address
> translation table. It is allowed to have multiple IOMMU groups (i.e.
> multiple tables) under the same LIOBN.
>
> So effectively every DMA hcall has to update one or more TCE tables
> attached to the same LIOBN. RCU is used to update/traverse this list
> safely.
>
> Using RCU as is in virtual mode is fine. Lockdep works, etc.
> list_add_rcu() is used to populate the list;
> list_del_rcu() + call_rcu() used to remove groups from a list.
> These operations can happen in runtime as a result of PCI hotplug/unplug
> in guests.
>
> Using RCU as is in real mode is not fine as some RCU checks can lock up
> the system and in real mode we won't even have a chance to see any
> debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
>
> Previous version of this used to define list_for_each_entry_rcu_notrace()
> but it was proposed to use list_entry_lockless() instead. However
> the comment for lockless_dereference() suggests this is a good idea
> if "lifetime is managed by something other than RCU" but it is in my case.
>
> So what would be the correct approach here? Thanks.

If the only use case for this so far is in POWERKVM, perhaps it should
be defined specifically (and in arch/powerpc) and not confuse others
about using this.

Or, if you do imagine that this can be used in other scenarios, then a
much deeper comment must be made in the code in the kerneldoc section.

list_for_each_entry_rcu() should really be used 99.99% of the time
in the kernel. This looks to be an extreme exception. I hate to add a
generic helper for something that will only be used in one location.

-- Steve

> ---
>  include/linux/rculist.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..a83a924 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
>  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
>  /**
> + * list_for_each_entry_lockless - iterate over rcu list of given type
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently
> + */
> +#define list_entry_lockless(ptr, type, member) \
> +	container_of((typeof(ptr))lockless_dereference(ptr), type, member)
> +
> +#define list_for_each_entry_lockless(pos, head, member) \
> +	for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
> +		&pos->member != (head); \
> +		pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given type
>   * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
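
For illustration only, the traversal macro from the quoted patch can be approximated in userspace C. This is a rough sketch, not kernel code: it assumes GCC or Clang, uses a consume load via the __atomic builtins as a stand-in for lockless_dereference(), and the names list_add_publish(), struct table_entry and sum_ids_lockless() are hypothetical. It shows only the ordering between publishing and traversing; it says nothing about how entries are freed, which is the lifetime question raised in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (circular list). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * ASSUMPTION: a consume load through the GCC/Clang __atomic builtin stands
 * in for the kernel's lockless_dereference(); this is not the kernel macro.
 */
#define lockless_dereference(p) __atomic_load_n(&(p), __ATOMIC_CONSUME)

#define list_entry_lockless(ptr, type, member) \
	container_of(lockless_dereference(ptr), type, member)

#define list_for_each_entry_lockless(pos, head, member) \
	for (pos = list_entry_lockless((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry_lockless(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Hypothetical publisher, loosely modelled on list_add_rcu(): initialise
 * the node completely, then make it reachable with a release store.
 */
static void list_add_publish(struct list_head *node, struct list_head *head)
{
	struct list_head *first;

	assert(node != NULL && head != NULL);
	first = head->next;
	node->next = first;
	node->prev = head;
	__atomic_store_n(&head->next, node, __ATOMIC_RELEASE);
	first->prev = node;
}

/* Hypothetical payload type; the field names are illustrative only. */
struct table_entry {
	int id;
	struct list_head link;
};

/* Walk the list without taking any lock and add up the ids. */
static int sum_ids_lockless(struct list_head *head)
{
	struct table_entry *pos;
	int sum = 0;

	list_for_each_entry_lockless(pos, head, link)
		sum += pos->id;
	return sum;
}
```

A caller would initialise a head with list_init(), publish entries with list_add_publish(&entry.link, &head), and read them with sum_ids_lockless(&head). The reader only ever follows next pointers, so prev may be updated after the release store.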