Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1031535AbbKFCR0 (ORCPT ); Thu, 5 Nov 2015 21:17:26 -0500 Received: from mail-pa0-f41.google.com ([209.85.220.41]:36414 "EHLO mail-pa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1031044AbbKFCRY (ORCPT ); Thu, 5 Nov 2015 21:17:24 -0500 Subject: Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu To: Steven Rostedt References: <1446533825-30160-1-git-send-email-aik@ozlabs.ru> <20151103093913.346374e2@gandalf.local.home> Cc: "Paul E. McKenney" , Paul Mackerras , David Gibson , linux-kernel@vger.kernel.org From: Alexey Kardashevskiy Message-ID: <563C0DAD.8070100@ozlabs.ru> Date: Fri, 6 Nov 2015 13:17:17 +1100 User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20151103093913.346374e2@gandalf.local.home> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4650 Lines: 114 On 11/04/2015 01:39 AM, Steven Rostedt wrote: > On Tue, 3 Nov 2015 17:57:05 +1100 > Alexey Kardashevskiy wrote: > >> This defines list_for_each_entry_lockless. This allows safe list >> traversing in cases when lockdep() invocation is unwanted like >> real mode (MMU is off). >> >> Signed-off-by: Alexey Kardashevskiy >> --- >> >> This is for VFIO acceleration in POWERKVM for pSeries guests. >> There is a KVM instance. There also can be some VFIO (PCI passthru) >> devices attached to a KVM guest. >> >> To perform DMA, a pSeries guest registers DMA memory by calling some >> hypercalls explicitely at the rate close to one-two hcalls per >> a network packet, i.e. very often. When a guest does a hypercall >> (which is just an assembly instruction), the host kernel receives it >> in the real mode (MMU is off). When real mode fails to handle it, >> it enables MMU and tries handling a hcall in virtual mode. >> >> A logical bus ID (LIOBN) is a tagret id for these hypecalls. >> >> Each VFIO device belongs to an IOMMU group. Each group has an address >> translation table. It is allowed to have multiple IOMMU groups (i.e. >> multiple tables) under the same LIOBN. >> >> So effectively every DMA hcall has to update one or more TCE tables >> attached to the same LIOBN. RCU is used to update/traverse this list >> safely. >> >> Using RCU as is in virtual mode is fine. Lockdep works, etc. >> list_add_rcu() is used to populate the list; >> list_del_rcu() + call_rcu() used to remove groups from a list. >> These operations can happen in runtim as a result of PCI hotplug/unplug >> in guests. >> >> Using RCU as is in real mode is not fine as some RCU checks can lock up >> the system and in real mode we won't even have a chance to see any >> debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used. >> >> Previous version of this used to define list_for_each_entry_rcu_notrace() >> but it was proposed to use list_entry_lockless() instead. However >> the comment for lockless_dereference() suggests this is a good idea >> if "lifetime is managed by something other than RCU" but it is in my case. >> >> So what would be the correct approach here? Thanks. > > If the only use case for this so far is in POWERKVM, perhaps it should > be defined specifically (and in arch/powerpc) and not confuse others > about using this. > > Or, if you do imagine that this can be used in other scenarios, then a > much deeper comment must be made in the code in the kerneldoc section. > list_for_each_entry_rcu() should really be used in 99.99% of the time > in the kernel. This looks to be an extreme exception. I hate to add a > generic helper for something that will only be used in one location. No, I cannot imagine this really and I can move it somewhere in arch/powerpc. Still, is my approach correct? What does the comment for lockless_dereference() actally mean - it won't work together with RCU at all or this is to force people not to use it as "list_for_each_entry_rcu() should really be used in 99.99% of the time"? :) > > -- Steve > > >> --- >> include/linux/rculist.h | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> >> diff --git a/include/linux/rculist.h b/include/linux/rculist.h >> index 17c6b1f..a83a924 100644 >> --- a/include/linux/rculist.h >> +++ b/include/linux/rculist.h >> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list, >> pos = list_entry_rcu(pos->member.next, typeof(*pos), member)) >> >> /** >> + * list_for_each_entry_lockless - iterate over rcu list of given type >> + * @pos: the type * to use as a loop cursor. >> + * @head: the head for your list. >> + * @member: the name of the list_struct within the struct. >> + * >> + * This list-traversal primitive may safely run concurrently >> + */ >> +#define list_entry_lockless(ptr, type, member) \ >> + container_of((typeof(ptr))lockless_dereference(ptr), type, member) >> + >> +#define list_for_each_entry_lockless(pos, head, member) \ >> + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \ >> + &pos->member != (head); \ >> + pos = list_entry_lockless(pos->member.next, typeof(*pos), member)) >> + >> +/** >> * list_for_each_entry_continue_rcu - continue iteration over list of given type >> * @pos: the type * to use as a loop cursor. >> * @head: the head for your list. > -- Alexey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/