From: Alexey Kardashevskiy
To: "Paul E. McKenney"
Cc: Alexey Kardashevskiy, Steven Rostedt, Paul Mackerras, David Gibson, linux-kernel@vger.kernel.org
Subject: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu
Date: Tue, 3 Nov 2015 17:57:05 +1100
Message-Id: <1446533825-30160-1-git-send-email-aik@ozlabs.ru>
X-Mailer: git-send-email 2.5.0.rc3

This defines list_for_each_entry_lockless. This allows safe list
traversal in cases where invoking lockdep is unwanted, such as real
mode (MMU is off).

Signed-off-by: Alexey Kardashevskiy
---

This is for VFIO acceleration in POWERKVM for pSeries guests.
There is a KVM instance. There can also be some VFIO (PCI passthru)
devices attached to a KVM guest.

To perform DMA, a pSeries guest registers DMA memory by calling some
hypercalls explicitly at a rate close to one or two hcalls per
network packet, i.e. very often. When a guest does a hypercall
(which is just an assembly instruction), the host kernel receives it
in real mode (MMU is off). When real mode fails to handle it,
the kernel enables the MMU and tries handling the hcall in virtual mode.

A logical bus ID (LIOBN) is the target ID for these hypercalls.

Each VFIO device belongs to an IOMMU group. Each group has an address
translation table. It is allowed to have multiple IOMMU groups (i.e.
multiple tables) under the same LIOBN.

So effectively every DMA hcall has to update one or more TCE tables
attached to the same LIOBN. RCU is used to update/traverse this list
safely.

Using RCU as is in virtual mode is fine. Lockdep works, etc.
list_add_rcu() is used to populate the list;
list_del_rcu() + call_rcu() are used to remove groups from the list.
These operations can happen at runtime as a result of PCI hotplug/unplug
in guests.

Using RCU as is in real mode is not fine, as some RCU checks can lock up
the system and in real mode we won't even have a chance to see any
debug output. This is why rcu_read_lock() and rcu_read_unlock() are NOT
used.

The previous version of this patch defined
list_for_each_entry_rcu_notrace(), but it was proposed to use
list_entry_lockless() instead. However, the comment for
lockless_dereference() suggests this is a good idea if "lifetime is
managed by something other than RCU", whereas in my case the lifetime
is managed by RCU. So what would be the correct approach here? Thanks.

---
 include/linux/rculist.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 17c6b1f..a83a924 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_lockless - iterate over rcu list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently
+ */
+#define list_entry_lockless(ptr, type, member) \
+	container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+
+#define list_for_each_entry_lockless(pos, head, member) \
+	for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
-- 
2.5.0.rc3
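
For illustration only, here is a rough userspace sketch of the traversal
pattern the macro above is meant to provide. It is not kernel code and
not part of the patch. It assumes GCC or Clang (for __typeof__) and C11
atomics, and it uses an acquire load as a stand-in for
lockless_dereference(), which is stronger than the dependency ordering
the kernel relies on. The names lockless_load(), list_add_tail_publish(),
struct demo_table and demo_sum_ids() are hypothetical. The sketch only
shows how a reader can walk the list without taking a lock or calling
rcu_read_lock(); it does not address the lifetime question raised above,
since nothing is ever removed from the list here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. The next pointer is
 * atomic so that readers can load it without holding any lock. */
struct list_head {
	struct list_head *_Atomic next;
	struct list_head *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumption: an acquire load approximates lockless_dereference(). */
#define lockless_load(pp) atomic_load_explicit((pp), memory_order_acquire)

#define list_entry_lockless(ptr, type, member) \
	container_of(lockless_load(&(ptr)), type, member)

#define list_for_each_entry_lockless(pos, head, member) \
	for (pos = list_entry_lockless((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry_lockless(pos->member.next, __typeof__(*pos), member))

static void list_init(struct list_head *head)
{
	atomic_init(&head->next, head);
	head->prev = head;
}

/* Writer side, in the spirit of list_add_tail_rcu(): fully initialise the
 * new node first, then publish it with a release store. */
static void list_add_tail_publish(struct list_head *node, struct list_head *head)
{
	struct list_head *last = head->prev;

	atomic_init(&node->next, head);
	node->prev = last;
	atomic_store_explicit(&last->next, node, memory_order_release);
	head->prev = node;
}

/* Hypothetical payload; stands in for a per-group table entry. */
struct demo_table {
	int id;
	struct list_head node;
};

/* Build a three-entry list and walk it with the lockless iterator. */
static int demo_sum_ids(void)
{
	struct list_head head;
	struct demo_table t[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
	struct demo_table *pos;
	int sum = 0;

	list_init(&head);
	for (int i = 0; i < 3; i++)
		list_add_tail_publish(&t[i].node, &head);

	list_for_each_entry_lockless(pos, &head, node)
		sum += pos->id;

	return sum;
}
```

In this sketch the reader never writes to shared state, which is the
property that matters for the real-mode case. Removal would still need
some form of grace period before an entry is freed.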