Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp5949834imm; Sat, 19 May 2018 13:21:45 -0700 (PDT) X-Google-Smtp-Source: AB8JxZol32BISDtQsrjJvGclPRMSmyte3DDxgmnEEESNyhRmsFEKby/P/21jChojzTJHtOWiGcE8 X-Received: by 2002:a62:f20d:: with SMTP id m13-v6mr14449447pfh.170.1526761305900; Sat, 19 May 2018 13:21:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526761305; cv=none; d=google.com; s=arc-20160816; b=mGGADhvQ9+T+1LRNsi/HYo80IH45GfWzhtq/UbceUsYcV0Y9JwRynTWyRbPCLOmmCD VkCO2Wy9wiqUypcWY6i3xr/S5Rx7wZyffx36QdQbXWfSCP1hiTGuIxRRq9QIZQFkhp8c 4SwAim45dgFH49BooQ/HVK6esHWroGV3f+cDwCQwbYEYKe56joHidlnrJ0dSvx166T7U ODUy8OsMwPXBksOctJxwaJbITGQM9XsGc6sD65xTrItQap25vJefN8r6Hbw46NTuZGIC Pok5M+PM1/hyb40VWoUQeN1YtqZrMr9g02xg0r8VvTezf7jp8v2o8N0Rx+7hc4m+ogZC R9Lw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=F/B350g/K8WJB9SpsSqursR9xws5okgpjf017TroK0M=; b=ppiAIdVJl+kAYUJ7snvUy7jeqpq+QKp4xLT62D2aX5KWByDl6q/MGGeyKvOnsj0sOF exq8BOXMbTuICaIhACzgRp2cvAc17RNMGbwD73tSSmGQ1R4aBDMjMePTlJOznl6rKlO0 vMLMrviVPg8J5KGTG23S5PrPmkM+pCdwMZEm9FI4+T6nk5S5mmaiZQAyf5RJ63QmVygU /GBOZa7N2UU+5GMYQ7Q1Bn02Y0c+m6+hlGOcQ798fKeokTadVeG7fXKfNlIb49vXM2zZ G1VD0oNzzvQLmU1TAgV+V9xTZHE9af62SY2bO8zvAm5VCZ78Stij0R0qBXMNzBHx53tn V40g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@profitbricks-com.20150623.gappssmtp.com header.s=20150623 header.b=NUwC1dSr; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g12-v6si9981403plo.406.2018.05.19.13.21.22; Sat, 19 May 2018 13:21:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@profitbricks-com.20150623.gappssmtp.com header.s=20150623 header.b=NUwC1dSr; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752537AbeESUVM (ORCPT + 99 others); Sat, 19 May 2018 16:21:12 -0400 Received: from mail-it0-f67.google.com ([209.85.214.67]:39920 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752424AbeESUVK (ORCPT ); Sat, 19 May 2018 16:21:10 -0400 Received: by mail-it0-f67.google.com with SMTP id c3-v6so17134961itj.4 for ; Sat, 19 May 2018 13:21:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=profitbricks-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=F/B350g/K8WJB9SpsSqursR9xws5okgpjf017TroK0M=; b=NUwC1dSr7W7XdaNJwxop0574C/e0hi1WYDKfpfOVkf0/y7TXnRrZpSiuh1pMBNLY6G UDeSqu+VqeWQPpXiHa1Gxr6MKjLmdaQRs3gEpNu5liVNQ47CEPdovFnYQw+Clj0JUtUt biHy+AZJl6rd+BqEpywKBLfmJKT4ItWz6Hw3efK0XudRcb3nfuY2VWKjF6lEIR9NYop8 zREiIdvqwBE0/R3uMuKd6YylmN7UjwQ7wfsVm/DpZqEHDfnyklWo2L5hj1jtlNjVrCpU fvk7dbCJ2qdDNf0XGn7ZZUfxwsTU+Y8h+OLnWO249CDw2Yx5XwzMRujjXgV4mev4SPQO Ygbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=F/B350g/K8WJB9SpsSqursR9xws5okgpjf017TroK0M=; b=VdRx7jFQMSNhklB5AFGJwLjySIYqRO2nBFjAEEhoVF1vSEhi+/maAHHcrAFquF2rnr PJCELqag0ubwyKkYYLo/09I+OQA1bVQIs+YRHp/WLO9bSncbir+Tp3QH2KjVOrZTuXyq Na7SSKvbs9i9mtI2ZXhKwWssGLiHqrQvucmAU+d1rl/1iZD6nUQ/BvHcwDw+vRJrOHu+ W6jC8v5TtI/Oz82ERf/nEJQYeMzBfm2BQi+4dzLnj1VQWnk7TamJVU/0cEtrsMDbg9rY PeS+mipMCtqV3Q8go69UtZqzdlD/fg3Mz7CkB5M86uhxUkHABoIIiaNyiMblcmHXSIVI mu3A== X-Gm-Message-State: ALKqPwfOgN0ZHyZbn7DSRRi3GJQkFgtJ4mEp0va2yX26MlZxXGKZQkA2 8SyXxtKmPVs802k5qPgqrOOP1DZ8kdU3Q5A22S2b0w== X-Received: by 2002:a24:e488:: with SMTP id o130-v6mr12298106ith.37.1526761269269; Sat, 19 May 2018 13:21:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.140.210 with HTTP; Sat, 19 May 2018 13:20:48 -0700 (PDT) In-Reply-To: <20180519163735.GX3803@linux.vnet.ibm.com> References: <20180518130413.16997-1-roman.penyaev@profitbricks.com> <20180518130413.16997-2-roman.penyaev@profitbricks.com> <20180519163735.GX3803@linux.vnet.ibm.com> From: Roman Penyaev Date: Sat, 19 May 2018 22:20:48 +0200 Message-ID: Subject: Re: [PATCH v2 01/26] rculist: introduce list_next_or_null_rr_rcu() To: "Paul E . McKenney" Cc: linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, Jens Axboe , Christoph Hellwig , Sagi Grimberg , Bart Van Assche , Or Gerlitz , Doug Ledford , Swapnil Ingle , Danil Kipnis , Jack Wang , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 19, 2018 at 6:37 PM, Paul E. McKenney wrote: > On Fri, May 18, 2018 at 03:03:48PM +0200, Roman Pen wrote: >> Function is going to be used in transport over RDMA module >> in subsequent patches. >> >> Function returns next element in round-robin fashion, >> i.e. head will be skipped. NULL will be returned if list >> is observed as empty. >> >> Signed-off-by: Roman Pen >> Cc: Paul E. McKenney >> Cc: linux-kernel@vger.kernel.org >> --- >> include/linux/rculist.h | 19 +++++++++++++++++++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/include/linux/rculist.h b/include/linux/rculist.h >> index 127f534fec94..b0840d5ab25a 100644 >> --- a/include/linux/rculist.h >> +++ b/include/linux/rculist.h >> @@ -339,6 +339,25 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, >> }) >> >> /** >> + * list_next_or_null_rr_rcu - get next list element in round-robin fashion. >> + * @head: the head for the list. >> + * @ptr: the list head to take the next element from. >> + * @type: the type of the struct this is embedded in. >> + * @memb: the name of the list_head within the struct. >> + * >> + * Next element returned in round-robin fashion, i.e. head will be skipped, >> + * but if list is observed as empty, NULL will be returned. >> + * >> + * This primitive may safely run concurrently with the _rcu list-mutation >> + * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). > > Of course, all the set of list_next_or_null_rr_rcu() invocations that > are round-robining a given list must all be under the same RCU read-side > critical section. For example, the following will break badly: > > struct foo *take_rr_step(struct list_head *head, struct foo *ptr) > { > struct foo *ret; > > rcu_read_lock(); > ret = list_next_or_null_rr_rcu(head, ptr, struct foo, foolist); > rcu_read_unlock(); /* BUG */ > return ret; > } > > You need a big fat comment stating this, at the very least. The resulting > bug can be very hard to trigger and even harder to debug. > > And yes, I know that the same restriction applies to list_next_rcu() > and friends. The difference is that if you try to invoke those in an > infinite loop, you will be rapped on the knuckles as soon as you hit > the list header. Without that knuckle-rapping, RCU CPU stall warnings > might tempt people to do something broken like take_rr_step() above. Hi Paul, I need -rr behaviour for doing IO load-balancing when I choose next RDMA connection from the list in order to send a request, i.e. my code is something like the following: static struct conn *get_and_set_next_conn(void) { struct conn *conn; conn = rcu_dereferece(rcu_conn); if (unlikely(!conn)) return conn; conn = list_next_or_null_rr_rcu(&conn_list, &conn->entry, typeof(*conn), entry); rcu_assign_pointer(rcu_conn, conn); return conn; } rcu_read_lock(); conn = get_and_set_next_conn(); if (unlikely(!conn)) { /* ... */ } err = rdma_io(conn, request); rcu_read_unlock(); i.e. usage of the @next pointer is under an RCU critical section. > Is it possible to instead do some sort of list_for_each_entry_rcu()-like > macro that makes it more obvious that the whole thing need to be under > a single RCU read-side critical section? Such a macro would of course be > an infinite loop if the list never went empty, so presumably there would > be a break or return statement in there somewhere. The difference is that I do not need a loop, I take the @next conn pointer, save it for the following IO request and do IO for current IO request. It seems list_for_each_entry_rcu()-like with immediate "break" in the body of the loop does not look nice, I personally do not like it, i.e.: static struct conn *get_and_set_next_conn(void) { struct conn *conn; conn = rcu_dereferece(rcu_conn); if (unlikely(!conn)) return conn; list_for_each_entry_rr_rcu(conn, &conn_list, entry) { break; } rcu_assign_pointer(rcu_conn, conn); return conn; } or maybe I did not fully get your idea? >> + */ >> +#define list_next_or_null_rr_rcu(head, ptr, type, memb) \ >> +({ \ >> + list_next_or_null_rcu(head, ptr, type, memb) ?: \ >> + list_next_or_null_rcu(head, READ_ONCE((ptr)->next), type, memb); \ > > Are there any uses for this outside of RDMA? If not, I am with Linus. > Define this within RDMA, where a smaller number of people can more > easily be kept aware of the restrictions on use. If it turns out to be > more generally useful, we can take a look at exactly what makes sense > more globally. The only one list_for_each_entry_rcu()-like macro I am aware of is used in block/blk-mq-sched.c, is called list_for_each_entry_rcu_rr(): https://elixir.bootlin.com/linux/v4.17-rc5/source/block/blk-mq-sched.c#L370 Does it make sense to implement generic list_next_or_null_rr_rcu() reusing my list_next_or_null_rr_rcu() variant? > Even within RDMA, I strongly recommend the big fat comment called out above. > And the list_for_each_entry_rcu()-like formulation, if that can be made to > work within RDMA's code structure. > > Seem reasonable, or am I missing something here? Thanks for clear explanation. -- Roman