Date: Wed, 6 Dec 2023 23:16:50 -0500
From: Kent Overstreet
To: Dave Chinner
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-cachefs@redhat.com, dhowells@redhat.com, gfs2@lists.linux.dev,
	dm-devel@lists.linux.dev, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep
Message-ID: <20231207041650.3tzzmv2jfrr5vppl@moria.home.lan>
References: <20231206060629.2827226-1-david@fromorbit.com>
 <20231206060629.2827226-11-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-11-david@fromorbit.com>

On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote:
> From: Dave Chinner
> 
> hash-bl nests spinlocks inside the bit locks. This causes problems
> for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks,
> and we're not allowed to sleep while holding a spinning lock.
> 
> Further, lockdep does not support bit locks, so we lose lockdep
> coverage of the inode hash table with the hash-bl conversion.
> 
> To enable these configs to work, add an external per-chain spinlock
> to the hlist_bl_head() and add helpers to use this instead of the
> bit spinlock when preempt_rt or lockdep are enabled.
> 
> This converts all users of hlist-bl to use the external spinlock in
> these situations, so we also gain lockdep coverage of things like
> the dentry cache hash table with this change.
> 
> Signed-off-by: Dave Chinner

Sleepable bit locks can be done with wait_on_bit(); is that worth
considering for PREEMPT_RT? Or are the other features of real locks
important there?

(Not a request for the current patchset, just perhaps a note for future
work.)

Reviewed-by: Kent Overstreet

> ---
>  include/linux/list_bl.h    | 126 ++++++++++++++++++++++++++++---------
>  include/linux/rculist_bl.h |  13 ++++
>  2 files changed, 110 insertions(+), 29 deletions(-)
> 
> diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
> index 8ee2bf5af131..990ad8e24e0b 100644
> --- a/include/linux/list_bl.h
> +++ b/include/linux/list_bl.h
> @@ -4,14 +4,27 @@
>  
>  #include <linux/list.h>
>  #include <linux/bit_spinlock.h>
> +#include <linux/spinlock.h>
>  
>  /*
>   * Special version of lists, where head of the list has a lock in the lowest
>   * bit. This is useful for scalable hash tables without increasing memory
>   * footprint overhead.
>   *
> - * For modification operations, the 0 bit of hlist_bl_head->first
> - * pointer must be set.
> + * Whilst the general use of bit spin locking is considered safe, PREEMPT_RT
> + * introduces a problem with nesting spin locks inside bit locks: spin locks
> + * become sleeping locks, and we can't sleep inside spinning locks such as bit
> + * locks. However, for RTPREEMPT, performance is less of an issue than
> + * correctness, so we trade off the memory and cache footprint of a spinlock per
> + * list so the list locks are converted to sleeping locks and work correctly
> + * with PREEMPT_RT kernels.
> + *
> + * An added advantage of this is that we can use the same trick when lockdep is
> + * enabled (again, performance doesn't matter) and gain lockdep coverage of all
> + * the hash-bl operations.
> + *
> + * For modification operations when using pure bit locking, the 0 bit of
> + * hlist_bl_head->first pointer must be set.
>   *
>   * With some small modifications, this can easily be adapted to store several
>   * arbitrary bits (not just a single lock bit), if the need arises to store
> @@ -30,16 +43,21 @@
>  #define LIST_BL_BUG_ON(x)
>  #endif
>  
> +#undef LIST_BL_USE_SPINLOCKS
> +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_LOCKDEP)
> +#define LIST_BL_USE_SPINLOCKS	1
> +#endif
>  
>  struct hlist_bl_head {
>  	struct hlist_bl_node *first;
> +#ifdef LIST_BL_USE_SPINLOCKS
> +	spinlock_t lock;
> +#endif
>  };
>  
>  struct hlist_bl_node {
>  	struct hlist_bl_node *next, **pprev;
>  };
> -#define INIT_HLIST_BL_HEAD(ptr) \
> -	((ptr)->first = NULL)
>  
>  static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
>  {
> @@ -54,6 +72,69 @@ static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
>  	return !h->pprev;
>  }
>  
> +#ifdef LIST_BL_USE_SPINLOCKS
> +#define INIT_HLIST_BL_HEAD(ptr) do {	\
> +	(ptr)->first = NULL;		\
> +	spin_lock_init(&(ptr)->lock);	\
> +} while (0)
> +
> +static inline void hlist_bl_lock(struct hlist_bl_head *b)
> +{
> +	spin_lock(&b->lock);
> +}
> +
> +static inline void hlist_bl_unlock(struct hlist_bl_head *b)
> +{
> +	spin_unlock(&b->lock);
> +}
> +
> +static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
> +{
> +	return spin_is_locked(&b->lock);
> +}
> +
> +static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
> +{
> +	return h->first;
> +}
> +
> +static inline void hlist_bl_set_first(struct hlist_bl_head *h,
> +				      struct hlist_bl_node *n)
> +{
> +	h->first = n;
> +}
> +
> +static inline void hlist_bl_set_before(struct hlist_bl_node **pprev,
> +				       struct hlist_bl_node *n)
> +{
> +	WRITE_ONCE(*pprev, n);
> +}
> +
> +static inline bool hlist_bl_empty(const struct hlist_bl_head *h)
> +{
> +	return !READ_ONCE(h->first);
> +}
> +
> +#else /* !LIST_BL_USE_SPINLOCKS */
> +
> +#define INIT_HLIST_BL_HEAD(ptr) \
> +	((ptr)->first = NULL)
> +
> +static inline void hlist_bl_lock(struct hlist_bl_head *b)
> +{
> +	bit_spin_lock(0, (unsigned long *)b);
> +}
> +
> +static inline void hlist_bl_unlock(struct hlist_bl_head *b)
> +{
> +	__bit_spin_unlock(0, (unsigned long *)b);
> +}
> +
> +static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
> +{
> +	return bit_spin_is_locked(0, (unsigned long *)b);
> +}
> +
>  static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
>  {
>  	return (struct hlist_bl_node *)
> @@ -69,11 +150,21 @@ static inline void hlist_bl_set_first(struct hlist_bl_head *h,
>  	h->first = (struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK);
>  }
>  
> +static inline void hlist_bl_set_before(struct hlist_bl_node **pprev,
> +				       struct hlist_bl_node *n)
> +{
> +	WRITE_ONCE(*pprev,
> +		   (struct hlist_bl_node *)
> +			((uintptr_t)n | ((uintptr_t)*pprev & LIST_BL_LOCKMASK)));
> +}
> +
>  static inline bool hlist_bl_empty(const struct hlist_bl_head *h)
>  {
>  	return !((unsigned long)READ_ONCE(h->first) & ~LIST_BL_LOCKMASK);
>  }
>  
> +#endif /* LIST_BL_USE_SPINLOCKS */
> +
>  static inline void hlist_bl_add_head(struct hlist_bl_node *n,
>  				     struct hlist_bl_head *h)
>  {
> @@ -94,11 +185,7 @@ static inline void hlist_bl_add_before(struct hlist_bl_node *n,
>  	n->pprev = pprev;
>  	n->next = next;
>  	next->pprev = &n->next;
> -
> -	/* pprev may be `first`, so be careful not to lose the lock bit */
> -	WRITE_ONCE(*pprev,
> -		   (struct hlist_bl_node *)
> -			((uintptr_t)n | ((uintptr_t)*pprev & LIST_BL_LOCKMASK)));
> +	hlist_bl_set_before(pprev, n);
>  }
>  
>  static inline void hlist_bl_add_behind(struct hlist_bl_node *n,
> @@ -119,11 +206,7 @@ static inline void __hlist_bl_del(struct hlist_bl_node *n)
>  
>  	LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK);
>  
> -	/* pprev may be `first`, so be careful not to lose the lock bit */
> -	WRITE_ONCE(*pprev,
> -		   (struct hlist_bl_node *)
> -			((unsigned long)next |
> -			 ((unsigned long)*pprev & LIST_BL_LOCKMASK)));
> +	hlist_bl_set_before(pprev, next);
>  	if (next)
>  		next->pprev = pprev;
>  }
> @@ -165,21 +248,6 @@ static inline bool hlist_bl_fake(struct hlist_bl_node *n)
>  	return n->pprev == &n->next;
>  }
>  
> -static inline void hlist_bl_lock(struct hlist_bl_head *b)
> -{
> -	bit_spin_lock(0, (unsigned long *)b);
> -}
> -
> -static inline void hlist_bl_unlock(struct hlist_bl_head *b)
> -{
> -	__bit_spin_unlock(0, (unsigned long *)b);
> -}
> -
> -static inline bool hlist_bl_is_locked(struct hlist_bl_head *b)
> -{
> -	return bit_spin_is_locked(0, (unsigned long *)b);
> -}
> -
>  /**
>   * hlist_bl_for_each_entry - iterate over list of given type
>   * @tpos:	the type * to use as a loop cursor.
> diff --git a/include/linux/rculist_bl.h b/include/linux/rculist_bl.h
> index 0b952d06eb0b..2d5eb5153121 100644
> --- a/include/linux/rculist_bl.h
> +++ b/include/linux/rculist_bl.h
> @@ -8,6 +8,18 @@
>  #include <linux/list_bl.h>
>  #include <linux/rcupdate.h>
>  
> +#ifdef LIST_BL_USE_SPINLOCKS
> +static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h,
> +					  struct hlist_bl_node *n)
> +{
> +	rcu_assign_pointer(h->first, n);
> +}
> +
> +static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
> +{
> +	return rcu_dereference_check(h->first, hlist_bl_is_locked(h));
> +}
> +#else /* !LIST_BL_USE_SPINLOCKS */
>  static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h,
>  					  struct hlist_bl_node *n)
>  {
> @@ -23,6 +35,7 @@ static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h)
>  	return (struct hlist_bl_node *)
>  		((unsigned long)rcu_dereference_check(h->first, hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK);
>  }
> +#endif /* LIST_BL_USE_SPINLOCKS */
>  
>  /**
>   * hlist_bl_del_rcu - deletes entry from hash list without re-initialization
> -- 
> 2.42.0
> 