From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: linux-erofs@lists.ozlabs.org
Cc: LKML <linux-kernel@vger.kernel.org>, Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: [PATCH v3 5/6] erofs: use struct lockref to replace handcrafted approach
Date: Mon, 29 May 2023 20:37:27 +0800
Message-Id: <20230529123727.79943-1-hsiangkao@linux.alibaba.com>
X-Mailer: git-send-email 2.24.4
In-Reply-To: <2fa6114d-9de2-9a0d-ae89-c012914bf682@linux.alibaba.com>
References: <2fa6114d-9de2-9a0d-ae89-c012914bf682@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Let's avoid the current handcrafted lockref even though `struct lockref`
usually adds an extra 4 bytes for an explicit spinlock when
CONFIG_DEBUG_SPINLOCK is off.

Apart from the size difference, note that the meaning of the refcount also
changes: it now counts active users only.  In other words, it no longer
takes an extra reference for XArray tree insertion.

I don't observe any significant performance difference, at least on our
cloud compute server, but the new approach does simplify the overall
codebase a bit.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v2:
 - use lockref_put_or_lock() as Yue suggested;
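
For reference only (not part of the patch): a minimal sketch of the
`struct lockref` get/put pattern this conversion switches to, i.e. try the
lockless fast path first and fall back to the embedded spinlock.  The
`my_obj` type and helpers below are hypothetical names used purely for
illustration.

#include <linux/lockref.h>
#include <linux/spinlock.h>

/* hypothetical object embedding a lockref, for illustration only */
struct my_obj {
	struct lockref lockref;
};

static bool my_obj_get(struct my_obj *obj)
{
	/* fast path: cmpxchg-based increment while count > 0 and not dead */
	if (lockref_get_not_zero(&obj->lockref))
		return true;

	/* slow path: take the embedded spinlock and re-check */
	spin_lock(&obj->lockref.lock);
	if (__lockref_is_dead(&obj->lockref)) {
		spin_unlock(&obj->lockref.lock);
		return false;
	}
	obj->lockref.count++;
	spin_unlock(&obj->lockref.lock);
	return true;
}

static void my_obj_put(struct my_obj *obj)
{
	/* fast path: decrement succeeds as long as the count stays above 1 */
	if (lockref_put_or_lock(&obj->lockref))
		return;

	/* slow path: the spinlock is held here and the count is still <= 1,
	 * so drop the last active reference under the lock */
	--obj->lockref.count;
	spin_unlock(&obj->lockref.lock);
	/* a real user would free or otherwise retire the object here */
}

The patch below applies the same shape to `struct erofs_workgroup`,
additionally adjusting the global shrink counter when the count moves
between zero and non-zero, and calling lockref_mark_dead() once a
workgroup has been erased from the XArray.
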
 fs/erofs/internal.h | 38 ++------------------
 fs/erofs/utils.c    | 85 ++++++++++++++++++++++-----------------------
 fs/erofs/zdata.c    | 15 ++++----
 3 files changed, 51 insertions(+), 87 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 005f3391bcd3..36e32fa542f0 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -208,46 +208,12 @@ enum {
 	EROFS_ZIP_CACHE_READAROUND
 };
 
-#define EROFS_LOCKED_MAGIC	(INT_MIN | 0xE0F510CCL)
-
 /* basic unit of the workstation of a super_block */
 struct erofs_workgroup {
-	/* the workgroup index in the workstation */
 	pgoff_t index;
-
-	/* overall workgroup reference count */
-	atomic_t refcount;
+	struct lockref lockref;
 };
 
-static inline bool erofs_workgroup_try_to_freeze(struct erofs_workgroup *grp,
-						 int val)
-{
-	preempt_disable();
-	if (val != atomic_cmpxchg(&grp->refcount, val, EROFS_LOCKED_MAGIC)) {
-		preempt_enable();
-		return false;
-	}
-	return true;
-}
-
-static inline void erofs_workgroup_unfreeze(struct erofs_workgroup *grp,
-					    int orig_val)
-{
-	/*
-	 * other observers should notice all modifications
-	 * in the freezing period.
-	 */
-	smp_mb();
-	atomic_set(&grp->refcount, orig_val);
-	preempt_enable();
-}
-
-static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp)
-{
-	return atomic_cond_read_relaxed(&grp->refcount,
-					VAL != EROFS_LOCKED_MAGIC);
-}
-
 enum erofs_kmap_type {
 	EROFS_NO_KMAP,		/* don't map the buffer */
 	EROFS_KMAP,		/* use kmap_local_page() to map the buffer */
@@ -486,7 +452,7 @@ static inline void erofs_pagepool_add(struct page **pagepool, struct page *page)
 void erofs_release_pages(struct page **pagepool);
 
 #ifdef CONFIG_EROFS_FS_ZIP
-int erofs_workgroup_put(struct erofs_workgroup *grp);
+void erofs_workgroup_put(struct erofs_workgroup *grp);
 struct erofs_workgroup *erofs_find_workgroup(struct super_block *sb,
 					     pgoff_t index);
 struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index 46627cb69abe..4590954cf147 100644
--- a/fs/erofs/utils.c
+++ b/fs/erofs/utils.c
@@ -33,22 +33,21 @@ void erofs_release_pages(struct page **pagepool)
 /* global shrink count (for all mounted EROFS instances) */
 static atomic_long_t erofs_global_shrink_cnt;
 
-static int erofs_workgroup_get(struct erofs_workgroup *grp)
+static bool erofs_workgroup_get(struct erofs_workgroup *grp)
 {
-	int o;
+	if (lockref_get_not_zero(&grp->lockref))
+		return true;
 
-repeat:
-	o = erofs_wait_on_workgroup_freezed(grp);
-	if (o <= 0)
-		return -1;
-
-	if (atomic_cmpxchg(&grp->refcount, o, o + 1) != o)
-		goto repeat;
+	spin_lock(&grp->lockref.lock);
+	if (__lockref_is_dead(&grp->lockref)) {
+		spin_unlock(&grp->lockref.lock);
+		return false;
+	}
 
-	/* decrease refcount paired by erofs_workgroup_put */
-	if (o == 1)
+	if (!grp->lockref.count++)
 		atomic_long_dec(&erofs_global_shrink_cnt);
-	return 0;
+	spin_unlock(&grp->lockref.lock);
+	return true;
 }
 
 struct erofs_workgroup *erofs_find_workgroup(struct super_block *sb,
@@ -61,7 +60,7 @@ struct erofs_workgroup *erofs_find_workgroup(struct super_block *sb,
 	rcu_read_lock();
 	grp = xa_load(&sbi->managed_pslots, index);
 	if (grp) {
-		if (erofs_workgroup_get(grp)) {
+		if (!erofs_workgroup_get(grp)) {
 			/* prefer to relax rcu read side */
 			rcu_read_unlock();
 			goto repeat;
@@ -80,11 +79,10 @@ struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
 	struct erofs_workgroup *pre;
 
 	/*
-	 * Bump up a reference count before making this visible
-	 * to others for the XArray in order to avoid potential
-	 * UAF without serialized by xa_lock.
+	 * Bump up before making this visible to others for the XArray in order
+	 * to avoid potential UAF without serialized by xa_lock.
 	 */
-	atomic_inc(&grp->refcount);
+	lockref_get(&grp->lockref);
 
 repeat:
 	xa_lock(&sbi->managed_pslots);
@@ -93,13 +91,13 @@ struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
 	if (pre) {
 		if (xa_is_err(pre)) {
 			pre = ERR_PTR(xa_err(pre));
-		} else if (erofs_workgroup_get(pre)) {
+		} else if (!erofs_workgroup_get(pre)) {
 			/* try to legitimize the current in-tree one */
 			xa_unlock(&sbi->managed_pslots);
 			cond_resched();
 			goto repeat;
 		}
-		atomic_dec(&grp->refcount);
+		lockref_put_return(&grp->lockref);
 		grp = pre;
 	}
 	xa_unlock(&sbi->managed_pslots);
@@ -112,38 +110,34 @@ static void __erofs_workgroup_free(struct erofs_workgroup *grp)
 	erofs_workgroup_free_rcu(grp);
 }
 
-int erofs_workgroup_put(struct erofs_workgroup *grp)
+void erofs_workgroup_put(struct erofs_workgroup *grp)
 {
-	int count = atomic_dec_return(&grp->refcount);
+	if (lockref_put_or_lock(&grp->lockref))
+		return;
 
-	if (count == 1)
+	DBG_BUGON(__lockref_is_dead(&grp->lockref));
+	if (grp->lockref.count == 1)
 		atomic_long_inc(&erofs_global_shrink_cnt);
-	else if (!count)
-		__erofs_workgroup_free(grp);
-	return count;
+	--grp->lockref.count;
+	spin_unlock(&grp->lockref.lock);
 }
 
 static bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi,
 					   struct erofs_workgroup *grp)
 {
-	/*
-	 * If managed cache is on, refcount of workgroups
-	 * themselves could be < 0 (freezed). In other words,
-	 * there is no guarantee that all refcounts > 0.
-	 */
-	if (!erofs_workgroup_try_to_freeze(grp, 1))
-		return false;
+	int free = false;
+
+	spin_lock(&grp->lockref.lock);
+	if (grp->lockref.count)
+		goto out;
 
 	/*
-	 * Note that all cached pages should be unattached
-	 * before deleted from the XArray. Otherwise some
-	 * cached pages could be still attached to the orphan
-	 * old workgroup when the new one is available in the tree.
+	 * Note that all cached pages should be detached before deleted from
+	 * the XArray. Otherwise some cached pages could be still attached to
+	 * the orphan old workgroup when the new one is available in the tree.
 	 */
-	if (erofs_try_to_free_all_cached_pages(sbi, grp)) {
-		erofs_workgroup_unfreeze(grp, 1);
-		return false;
-	}
+	if (erofs_try_to_free_all_cached_pages(sbi, grp))
+		goto out;
 
 	/*
 	 * It's impossible to fail after the workgroup is freezed,
@@ -152,10 +146,13 @@ static bool erofs_try_to_release_workgroup(struct erofs_sb_info *sbi,
 	 */
 	DBG_BUGON(__xa_erase(&sbi->managed_pslots, grp->index) != grp);
 
-	/* last refcount should be connected with its managed pslot. */
-	erofs_workgroup_unfreeze(grp, 0);
-	__erofs_workgroup_free(grp);
-	return true;
+	lockref_mark_dead(&grp->lockref);
+	free = true;
+out:
+	spin_unlock(&grp->lockref.lock);
+	if (free)
+		__erofs_workgroup_free(grp);
+	return free;
 }
 
 static unsigned long erofs_shrink_workstation(struct erofs_sb_info *sbi,
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index c556906354e5..637a964ff110 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -641,7 +641,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
 
 	/*
-	 * refcount of workgroup is now freezed as 1,
+	 * refcount of workgroup is now freezed as 0,
 	 * therefore no need to worry about available decompression users.
 	 */
 	for (i = 0; i < pcl->pclusterpages; ++i) {
@@ -674,10 +674,11 @@ static bool z_erofs_cache_release_folio(struct folio *folio, gfp_t gfp)
 	if (!folio_test_private(folio))
 		return true;
 
-	if (!erofs_workgroup_try_to_freeze(&pcl->obj, 1))
-		return false;
-
 	ret = false;
+	spin_lock(&pcl->obj.lockref.lock);
+	if (pcl->obj.lockref.count > 0)
+		goto out;
+
 	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
 	for (i = 0; i < pcl->pclusterpages; ++i) {
 		if (pcl->compressed_bvecs[i].page == &folio->page) {
@@ -686,10 +687,10 @@ static bool z_erofs_cache_release_folio(struct folio *folio, gfp_t gfp)
 			break;
 		}
 	}
-	erofs_workgroup_unfreeze(&pcl->obj, 1);
-
 	if (ret)
 		folio_detach_private(folio);
+out:
+	spin_unlock(&pcl->obj.lockref.lock);
 	return ret;
 }
@@ -805,7 +806,7 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 	if (IS_ERR(pcl))
 		return PTR_ERR(pcl);
 
-	atomic_set(&pcl->obj.refcount, 1);
+	spin_lock_init(&pcl->obj.lockref.lock);
 	pcl->algorithmformat = map->m_algorithmformat;
 	pcl->length = 0;
 	pcl->partial = true;
-- 
2.24.4