Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp5408667rdb; Wed, 13 Dec 2023 07:55:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IFCpEOF/nTMgQmIBCC45gIh37yWz5rKhl20CdCSbdQOncoqt7I2W6iK0hn04Yp5oPS91S3Y X-Received: by 2002:a05:6a20:4295:b0:18f:97c:8260 with SMTP id o21-20020a056a20429500b0018f097c8260mr5076793pzj.106.1702482903472; Wed, 13 Dec 2023 07:55:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702482903; cv=none; d=google.com; s=arc-20160816; b=yj031BKXcuci0M//Pzg0h7HGrPF27RQWd+NIOE+12U4DB1lkkVfLYqPXT0SyMAc9YN lIkA632VpXS//DptgJ0YYV+oIqfeoYmWl1dKuhT5Yco0cAmSWLvdatbsg8o+fTVEY+I5 mVQCpj4uooQSvJIZbbxRIn1WFBBRGyxX8EXMlBzYnqQEIBqBdaTqV3TZkSkcmYm6Ye39 H8kWNt4zAgAApkO9zql5SDJZ59CXEDCKmz5G7spbzD3jkrv4uPqg0eueD0HHwHr61x0H hb6jTFrceLBa0zROx/ciN3EtPjpkNu3HAQwImNwg35tFXGSG1+iB1mVh9Q2rkUca+AwC YhFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=M7xTC+G+f1Iqi6sFQ87hq+iglTPOBVKIoeWAacl0tD0=; fh=epXdrTmLZuetiH0hOUWWaZS9cjY+X9Jqbmsy7IPdEp8=; b=qbcZcuCr3uOvbU6/vHdSk+Isdw++dGjjZIXUqHEMJo+agettM7JPewpFkR6UoC7iL9 8JoWV34tGF5iRIi6CsnX6yj0B6R+T0cpGKnHVFWVxZHlGi7lPuNOqIxdD6t0CiJJT7YR L12m2Ocmhz0I+Qsf2SWjn/1AQIbtfQkMz3MhjSCsAoCLo6WvMQQyTQ3uxJhRq0OONi7I e4v4H9HmymZBNgbJ9v9Jb+QSQUlM8Msd/7rgV4kgFDqQlXIWQ7IDww3Mv0ScUBq61N+A GPRtGMMmt/XshboSAleCosL+lonxB2ljjF70kTx1flhngP4eOqjzd7EKTGnYL9VgrqTh SbDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@toxicpanda-com.20230601.gappssmtp.com header.s=20230601 header.b=xL7uCKxE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id t19-20020a056a0021d300b006c8d6276334si8924800pfj.140.2023.12.13.07.55.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Dec 2023 07:55:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@toxicpanda-com.20230601.gappssmtp.com header.s=20230601 header.b=xL7uCKxE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 4F4868040193; Wed, 13 Dec 2023 07:54:59 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442359AbjLMPyj (ORCPT + 99 others); Wed, 13 Dec 2023 10:54:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56362 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442173AbjLMPyh (ORCPT ); Wed, 13 Dec 2023 10:54:37 -0500 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D707BB7 for ; Wed, 13 Dec 2023 07:54:42 -0800 (PST) Received: by mail-qt1-x82c.google.com with SMTP id d75a77b69052e-4259da1fc56so42212621cf.0 for ; Wed, 13 Dec 2023 07:54:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20230601.gappssmtp.com; s=20230601; t=1702482882; x=1703087682; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=M7xTC+G+f1Iqi6sFQ87hq+iglTPOBVKIoeWAacl0tD0=; b=xL7uCKxEbLQloJAJeSxCrNyRvpCzuhWofuYAJMLjBZW89eZhhr7S6dMs24E7jZBAEG StUZRcVDF0bxdYXEYa6yXc0Lo1543fdOvJBT5PdIHHOYmkBAYxF14QnP7yMD/DjgXSA+ 42IdS5ZX1kw7clQlcsvH1rRbgGoBggtwvh6Vg7gVoP5OCPpye0pm19Ci6VAMhgJNhpKg iqOuYuta4JGnxpBbwECYmgxXBahH/NnCknko05JdPKULxmtkILI2lAwz4mVWvtF9juk2 f1/jOP4ZCZPAddZQKVJAgYWmYzETBqwGrbKC1DKgYVFtBlKwwNmC5rgGGyPcGPz2MsCJ WFNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702482882; x=1703087682; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=M7xTC+G+f1Iqi6sFQ87hq+iglTPOBVKIoeWAacl0tD0=; b=HmjKvAJfwnta4BC+17eEiXWt51iIXyF3a578qlBTXJyWGFpgSmkrYKfGzkiIBUdQGy WlfS85IXaBtpfDScsJWrgakd1RkqnnJLH8QEdPGykKsclMsWSwCBw61pc4PR+QKbco3g n37q1kd3s+NzJgor8kFxyqDtZZO3tfhc/0IXZIqQ02jAVJLTLs6WvH4VS/R2Zn8z3fkX UaSacDnGfD/vLF+A1iBPq1VY9Ymq2wz8orMziRZ/FoAArZZA5R5Iwvx7fl4jN3I6WciF If+lTwd7mIBgnr1G+rezEyVtyh6sUbDVFI/roA5F9FKT+F0ZwH+lhC2CeUWPByH0OtC0 0zvg== X-Gm-Message-State: AOJu0Yy/okuVYFzCfroSc/OOkz9UzigfUo7l2rWynOWc64SWyoV6u79H YlOhjoCYQTewt9PgtCU0bZPc2g== X-Received: by 2002:a05:622a:40f:b0:425:4043:50e8 with SMTP id n15-20020a05622a040f00b00425404350e8mr10908966qtx.119.1702482881861; Wed, 13 Dec 2023 07:54:41 -0800 (PST) Received: from localhost (076-182-020-124.res.spectrum.com. [76.182.20.124]) by smtp.gmail.com with ESMTPSA id o5-20020ac841c5000000b004254355d4dcsm4982048qtm.77.2023.12.13.07.54.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Dec 2023 07:54:41 -0800 (PST) Date: Wed, 13 Dec 2023 10:54:40 -0500 From: Josef Bacik To: "Paul E. McKenney" Cc: David Sterba , kernel test robot , llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev, linux-kernel@vger.kernel.org, julia.lawall@inria.fr, clm@fb.com, dsterba@suse.com, baptiste.lepers@gmail.com Subject: Re: [paulmck-rcu:frederic.2023.12.08a 29/37] fs/btrfs/transaction.c:496:6: error: call to '__compiletime_assert_329' declared with 'error' attribute: Need native word sized stores/loads for atomicity. Message-ID: <20231213155440.GA454379@perftesting> References: <202312091837.cKaPw0Tf-lkp@intel.com> <0487d7cc-b906-4a4a-b284-9c79700b4ede@paulmck-laptop> <20231213125358.GB3001@suse.cz> <4c814394-6eab-4aca-96af-43f99fb94c01@paulmck-laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4c814394-6eab-4aca-96af-43f99fb94c01@paulmck-laptop> X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Wed, 13 Dec 2023 07:54:59 -0800 (PST) On Wed, Dec 13, 2023 at 06:56:37AM -0800, Paul E. McKenney wrote: > On Wed, Dec 13, 2023 at 01:53:58PM +0100, David Sterba wrote: > > On Sat, Dec 09, 2023 at 07:51:30AM -0800, Paul E. McKenney wrote: > > > On Sat, Dec 09, 2023 at 06:20:37PM +0800, kernel test robot wrote: > > > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git frederic.2023.12.08a > > > > head: 37843b5f561a08ae899fb791eeeb5abd992eabe2 > > > > commit: 7dd87072d40809e26503f04b79d63290288dbbac [29/37] btrfs: Adjust ->last_trans ordering in btrfs_record_root_in_trans() > > > > config: riscv-rv32_defconfig (https://download.01.org/0day-ci/archive/20231209/202312091837.cKaPw0Tf-lkp@intel.com/config) > > > > compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a) > > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231209/202312091837.cKaPw0Tf-lkp@intel.com/reproduce) > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > > the same patch/commit), kindly add following tags > > > > | Reported-by: kernel test robot > > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202312091837.cKaPw0Tf-lkp@intel.com/ > > > > > > > > All errors (new ones prefixed by >>): > > > > > > > > warning: unknown warning option '-Wpacked-not-aligned'; did you mean '-Wpacked-non-pod'? [-Wunknown-warning-option] > > > > warning: unknown warning option '-Wstringop-truncation'; did you mean '-Wstring-concatenation'? [-Wunknown-warning-option] > > > > warning: unknown warning option '-Wmaybe-uninitialized'; did you mean '-Wuninitialized'? [-Wunknown-warning-option] > > > > >> fs/btrfs/transaction.c:496:6: error: call to '__compiletime_assert_329' declared with 'error' attribute: Need native word sized stores/loads for atomicity. > > > > 496 | if (smp_load_acquire(&root->last_trans) == trans->transid && /* ^^^ */ > > > > | ^ > > > > > > Ooooh!!! :-/ > > > > > > From what I can see, the current code can tear this load on 32-bit > > > systems, which can result in bad comparisons and then in failure to wait > > > for a partially complete transaction. > > > > > > So is btrfs actually supported on 32-bit systems? If not, would the > > > following patch be appropriate? > > > > There are limitations on 32bit systems, eg. due to shorter inode numbers > > (ino_t is unsigned long) and that radix-tree/xarray does support only > > unsigned long keys, while we have 64bit identifiers for inodes or tree > > roots. > > > > So far we support that and dropping it completely is I think a big deal, > > like with any possibly used feature. What I've seen there are NAS boxes > > with low power ARM that are 32bit. > > > > > If btrfs is to be supported on 32-bit systems, from what I can see some > > > major surgery is required, even if a 32-bit counter is wrap-safe for > > > this particular type of transaction. (But SSDs? In-memory btrfs > > > filesystems?) > > > > We won't probably do any major surgery to support 32bit systems. > > Got it, and thank you for the background! My takeaway is that 32-bit > BTRFS must work in the common case, but might have issues on some > workloads, for example, running out of inode numbers or load tearing. > > > > ------------------------------------------------------------------------ > > > > > > diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig > > > index 4fb925e8c981..4d56158c34f9 100644 > > > --- a/fs/btrfs/Kconfig > > > +++ b/fs/btrfs/Kconfig > > > @@ -19,6 +19,7 @@ config BTRFS_FS > > > select RAID6_PQ > > > select XOR_BLOCKS > > > depends on PAGE_SIZE_LESS_THAN_256KB > > > + depends on 64BIT > > > > Can we keep the current inefficient smp_* barriers instead of dropping > > the whole 32bit support as an alternative. If the smp_load_acquire are > > better but not strictly necessary for the correctness (from the barriers > > POV) I'd suggest to leave it as-is. We can put comments in case somebody > > wants to optimize it in the future again. > > We still have the barrier placement issue, given that smp_rmb() enforces > only the ordering of earlier and later loads, correct? Or am I missing > some other ordering constraint that makes all that work? > > But I can make each of the current patch's smp_load_acquire() call instead > be be a READ_ONCE() followed by an smp_rmb(), the test_bit_acquire() > call be test_bit() followed by smp_rmb(), and the smp_store_release() > call be an smp_wmb() followed by a WRITE_ONCE(). This provides the needed > ordering, bullet-proofs the 64-bit code against compilers, but works on > 32-bit systems. For example, on a 32-bit system the 64-bit READ_ONCE() > and WRITE_ONCE() might still be compiled as a pair of 32-bit memory > accesses, but they will both be guaranteed to be single memory accesses > on 64-bit systems. > > Would that work for you guys? Actually I think we just need to re-work all of this to make it less silly. Does this look reasonable to you Paul? I still have to test it, but I think it addresses your concerns and lets us keep 32bit support ;). Thanks, Josef From: Josef Bacik Date: Wed, 13 Dec 2023 10:46:57 -0500 Subject: [PATCH] btrfs: cleanup how we record roots in the transaction Previously we didn't have a bit field for tracking if a root was recorded in the transaction, it was simply an integer on the root. Additionally we have the ->last_trans which we record the last transaction this root was modified in. We had various memory barriers to attempt to synchronize these two updates so we could avoid taking the ->reloc_mutex all of the time for transaction starts. However this code has changed a lot, but the dubious memory ordering has remained. Fix up this code to make it more clear and get rid of the memory barriers. All we're trying to avoid here is constantly taking the ->reloc_mutex, so add a new bit to mark that the root has been recorded. If this bit is set then we can skip the expensive step, otherwise we have to take the lock and wait. Add a big comment indicating why we're doing the ordering and locking the way we are, and use ->fs_roots_radix_lock as the arbiter of this new bit. This makes the code simpler, removes the memory barriers, and documents clearly what we're doing so it's less confusing for future readers. Signed-off-by: Josef Bacik --- fs/btrfs/ctree.h | 11 ++++--- fs/btrfs/disk-io.c | 1 + fs/btrfs/transaction.c | 69 +++++++++++++++++------------------------- 3 files changed, 34 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 70e828d33177..eb44cf191863 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -90,12 +90,13 @@ struct btrfs_path { */ enum { /* - * btrfs_record_root_in_trans is a multi-step process, and it can race - * with the balancing code. But the race is very small, and only the - * first time the root is added to each transaction. So IN_TRANS_SETUP - * is used to tell us when more checks are required + * We set this after we've recorded the root in the transaction. This + * is used to avoid taking the reloc_mutex every time we call + * btrfs_start_transaction(root). Once the root has been recorded we + * don't have to do the setup work again, and this is cleared on + * transaction commit once we're done with the root. */ - BTRFS_ROOT_IN_TRANS_SETUP, + BTRFS_ROOT_RECORDED, /* * Set if tree blocks of this root can be shared by other roots. diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c6907d533fe8..0410fac5f78e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4805,6 +4805,7 @@ static void btrfs_free_all_qgroup_pertrans(struct btrfs_fs_info *fs_info) radix_tree_tag_clear(&fs_info->fs_roots_radix, (unsigned long)root->root_key.objectid, BTRFS_ROOT_TRANS_TAG); + clear_bit(BTRFS_ROOT_RECORDED, &root->state); } } spin_unlock(&fs_info->fs_roots_radix_lock); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 5b3333ceef04..186089ddac27 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -407,54 +407,43 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, int ret = 0; if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && - root->last_trans < trans->transid) || force) { + !test_bit(BTRFS_ROOT_RECORDED, &root->state)) || force) { WARN_ON(!force && root->commit_root != root->node); /* - * see below for IN_TRANS_SETUP usage rules - * we have the reloc mutex held now, so there - * is only one writer in this function + * We are using ->fs_roots_radix_lock as the ultimate protector + * of BTRFS_ROOT_RECORDED, which means we want to set it once + * we're completely done, so we call btrfs_init_reloc_root() + * first. + * + * This is ok because we are protected here by ->reloc_mutex, so + * in reality we won't ever race and call + * btrfs_init_reloc_root() twice on the same root back to back. + * However if we do it's ok, because again we're protected by + * ->reloc_mutex and the second call will simply update the + * ->last_trans on the reloc root to the current transid, making + * it essentially a no-op. + * + * We're using ->fs_roots_radix_lock here because + * btrfs_drop_snapshot will clear the tag so we don't update the + * root, and we will clear the RECORDED flag there without the + * ->reloc_mutex lock held. Again this is fine because we won't + * be messing with a dropped root anymore, but for concurrency + * it's best to keep the rules nice and simple. */ - set_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state); - - /* make sure readers find IN_TRANS_SETUP before - * they find our root->last_trans update - */ - smp_wmb(); + ret = btrfs_init_reloc_root(trans, root); spin_lock(&fs_info->fs_roots_radix_lock); - if (root->last_trans == trans->transid && !force) { + if (test_bit(BTRFS_ROOT_RECORDED, &root->state) && !force) { spin_unlock(&fs_info->fs_roots_radix_lock); return 0; } radix_tree_tag_set(&fs_info->fs_roots_radix, (unsigned long)root->root_key.objectid, BTRFS_ROOT_TRANS_TAG); - spin_unlock(&fs_info->fs_roots_radix_lock); + set_bit(BTRFS_ROOT_RECORDED, &root->state); root->last_trans = trans->transid; - - /* this is pretty tricky. We don't want to - * take the relocation lock in btrfs_record_root_in_trans - * unless we're really doing the first setup for this root in - * this transaction. - * - * Normally we'd use root->last_trans as a flag to decide - * if we want to take the expensive mutex. - * - * But, we have to set root->last_trans before we - * init the relocation root, otherwise, we trip over warnings - * in ctree.c. The solution used here is to flag ourselves - * with root IN_TRANS_SETUP. When this is 1, we're still - * fixing up the reloc trees and everyone must wait. - * - * When this is zero, they can trust root->last_trans and fly - * through btrfs_record_root_in_trans without having to take the - * lock. smp_wmb() makes sure that all the writes above are - * done before we pop in the zero below - */ - ret = btrfs_init_reloc_root(trans, root); - smp_mb__before_atomic(); - clear_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state); + spin_unlock(&fs_info->fs_roots_radix_lock); } return ret; } @@ -476,6 +465,7 @@ void btrfs_add_dropped_root(struct btrfs_trans_handle *trans, radix_tree_tag_clear(&fs_info->fs_roots_radix, (unsigned long)root->root_key.objectid, BTRFS_ROOT_TRANS_TAG); + clear_bit(BTRFS_ROOT_RECORDED, &root->state); spin_unlock(&fs_info->fs_roots_radix_lock); } @@ -488,13 +478,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) return 0; - /* - * see record_root_in_trans for comments about IN_TRANS_SETUP usage - * and barriers - */ - smp_rmb(); - if (root->last_trans == trans->transid && - !test_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state)) + if (test_bit(BTRFS_ROOT_RECORDED, &root->state)) return 0; mutex_lock(&fs_info->reloc_mutex); @@ -1531,6 +1515,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans) radix_tree_tag_clear(&fs_info->fs_roots_radix, (unsigned long)root->root_key.objectid, BTRFS_ROOT_TRANS_TAG); + clear_bit(BTRFS_ROOT_RECORDED, &root->state); spin_unlock(&fs_info->fs_roots_radix_lock); btrfs_free_log(trans, root); -- 2.43.0