Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp80203pxb; Wed, 24 Feb 2021 19:02:38 -0800 (PST) X-Google-Smtp-Source: ABdhPJze/brhyAiGDet1KCA48Il6WbmGxCtg9kV3hR9AV9FkGR779CBTqfpdgZ+YC5ITK2eQB8I2 X-Received: by 2002:aa7:d296:: with SMTP id w22mr854910edq.264.1614222158631; Wed, 24 Feb 2021 19:02:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614222158; cv=none; d=google.com; s=arc-20160816; b=BObHx3Aw8UAgnrlPVF4jxi4SXKRgcfUcArt+519z47+ZAdKnWredQwQxCc5JLzjRGl n/Mglc9fNJV/bIjDQCn+u5t6v6HoZmJofeWgslJSj3CaKYsh0tYEnP3PV9Ade6Gii5Wl aFAIrRcayh3C9Z4W0bheNb5s2IBLPOs9C0x8NUE/No5J0C6va799xMdN6Jesy6+xViSM MtUQvWSNiPwLxByR6NBtkw+BQqSrAnOwSpUALPToc0QUCqpHUQCZKZ1bBvQaptIx7oOj CaI7gIbrnVRwaAYAChdYLuYnrSW22pcR/bjYoQgHk3UZ3wD/NIpEKbnYy6m+EIGoIr3r AqGQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:mail-followup-to:reply-to:message-id :subject:cc:to:from:date; bh=g9+AhCNHGB8fly9X9tLbaL+EEtRykSi2ZO/VAvh3QP8=; b=Y7ky7oTJW9NhGX+wiTBM4s3vHhitJo928CBQ7P3plLv89KsYJMrk2mcvCseIdHVqZb W2rEKg96nNlS+rHYMkQhV7IzhDtCI7jxf77f1FR4QxlLXhfQopZgn1OsHQzjP5hyWwuC xJtF7ub/6JXa2JyI1pUqUh0/LkaabEa/dAou7KBe6gJx/5t4Axto4/2LFA4/mp7OqGIS 1LEXOkiDHh1LGr3RVyFroynWu5oZDs4o5/B6k10o1uJjMVSJJB/XY015Qy6uv7XrMRxF ssOMsyZ5h5+/bnIOdHoy7ktLh+FJqjRc5vUjU3v6KkN0dOlQr/pCGYySOGfHqI7RW9Xd 2hfw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p18si2552622ejy.191.2021.02.24.19.01.25; Wed, 24 Feb 2021 19:02:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234465AbhBXSLJ (ORCPT + 99 others); Wed, 24 Feb 2021 13:11:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:40464 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233640AbhBXSLD (ORCPT ); Wed, 24 Feb 2021 13:11:03 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D10B2AAAE; Wed, 24 Feb 2021 18:10:20 +0000 (UTC) Received: by ds.suse.cz (Postfix, from userid 10065) id DBF5ADA7B0; Wed, 24 Feb 2021 19:08:20 +0100 (CET) Date: Wed, 24 Feb 2021 19:08:20 +0100 From: David Sterba To: Sasha Levin Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Josef Bacik , Nikolay Borisov , David Sterba , linux-btrfs@vger.kernel.org Subject: Re: [PATCH AUTOSEL 5.11 55/67] btrfs: only let one thread pre-flush delayed refs in commit Message-ID: <20210224180820.GY1993@twin.jikos.cz> Reply-To: dsterba@suse.cz Mail-Followup-To: dsterba@suse.cz, Sasha Levin , linux-kernel@vger.kernel.org, stable@vger.kernel.org, Josef Bacik , Nikolay Borisov , David Sterba , linux-btrfs@vger.kernel.org References: <20210224125026.481804-1-sashal@kernel.org> <20210224125026.481804-55-sashal@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210224125026.481804-55-sashal@kernel.org> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 24, 2021 at 07:50:13AM -0500, Sasha Levin wrote: > From: Josef Bacik > > [ Upstream commit e19eb11f4f3d3b0463cd897016064a79cb6d8c6d ] > > I've been running a stress test that runs 20 workers in their own > subvolume, which are running an fsstress instance with 4 threads per > worker, which is 80 total fsstress threads. In addition to this I'm > running balance in the background as well as creating and deleting > snapshots. This test takes around 12 hours to run normally, going > slower and slower as the test goes on. > > The reason for this is because fsstress is running fsync sometimes, and > because we're messing with block groups we often fall through to > btrfs_commit_transaction, so will often have 20-30 threads all calling > btrfs_commit_transaction at the same time. > > These all get stuck contending on the extent tree while they try to run > delayed refs during the initial part of the commit. > > This is suboptimal, really because the extent tree is a single point of > failure we only want one thread acting on that tree at once to reduce > lock contention. > > Fix this by making the flushing mechanism a bit operation, to make it > easy to use test_and_set_bit() in order to make sure only one task does > this initial flush. > > Once we're into the transaction commit we only have one thread doing > delayed ref running, it's just this initial pre-flush that is > problematic. With this patch my stress test takes around 90 minutes to > run, instead of 12 hours. > > The memory barrier is not necessary for the flushing bit as it's > ordered, unlike plain int. The transaction state accessed in > btrfs_should_end_transaction could be affected by that too as it's not > always used under transaction lock. Upon Nikolay's analysis in [1] > it's not necessary: > > In should_end_transaction it's read without holding any locks. (U) > > It's modified in btrfs_cleanup_transaction without holding the > fs_info->trans_lock (U), but the STATE_ERROR flag is going to be set. > > set in cleanup_transaction under fs_info->trans_lock (L) > set in btrfs_commit_trans to COMMIT_START under fs_info->trans_lock.(L) > set in btrfs_commit_trans to COMMIT_DOING under fs_info->trans_lock.(L) > set in btrfs_commit_trans to COMMIT_UNBLOCK under > fs_info->trans_lock.(L) > > set in btrfs_commit_trans to COMMIT_COMPLETED without locks but at this > point the transaction is finished and fs_info->running_trans is NULL (U > but irrelevant). > > So by the looks of it we can have a concurrent READ race with a WRITE, > due to reads not taking a lock. In this case what we want to ensure is > we either see new or old state. I consulted with Will Deacon and he said > that in such a case we'd want to annotate the accesses to ->state with > (READ|WRITE)_ONCE so as to avoid a theoretical tear, in this case I > don't think this could happen but I imagine at some point KCSAN would > flag such an access as racy (which it is). > > [1] https://lore.kernel.org/linux-btrfs/e1fd5cc1-0f28-f670-69f4-e9958b4964e6@suse.com > > Reviewed-by: Nikolay Borisov > Signed-off-by: Josef Bacik > [ add comments regarding memory barrier ] > Signed-off-by: David Sterba > Signed-off-by: Sasha Levin Please drop this patch from autosel queue, it's part of a larger series that reworks flushing and is not a standalone fix.