Subject: Re: [PATCH] btrfs: optimize barrier usage for Rmw atomics
To: Davidlohr Bueso, dsterba@suse.com
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, Davidlohr Bueso
References: <20200129180324.24099-1-dave@stgolabs.net>
From: Nikolay Borisov
Message-ID: <25e3abe7-5e86-2180-424a-ceef7402c257@suse.com>
Date: Wed, 29 Jan 2020 21:07:29 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1
MIME-Version: 1.0
In-Reply-To: <20200129180324.24099-1-dave@stgolabs.net>
Content-Type: multipart/mixed; boundary="------------961A561BD5F9D1A717978A8F"
Content-Language: en-US
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------961A561BD5F9D1A717978A8F
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 29.01.20 at 20:03, Davidlohr Bueso wrote:
> Use smp_mb__after_atomic() instead of smp_mb() and avoid the
> unnecessary barrier for non LL/SC architectures, such as x86.
>
> Signed-off-by: Davidlohr Bueso

While on this topic: I've been sitting on the following local patch for
about a year. Care to review the barriers?

--------------961A561BD5F9D1A717978A8F
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-btrfs-Fix-memory-ordering-of-unlocked-dio-reads-vs-t.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-btrfs-Fix-memory-ordering-of-unlocked-dio-reads-vs-t.pa";
 filename*1="tch"

From e659e5db649be01aec20515aef8ca48143e10c0b Mon Sep 17 00:00:00 2001
From: Nikolay Borisov
Date: Wed, 7 Mar 2018 17:19:12 +0200
Subject: [PATCH] btrfs: Fix memory ordering of unlocked dio reads vs truncate

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/btrfs_inode.h | 17 -----------------
 fs/btrfs/inode.c       | 41 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 4e12a477d32e..e84f58cca02e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -317,23 +317,6 @@ struct btrfs_dio_private {
 			blk_status_t);
 };
 
-/*
- * Disable DIO read nolock optimization, so new dio readers will be forced
- * to grab i_mutex. It is used to avoid the endless truncate due to
- * nonlocked dio read.
- */
-static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode)
-{
-	set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-	smp_mb();
-}
-
-static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode)
-{
-	smp_mb__before_atomic();
-	clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags);
-}
-
 /* Array of bytes with variable length, hexadecimal format 0x1234 */
 #define CSUM_FMT "0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes) size, bytes
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6d2bb58d277a..d64600268c3a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4626,10 +4626,29 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		truncate_setsize(inode, newsize);
 
-		/* Disable nonlocked read DIO to avoid the endless truncate */
-		btrfs_inode_block_unlocked_dio(BTRFS_I(inode));
+		/*
+		 * This code is very subtle. It is essentially a lock of its
+		 * own type. BTRFS allows multiple DIO readers to race with
+		 * writers so long as they don't read beyond EOF of an inode.
+		 * However, if we have a pending truncate we'd like to signal
+		 * DIO readers they should fall back to DIO_LOCKING semantics.
+		 * This ensures that multiple aggressive DIO readers cannot
+		 * starve the truncating thread.
+		 *
+		 * This semantics is achieved by the use of the below flag. If
+		 * new readers come after the flag has been cleared then the
+		 * state is still consistent, since the RELEASE semantics of
+		 * clear_bit_unlock ensure the truncate inode size will be
+		 * visible and DIO readers will bail out.
+		 *
+		 * The implied memory barrier by inode_dio_wait is paired with
+		 * smp_mb__before_atomic in btrfs_direct_IO.
+		 */
+		set_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+			&BTRFS_I(inode)->runtime_flags);
 		inode_dio_wait(inode);
-		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
+		clear_bit_unlock(BTRFS_INODE_READDIO_NEED_LOCK,
+				 &BTRFS_I(inode)->runtime_flags);
 
 		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
@@ -8070,11 +8089,19 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		dio_data.unsubmitted_oe_range_end = (u64)offset;
 		current->journal_info = &dio_data;
 		down_read(&BTRFS_I(inode)->dio_sem);
-	} else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
+	} else {
+		/*
+		 * This barrier is paired with the implied barrier in
+		 * inode_dio_wait. It ensures that READDIO_NEED_LOCK is
+		 * visible if we have a pending truncate.
+		 */
+		smp_mb__before_atomic();
+		if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK,
 				     &BTRFS_I(inode)->runtime_flags)) {
-		inode_dio_end(inode);
-		flags = DIO_LOCKING | DIO_SKIP_HOLES;
-		wakeup = false;
+			inode_dio_end(inode);
+			flags = DIO_LOCKING | DIO_SKIP_HOLES;
+			wakeup = false;
+		}
 	}
 
 	ret = __blockdev_direct_IO(iocb, inode,
-- 
2.17.1

--------------961A561BD5F9D1A717978A8F--
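
For anyone following the pairing described in the patch comments, here is a
rough userspace sketch of the same "flag vs. in-flight counter" handshake,
written with C11 atomics rather than kernel primitives. Everything in it is
an assumption made for illustration: struct fake_inode, NEED_LOCK and all
function names are hypothetical and do not exist in btrfs, and the default
seq_cst atomics merely stand in for set_bit()/test_bit() plus full barriers.
It is not the kernel implementation and says nothing about which kernel
barrier is the right one in btrfs_direct_IO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state involved in the handshake. */
struct fake_inode {
	atomic_uint flags;    /* bit 0 models BTRFS_INODE_READDIO_NEED_LOCK */
	atomic_int dio_count; /* models the count of in-flight DIO requests */
};

#define NEED_LOCK 1u

static inline void fake_inode_init(struct fake_inode *in)
{
	atomic_init(&in->flags, 0u);
	atomic_init(&in->dio_count, 0);
}

/*
 * Truncate side, step 1: publish the flag. The seq_cst read-modify-write
 * orders this store before the later load of dio_count.
 */
static inline void truncate_block_readers(struct fake_inode *in)
{
	atomic_fetch_or(&in->flags, NEED_LOCK);
}

/* Truncate side, step 2: the condition a dio-wait loop would poll. */
static inline bool truncate_readers_drained(struct fake_inode *in)
{
	return atomic_load(&in->dio_count) == 0;
}

/* Truncate side, step 3: clear the flag with RELEASE ordering. */
static inline void truncate_resume_readers(struct fake_inode *in)
{
	atomic_fetch_and_explicit(&in->flags, ~NEED_LOCK,
				  memory_order_release);
}

/*
 * Reader side: announce the request first, then check the flag.
 * Returns true if the reader may continue without the lock; on false
 * the reader has already dropped its count and must take the locked path.
 */
static inline bool reader_try_unlocked(struct fake_inode *in)
{
	atomic_fetch_add(&in->dio_count, 1);
	if (atomic_load(&in->flags) & NEED_LOCK) {
		atomic_fetch_sub(&in->dio_count, 1);
		return false;
	}
	return true;
}

/* Reader side: the unlocked request has completed. */
static inline void reader_done(struct fake_inode *in)
{
	atomic_fetch_sub(&in->dio_count, 1);
}
```

Because both sides do "write mine, then read theirs" with sequentially
consistent operations, at least one side must observe the other: either the
reader sees NEED_LOCK and backs off, or the truncating thread sees a non-zero
count and keeps waiting. Weakening either side to relaxed ordering would
allow both to miss each other, which is the case the barriers in the patch
are meant to rule out.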