Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp523594ybv; Wed, 19 Feb 2020 04:18:43 -0800 (PST) X-Google-Smtp-Source: APXvYqzqfJAapS6bXOQsDIHkr3v8tV8PJiRzgdnaBqm2HxTnI4b9b3qqVxAAkfwdWIVXcrliWBfX X-Received: by 2002:a9d:7b4e:: with SMTP id f14mr19463994oto.355.1582114723269; Wed, 19 Feb 2020 04:18:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582114723; cv=none; d=google.com; s=arc-20160816; b=jA64QhDORh37CX80ZJno9hlnaSYcf47gaMmyaBvV8ZcZhwfJYZ0xbYi7+jVBnj5Ppu Mh3Og5ev5PAR03KFl/WlFai+YSuc8W2nQQ9KnKU5slyZRWmJ/rNU3XE4yIEdiRHCuk0z uG+W6l+8+C7ISYnZOgKQDqLAGDM+1gfC31Fw3Qp95OHv177mQBGuHeqwsWJt4PdL+6CG N9oZ3+4t0GCJepO736QaGg0C7NTOF0ZVw0UUZj0vbyiuRFfyhl9MMjyiu8HcR7R1bpHP paRe9oMUkzsE7eVP2S1gLMVfbKURWoJjNM+/Y1IXpox5MsHXHJ/aAsyZA3M/KPT40Df8 aHfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:mail-followup-to :reply-to:message-id:subject:cc:to:from:date; bh=441pL/uUUwY+iI2IZfhHAB8vUCSI6i9PemdVy+AVrNg=; b=PVzsFZkqx7/aiB9UwMRIB3G3Ch1Fyhoe+9iCTC70dPkhya+eYOsUNLXR0J6GYJspR+ JdUm0KNh4xzASKZGIkMgnh3jHohn7gO9HYNwdVndJ1u1w9zlwq0PyxgdWTza1jSDTivY P0F5ZPe8MEiVgEG02F0FO7S6BWR3VF9BId9SI7d+pLMNZs21oE1nMKeUwVKfF99BNB5h hZHun3xUR/wCf4Gk6P74P2BU4lhVe4qH32WlC0KlmS32zGW8B6bvfZLhnXNgR6qxLYqp MHZtDSQmQO8FMuULdLfJKW0EaOU4JzkL4oXGJ0WCkX3S4x1GLw5T41SwRsSGc4IM3rFi OQcA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 64si1002656otx.50.2020.02.19.04.18.30; Wed, 19 Feb 2020 04:18:43 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726736AbgBSMSY (ORCPT + 99 others); Wed, 19 Feb 2020 07:18:24 -0500 Received: from mx2.suse.de ([195.135.220.15]:36674 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726491AbgBSMSX (ORCPT ); Wed, 19 Feb 2020 07:18:23 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 342B7B0E6; Wed, 19 Feb 2020 12:18:21 +0000 (UTC) Received: by ds.suse.cz (Postfix, from userid 10065) id A715FDA838; Wed, 19 Feb 2020 13:18:04 +0100 (CET) Date: Wed, 19 Feb 2020 13:18:04 +0100 From: David Sterba To: Marco Elver Cc: Al Viro , Qian Cai , Christoph Hellwig , "Darrick J. Wong" , linux-xfs@vger.kernel.org, linux-fsdevel , LKML Subject: Re: [PATCH] fs: fix a data race in i_size_write/i_size_read Message-ID: <20200219121804.GV2902@twin.jikos.cz> Reply-To: dsterba@suse.cz Mail-Followup-To: dsterba@suse.cz, Marco Elver , Al Viro , Qian Cai , Christoph Hellwig , "Darrick J. Wong" , linux-xfs@vger.kernel.org, linux-fsdevel , LKML References: <20200219045228.GO23230@ZenIV.linux.org.uk> <20200219052329.GP23230@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 19, 2020 at 10:21:46AM +0100, Marco Elver wrote: > Right. In reality, for mainstream architectures, it appears quite unlikely. > > There may be other valid reasons, such as documenting the fact the > write can happen concurrently with loads. > > Let's assume the WRITE_ONCE can be dropped. > > The load is a different story. While load tearing may not be an issue, > it's more likely that other optimizations can break the code. For > example load fusing can break code that expects repeated loads in a > loop. E.g. I found these uses of i_size_read in loops: > > git grep -E '(for|while) \(.*i_size_read' > fs/ocfs2/dir.c: while (ctx->pos < i_size_read(inode)) { > fs/ocfs2/dir.c: for (i = 0; i < i_size_read(inode) && > i < offset; ) { > fs/ocfs2/dir.c: while (ctx->pos < i_size_read(inode)) { > fs/ocfs2/dir.c: while (ctx->pos < i_size_read(inode) > fs/squashfs/dir.c: while (length < i_size_read(inode)) { > fs/squashfs/namei.c: while (length < i_size_read(dir)) { > > Can i_size writes happen concurrently, and if so will these break if > the compiler decides to just do i_size_read's load once, and keep the > result in a register? It depends on the semantics and the behaviour when the value is not cached in a register might be the wrong one. A concrete example with assembly and analysis can be found in d98da49977f6 ("btrfs: save i_size to avoid double evaluation of i_size_read in compress_file_range"), which is the workardound mentioned in the my other mail. C: actual_end = min_t(u64, i_size_read(inode), end + 1); Asm: mov 0x20(%rsp),%rax cmp %rax,0x48(%r15) # read movl $0x0,0x18(%rsp) mov %rax,%r12 mov %r14,%rax cmovbe 0x48(%r15),%r12 # eval Where r15 is inode and 0x48 is offset of i_size. The original fix was to revert 62b37622718c that would do an intermediate assignment and this would also avoid the doulble evaluation but is not future-proof, should the compiler merge the stores and call i_size_read anyway. There's a patch adding READ_ONCE to i_size_read but that's not being applied at the moment and we need to fix the bug. Instead, emulate READ_ONCE by two barrier()s that's what effectively happens. The assembly confirms single evaluation: mov 0x48(%rbp),%rax # read once mov 0x20(%rsp),%rcx mov $0x20,%edx cmp %rax,%rcx cmovbe %rcx,%rax mov %rax,(%rsp) mov %rax,%rcx mov %r14,%rax Where 0x48(%rbp) is inode->i_size stored to %eax.