Date: Thu, 10 Jan 2019 05:18:05 +0100
From: Dominique Martinet
To: Hou Tao
Cc: v9fs-developer@lists.sourceforge.net, lucho@ionkov.net,
    ericvh@gmail.com, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, xingaopeng@huawei.com
Subject: Re: [PATCH] 9p: use inode->i_lock to protect i_size_write()
Message-ID: <20190110041805.GA7316@nautica>
In-Reply-To: <831fe284-c9e9-49a4-d530-5af57c2dd9d1@huawei.com>
References: <20190109020522.105713-1-houtao1@huawei.com>
 <20190109023832.GA12389@nautica>
 <831fe284-c9e9-49a4-d530-5af57c2dd9d1@huawei.com>

Hou Tao wrote on Thu, Jan 10, 2019:
> > Hmm, I'm not familiar with the read/write seqcount code for 32-bit,
> > but I don't understand how locking here helps besides slowing things
> > down (so if the value is constantly updated, the read thread might
> > have a chance to be scheduled between two updates, which was harder
> > to do before; and thus "solving" your soft lockup)
>
> i_size_read() will call read_seqcount_begin() in a 32-bit SMP
> environment, and it may loop in __read_seqcount_begin() infinitely,
> because two or more invocations of write_seqcount_begin() can
> interleave and s->sequence becomes an odd number. It's noted in the
> comment above i_size_write():
>
> /*
>  * NOTE: unlike i_size_read(), i_size_write() does need locking around it
>  * (normally i_mutex), otherwise on 32bit/SMP an update of i_size_seqcount
>  * can be lost, resulting in subsequent i_size_read() calls spinning forever.
>  */

I see; I wasn't aware of how seqcount works in detail, but it is a
simple seq++ with barriers, so that does indeed need locking.
I had misunderstood the lockup cause as simply the value being updated
all the time.
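
For reference, the 32-bit SMP variants in include/linux/fs.h look
roughly like this (paraphrased, the tree is authoritative):

	static inline void i_size_write(struct inode *inode, loff_t i_size)
	{
		/* BITS_PER_LONG==32 && CONFIG_SMP case */
		preempt_disable();
		write_seqcount_begin(&inode->i_size_seqcount);
		inode->i_size = i_size;
		write_seqcount_end(&inode->i_size_seqcount);
		preempt_enable();
	}

	static inline loff_t i_size_read(const struct inode *inode)
	{
		loff_t i_size;
		unsigned int seq;

		/* retry until the sequence is even and unchanged, i.e.
		 * no writer was active while we sampled i_size */
		do {
			seq = read_seqcount_begin(&inode->i_size_seqcount);
			i_size = inode->i_size;
		} while (read_seqcount_retry(&inode->i_size_seqcount, seq));
		return i_size;
	}

write_seqcount_begin()/end() are non-atomic increments, so two unlocked
writers can both read the same sequence value and one increment is lost;
the count can then stay odd forever and every i_size_read() spins in the
retry loop above.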
> > Instead, a better fix would be to update v9fs_stat2inode() to first
> > read the inode size, and only call i_size_write() if it changed -
> > I'd bet this also fixes the problem, and it looks better than
> > locking to me.
> > (We can also probably reuse stat->length instead of the following
> > i_size_read() for i_blocks...)
>
> For the read-only case this fix will work. However, if the inode size
> is changed constantly, there will be two or more callers of
> i_size_write(), and the soft lockup is still possible.

You are right. We can still do this as an optimization to avoid taking
the lock in the read-only case, but otherwise it's safe to forget about
this comment.

> > On the other hand, it might make sense to also lock the inode for
> > stat2inode because we're dealing with partially updated inodes at
> > times, but if we do this I'd rather put the locking in
> > v9fs_stat2inode() and not outside of it, to catch all the places
> > where it's used; the readers don't lock, though, so I'm not sure it
> > makes much sense.
>
> Moving the lock into v9fs_stat2inode() sounds reasonable. There are
> callers which don't need it (e.g. v9fs_qid_iget() uses it to fill in
> the attributes of a newly-created inode, and v9fs_mount() uses it to
> fill in the attributes of the root inode), so I will rename
> v9fs_stat2inode() to v9fs_stat2inode_nolock() and make
> v9fs_stat2inode() a locking wrapper around v9fs_stat2inode_nolock().

If it's a newly created inode there is no contention and the spinlock
has virtually no cost; and v9fs_mount's root inode is the same.
Let's keep this simple and always lock around i_size_write().
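
Concretely, something like this (untested sketch, the name is just a
suggestion):

	/*
	 * 32-bit needs the lock: concurrent updates could corrupt the
	 * sequence count and make i_size_read() loop forever.
	 * 64-bit stores of loff_t are atomic, so the lock can be skipped.
	 */
	static inline void v9fs_i_size_write(struct inode *inode, loff_t i_size)
	{
		if (sizeof(i_size) > sizeof(long))
			spin_lock(&inode->i_lock);
		i_size_write(inode, i_size);
		if (sizeof(i_size) > sizeof(long))
			spin_unlock(&inode->i_lock);
	}

and v9fs_stat2inode() would call that instead of i_size_write()
directly.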
> > There's also a window during which the inode's nlink is dropped down
> > to 1 and then set again appropriately if the extension is present;
> > that's rather ugly, and we probably should only reset it to 1 if the
> > attribute wasn't set before... That can be another patch, and/or
> > I'll do it eventually if you don't.
>
> I cannot follow that. Do you mean inode->i_nlink may be updated
> concurrently by v9fs_stat2inode() and v9fs_remove(), and that this
> will lead to corruption of i_nlink?

Not corruption, but the value can be incoherent for a short while.
If another thread looks at nlink continuously, it can see the value
being reset to 1 for a short time when there really is no reason to.
I think this isn't as important as the other issues you've raised, so
don't worry about this point unless you want to.

> I also noticed a race in the updating of v9inode->cache_validity. It
> seems possible for the V9FS_INO_INVALID_ATTR set by v9fs_remove() to
> be lost if there are concurrent invocations of v9fs_vfs_getattr(). We
> may need to ensure V9FS_INO_INVALID_ATTR is still set before clearing
> it atomically in v9fs_vfs_getattr(), and I will send another patch
> for it.

There are many places setting this flag, but you appear to be correct
that this isn't safe. It's more complicated than what you're
describing, though: e.g. even if the flag was already set at the start
of v9fs_vfs_getattr(), if another thread "sets it again" while getattr
is running, we must not unset it -- so basically this needs to be a
three-state value:
 - valid
 - invalid
 - invalid-being-refreshed

v9fs_invalidate_inode_attr() would set it to invalid regardless of the
previous state.
The start of v9fs_vfs_getattr() would set it to invalid-being-refreshed
(iff it was invalid? always?).
The end of v9fs_vfs_getattr()/v9fs_stat2inode() would set it back to
valid iff it is still invalid-being-refreshed.

You cannot simply clear the flag at the start of v9fs_vfs_getattr(),
because another getattr running in parallel needs to know it cannot
trust the cached value; currently that parallel getattr would issue
another stat request to the server, but if we implement this we could
wait until the flag becomes valid again (I'm not sure it's appropriate
to poll the value even if we yield, so this might need a mutex or some
other lock that can sleep...); well, for now it's fine to behave like
we currently do and also proceed to refresh the attributes in that
other thread; we can think about handling it better incrementally.

Thanks,
-- 
Dominique
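
PS: in case it helps, here is roughly how I picture the three-state
idea in code (hypothetical names, untested):

	enum v9fs_attr_state {
		V9FS_ATTR_VALID,
		V9FS_ATTR_INVALID,
		V9FS_ATTR_REFRESHING,	/* invalid-being-refreshed */
	};

	/* v9fs_invalidate_inode_attr(): invalidate whatever the state was */
	WRITE_ONCE(v9inode->attr_state, V9FS_ATTR_INVALID);

	/* start of v9fs_vfs_getattr(): claim the refresh, but only if
	 * the attributes are actually stale */
	cmpxchg(&v9inode->attr_state, V9FS_ATTR_INVALID, V9FS_ATTR_REFRESHING);

	/* end of v9fs_vfs_getattr()/v9fs_stat2inode(): revalidate only
	 * if nobody invalidated the inode again while we queried the
	 * server */
	cmpxchg(&v9inode->attr_state, V9FS_ATTR_REFRESHING, V9FS_ATTR_VALID);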