Received: by 2002:a05:6a10:6006:0:0:0:0 with SMTP id w6csp1108957pxa; Fri, 28 Aug 2020 04:06:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxHDE37Nq9FhS3q3OXA/dMNKLe5QeWpFwMCcaU8/Ok5u6OEZw4Kp5YsTS+uVP2jZuE+y0J2 X-Received: by 2002:aa7:d3c2:: with SMTP id o2mr1277497edr.11.1598612773266; Fri, 28 Aug 2020 04:06:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598612773; cv=none; d=google.com; s=arc-20160816; b=t25vR4tEO1pl/qBybB/wz5gsfVr+Z30afudglZQQam+hlzy3vcw80ncxzDOFZ0JAHu XH88NTSTA4WwPtS3XvZjSOTYC5GrhZ9gFgO477w0Y1n1Gy4h3QFe1JqXuzNNFcHnEb+T 2Dlpf6GDPknU413WOAHZTD8UbztvWhwt2D6IGQJtLM8vxWan72Fw3pc6dy/MpRBwwx1Q xqTgxzri929Oh+5ScRc/jxs8tpXPfHiEtGxlduqoRhw018FxrcnzxbPJq9mpJSer77yS gcWqfJMVwqG7APcurYrl5iyINMba+dsAzwiAsO1pibVWUwyAELA8ti7WFri+1t2LuLTC De8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=qlTS7ecZtt1KpRCLtclmHdR9jQQKqqF/o/FpdMugXSY=; b=b7rNk40Al5PqHuSosT8FdEJTMD+jgI9eXFku23Jv/Li9PrJ27LgUCx3ZtkJNl52Nif MumJbIKEl9Gx0r2LJvXt9H4qGc6rOd/i+cJJ8ICvjgBJ0+bTgrq1hTm+PzCaIMuAgvtH zMAArSd2sS4eKMSjwoCGPY1CpDvMzJn4RECfGsp3EDYPRlEblRMducWc1SGK0z/ZaUgR Z03X3AmHgqTB4UtN0K2WN3L3cxdi3GhmNo85oYyr1Sw5OItD98rXjvPDT3M7zUya6wRS 1gQVyMWLzEpttdJwxEZWRW5pzDVB1ybTEiyQnBjSnQqNnFQas5ok6bI4aihgmZI5vTbH mIug== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=OdkPbxH3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t14si403281eju.178.2020.08.28.04.05.48; Fri, 28 Aug 2020 04:06:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=OdkPbxH3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728680AbgH1LEc (ORCPT + 99 others); Fri, 28 Aug 2020 07:04:32 -0400 Received: from mail.kernel.org ([198.145.29.99]:47034 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729156AbgH1LEN (ORCPT ); Fri, 28 Aug 2020 07:04:13 -0400 Received: from willie-the-truck (236.31.169.217.in-addr.arpa [217.169.31.236]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6EEED207DF; Fri, 28 Aug 2020 11:04:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1598612653; bh=c7c25Ad7ytC5Nvm3HpoIhR0Ip8Ni9wwASMQD/vt/XEg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=OdkPbxH3//2B1lmlRwK/7kEk7/mQ+9ro8Jd3YUEfOEhH19iqvfDxISyROvjy2/glP bm3gJ7K8TB6Taob/SGu4njdFkrNda4Oa1jnzEwmVX53RM1O5zTfCmUo21Tir9tX866 BjxQc4iICM4k4NWVGLOh5kJ64l7n4KbOiXjemdRY= Date: Fri, 28 Aug 2020 12:04:08 +0100 From: Will Deacon To: Al Viro Cc: Shaokun Zhang , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Yuqi Jin , kernel test robot , Mark Rutland , Peter Zijlstra , Boqun Feng Subject: Re: [NAK] Re: [PATCH] fs: Optimized fget to improve performance Message-ID: <20200828110408.GA30722@willie-the-truck> References: <1598523584-25601-1-git-send-email-zhangshaokun@hisilicon.com> <20200827142848.GZ1236603@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200827142848.GZ1236603@ZenIV.linux.org.uk> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 27, 2020 at 03:28:48PM +0100, Al Viro wrote: > On Thu, Aug 27, 2020 at 06:19:44PM +0800, Shaokun Zhang wrote: > > From: Yuqi Jin > > > > It is well known that the performance of atomic_add is better than that of > > atomic_cmpxchg. > > The initial value of @f_count is 1. While @f_count is increased by 1 in > > __fget_files, it will go through three phases: > 0, < 0, and = 0. When the > > fixed value 0 is used as the condition for terminating the increase of 1, > > only atomic_cmpxchg can be used. When we use < 0 as the condition for > > stopping plus 1, we can use atomic_add to obtain better performance. > > Suppose another thread has just removed it from the descriptor table. > > > +static inline bool get_file_unless_negative(atomic_long_t *v, long a) > > +{ > > + long c = atomic_long_read(v); > > + > > + if (c <= 0) > > + return 0; > > Still 1. Now the other thread has gotten to dropping the last reference, > decremented counter to zero and committed to freeing the struct file. > > > + > > + return atomic_long_add_return(a, v) - 1; > > ... and you increment that sucker back to 1. Sure, you return 0, so the > caller does nothing to that struct file. Which includes undoing the > changes to its refecount. > > In the meanwhile, the third thread does fget on the same descriptor, > and there we end up bumping the refcount to 2 and succeeding. Which > leaves the caller with reference to already doomed struct file... > > IOW, NAK - this is completely broken. The whole point of > atomic_long_add_unless() is that the check and conditional increment > are atomic. Together. That's what your optimization takes out. Cheers Al, yes, this is fscked. As an aside, I've previously toyed with the idea of implementing a form of cmpxchg() using a pair of xchg() operations and a smp_cond_load_relaxed(), where the thing would transition through a "reserved value", which might perform better with the current trend of building hardware that doesn't handle CAS failure so well. But I've never had the time/motivation to hack it up, and it relies on that reserved value which obviously doesn't always work (so it would have to be a separate API). Will