From: Linus Torvalds
Subject: Re: [GIT PULL] inode->i_version rework for v4.16
Date: Mon, 29 Jan 2018 13:50:47 -0800
To: Jeff Layton
Cc: open list, Al Viro, xfs, "open list:NFS, SUNRPC, AND...", linux-btrfs, linux-integrity, Andrew Morton, "linux-ext4@vger.kernel.org"
In-Reply-To: <1517228795.5965.24.camel@redhat.com>
References: <1517228795.5965.24.camel@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Mon, Jan 29, 2018 at 4:26 AM, Jeff Layton wrote:
>
> This pile of patches is a rework of the inode->i_version field. We have
> traditionally incremented that field on every inode data or metadata
> change. Typically this increment needs to be logged on disk even when
> nothing else has changed, which is rather expensive.

Hmm. I have pulled this, but it is really really broken in one place,
to the degree that I almost went "no, I won't pull this garbage".

But the breakage is potential, not actual, and can be fixed trivially,
so I'll let it slide - but I do require it to be fixed. And I require
people to *think* about it.

So what's so horribly horribly wrong?

The inode_cmp_iversion{+raw}() functions are pure and utter crap.

Why?

You say that they return 0/negative/positive, but they do so in a
completely broken manner. They return that ternary value as the
sequence number difference in a 's64', which means that if you
actually care about that ternary value, and do the *sane* thing that
the kernel-doc of the function implies is the right thing, you would
do

        int cmp = inode_cmp_iversion(inode, old);

        if (cmp < 0 ...

and as a result you get code that looks sane, but that doesn't
actually *WORK* right.
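
To make the failure mode concrete, here is a minimal userspace sketch.
It is not the kernel code: cmp_version_s64() and naive_caller() are
hypothetical stand-ins that only mimic "return the sequence number
difference as a 64-bit signed value" and "store it in an int". It also
assumes the usual gcc/clang behaviour where an out-of-range conversion
to int keeps the low 32 bits (the C standard leaves that
implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: the "ternary" result is really the raw
 * 64-bit difference between two sequence numbers. */
static int64_t cmp_version_s64(uint64_t cur, uint64_t old)
{
    return (int64_t)(cur - old);
}

/* The caller does what the documentation suggests and keeps the
 * result in a plain int, which drops the upper 32 bits. */
static int naive_caller(uint64_t cur, uint64_t old)
{
    int cmp = cmp_version_s64(cur, old);  /* implicit s64 -> int */
    return cmp;
}

/* Cases where the truncated value lies about the ordering
 * (assumes the low-32-bit wrap described in the lead-in). */
void check_truncation(void)
{
    /* Versions differ by 2^32, but the int says "equal". */
    assert(cmp_version_s64(1ULL << 32, 0) > 0);
    assert(naive_caller(1ULL << 32, 0) == 0);

    /* cur is newer by 2^31, but the int says "older". */
    assert(cmp_version_s64(1ULL << 31, 0) > 0);
    assert(naive_caller(1ULL << 31, 0) < 0);

    /* Small differences still work, which is why testing passes. */
    assert(naive_caller(7, 5) > 0);
    assert(naive_caller(5, 7) < 0);
}
```

Small differences behave, so a test that bumps the counter a handful
of times would never notice the problem.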

To make it even worse, it will actually work in practice by accident
in 99.99999% of all cases, so now you have

 (a) subtly buggy code
 (b) that looks fine
 (c) and that works in testing

which is just about the worst possible case for any code. The
interface is simply garbage that encourages bugs.

And the bug wouldn't be in the user, the bug would be in this code you
just sent me. The interface is simply wrong.

So this absolutely needs to be fixed. I see two fixes:

 - just return a boolean. That's all that any current user actually
   wants, so the ternary value seems pointless.

 - make it return an 'int', and not just any int, but -1/0/1. That way
   there is no worry about uses, and if somebody *really* cares about
   the ternary value, they can now use a "switch" statement to get it
   (alternatively, make it return an enum, but whatever).

That "ternary" function that has 18446744069414584320 incorrect return
values really is unacceptable.

                Linus
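
The two fixes proposed above could look roughly like the sketch below.
Again this is userspace C with made-up names (version_changed() and
cmp_version() are illustrative, not the kernel helpers), working on
plain 64-bit counters, and it assumes the common two's-complement
conversion from u64 to s64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fix 1 (hypothetical name): only report whether the version moved. */
static bool version_changed(uint64_t cur, uint64_t old)
{
    return cur != old;
}

/* Fix 2 (hypothetical name): collapse the 64-bit difference to
 * exactly -1, 0 or 1 before it ever reaches the caller, so storing
 * the result in an int cannot lose information. */
static int cmp_version(uint64_t cur, uint64_t old)
{
    int64_t diff = (int64_t)(cur - old);

    return (diff > 0) - (diff < 0);
}

/* The inputs that fooled the truncating caller now behave. */
void check_fixed(void)
{
    assert(version_changed(1ULL << 32, 0));
    assert(!version_changed(5, 5));

    assert(cmp_version(1ULL << 32, 0) == 1);
    assert(cmp_version(1ULL << 31, 0) == 1);
    assert(cmp_version(0, 1) == -1);
    assert(cmp_version(5, 5) == 0);
}
```

With the -1/0/1 form a caller can safely write
"switch (cmp_version(cur, old))". The count quoted above is the number
of 64-bit differences that do not survive a trip through a 32-bit int:
2^64 - 2^32 = 18446744069414584320.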