Date: Sun, 6 May 2018 08:40:00 +0100
From: Al Viro
To: John Paul Adrian Glaubitz
Cc: Martin Steigerwald, Matthew Wilcox, dsterba@suse.cz, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Jens Axboe, linux-m68k@lists.linux-m68k.org, Debian m68k
Subject: Re: moving affs + RDB partition support to staging?
Message-ID: <20180506073955.GJ30522@ZenIV.linux.org.uk>
In-Reply-To: <20180506005946.GI30522@ZenIV.linux.org.uk>
References: <20180425154602.GA8546@bombadil.infradead.org> <20180425203029.GQ21272@twin.jikos.cz> <20180426025717.GA32430@bombadil.infradead.org> <1613268.lKBQxPXt8J@merkaba> <76ca15e2-7b43-8b02-43e1-9ee65ab85356@physik.fu-berlin.de> <20180506005946.GI30522@ZenIV.linux.org.uk>

On Sun, May 06, 2018 at 01:59:51AM +0100, Al Viro wrote:
> > There is nothing at the moment that needs fixing.
>
> Funny, that... I'd been going through the damn thing for the
> last week or so; open-by-fhandle/nfs export support is completely
> buggered.  And as for the rest... the least said about the error
> handling, the better - something like rename() hitting an IO
> error (a read one) can not only screw the on-disk data into the
> ground, it can do seriously bad things to kernel data structures.

... and while we are at it, consider the following:

mkdir a
mkdir b
touch a/x
ln a/x b

process A: unlink("a/x");
process B: open("b/x");

We had two entries - one in a, another in b; the one in a is the ST_FILE
one, the one in b the ST_LINKFILE one, with ->original referring to the
ST_FILE one.  unlink() needs to kick the entry out of a; since it can't
leave an ST_FILE not included in any directory (and since the file should
live on, due to b/x still being there), it ends up removing the ST_LINKFILE
entry from b and moving the ST_FILE one in its place.  That happens in
affs_remove_link().

However, open() gets there just as this surgery is about to begin.  It
gets to affs_lookup(), which finds the entry for b/x, reads it, stashes
the block number of that entry into dentry->d_fsdata, notices that it's
an ST_LINKFILE one, picks the block number of the ST_FILE one out of it
and proceeds to look up the inode; that doesn't end up doing any work
(the same inode is in icache due to a/x having been looked up by
unlink(2)).

In the meanwhile, since the hash table of b has been unlocked once we'd
done the hash lookup, affs_remove_link() can proceed with the surgery.
It *does* try to catch dentries with ->d_fsdata containing the block
number of the sacrificed ST_LINKFILE and reset that to the block number
of the ST_FILE that gets moved in its place.  However, it does so by

static void affs_fix_dcache(struct inode *inode, u32 entry_ino)
{
	struct dentry *dentry;
	spin_lock(&inode->i_lock);
	hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
		if (entry_ino == (u32)(long)dentry->d_fsdata) {
			dentry->d_fsdata = (void *)inode->i_ino;
			break;
		}
	}
	spin_unlock(&inode->i_lock);
}

i.e. by looping through the dentries that point to our inode.  Except
that *our* dentry isn't attached to the inode yet, so we are SOL - it's
left with ->d_fsdata pointing to the (destroyed) ST_LINKFILE.  After a
while, process B closes the file and unlinks it.  Take a look at
affs_remove_header() and you'll see how much fun we are in for - it uses
->d_fsdata to pick the entry to find and remove...

That's an fs corruptor, plain and simple.  As far as I had been able to
reconstruct the history, it's a leftover from Roman's locking changes
17 years ago.  AFAICS, I'd missed it back then - the races I'd spotted
had been dealt with about a year later (when we started to lock the
victim in vfs_unlink/vfs_rmdir/vfs_rename), but this one went
unnoticed...
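
To make the interleaving above concrete, here is a rough userspace sketch
of the scenario.  The loop count, the cleanup at the end of each iteration
and the assumption that the current directory is a scratch AFFS mount are
illustrative only, and nothing guarantees that any given run actually hits
the window - it only races process A's unlink("a/x") against process B's
open("b/x") followed by unlink("b/x"), as described in the report.

/* Sketch of the reported race: unlink("a/x") vs. open("b/x") + unlink("b/x")
 * on an AFFS mount.  Run from the root of a scratch AFFS filesystem. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++) {
		pid_t pid;

		/* setup: a/x becomes the ST_FILE entry, b/x the ST_LINKFILE one */
		mkdir("a", 0755);
		mkdir("b", 0755);
		close(open("a/x", O_CREAT | O_WRONLY, 0644));
		link("a/x", "b/x");

		pid = fork();
		if (pid < 0) {
			perror("fork");
			exit(1);
		}
		if (pid == 0) {
			/* process B: look up and open b/x while A removes a/x;
			 * the later unlink uses whatever affs_lookup() stashed
			 * in ->d_fsdata */
			int fd = open("b/x", O_RDONLY);
			if (fd >= 0) {
				close(fd);
				unlink("b/x");
			}
			_exit(0);
		}

		/* process A */
		unlink("a/x");
		waitpid(pid, NULL, 0);

		/* clean up whatever survived this iteration */
		unlink("a/x");
		unlink("b/x");
		rmdir("a");
		rmdir("b");
	}
	return 0;
}

Build with plain cc and run it on a throwaway image; the point is only to
make the interleaving concrete, not to serve as a reliable reproducer.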