Date: Sun, 29 Jul 2018 23:03:17 +0100
From: Al Viro
To: linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Miklos Szeredi
Subject: [PATCHES][RFC] icache-related stuff
Message-ID: <20180729220317.GB30522@ZenIV.linux.org.uk>

	Assorted icache-related fixes for the next window; some of that is
-stable fodder.

1) NFS and FUSE mkdir/open_by_handle() race fix.  NFS side posted and
discussed earlier, NFS folks hadn't objected...  Basically, the strategy
used by local filesystems to deal with that kind of race does not (and
cannot) work for NFS - there the icache search key is not even known to
us until the underlying (== server-side) data structures for the object
being created look good.
So we need a different approach - just let nfs_mkdir() use
d_splice_alias() and leave the originally passed dentry unhashed negative
if we'd raced and picked an existing alias.  The callers of ->mkdir() are
fine with that.

Unlike NFS, FUSE (which has the same kind of problem) does deal with it
in mainline.  However, the same approach (d_splice_alias() and leave the
argument unhashed negative if aliases exist) works better than what FUSE
does in mainline *and* allows us to kill a warty primitive nobody else is
using.
	nfs_instantiate(): prevent multiple aliases for directory inode
	kill d_instantiate_no_diralias()

2) The local side of things isn't exactly correct either - a typical
local fh_to_dentry() will do an icache lookup, and if setup fails halfway
through e.g. mkdir(), we are left with a nasty choice - either we leave
the not-quite-set-up inode hashed (and then open_by_handle() can pick it,
with subsequent nasal demons) or we unhash it and risk open_by_handle()
coming immediately after unhash and getting a separate in-core inode for
the same on-disk one, just as the on-disk one gets freed.  Some
filesystems are careful enough with those half-set-up inodes to be safe
(with the first variant, that is).  Some are not.

Solution: new flag (I_CREATING) set by insert_inode_locked() and removed
by unlock_new_inode(), and a new primitive (discard_new_inode()) to be
used by such halfway-through-setup failure exits instead of
unlock_new_inode() / iput() combinations.  That primitive unlocks the new
inode, but leaves I_CREATING in place.  iget_locked() treats finding an
I_CREATING inode as failure (-ESTALE, once we sort out the error
propagation).  insert_inode_locked() treats the same as instant -EBUSY.
ilookup() treats those as icache miss.

A bunch of filesystems switched to discard_new_inode() (btrfs, ufs, udf,
ext2, jfs).
	new primitive: discard_new_inode()
	btrfs: switch to discard_new_inode()
	ufs: switch to discard_new_inode()
	udf: switch to discard_new_inode()
	ext2: make sure that partially set up inodes won't be returned by ext2_iget()
	jfs: switch to discard_new_inode()

3) Miklos' regression fix (he had been too optimistic in iget5_locked
cleanups this window; I'd grumbled about that being wrong, but hadn't
realized how bad it was).
	vfs: don't evict uninitialized inode

4) several btrfs cleanups around btrfs_iget() and friends.
	btrfs: btrfs_iget() never returns an is_bad_inode() inode.
	btrfs: IS_ERR(p) && PTR_ERR(p) == n is a weird way to spell p == ERR_PTR(n)
	btrfs: lift make_bad_inode() into btrfs_iget()
	btrfs: simplify btrfs_iget()

5) misc stuff - new primitive for filesystems that want inodes to look
hashed, but don't want them polluting the hash chains (currently
open-coded), making adfs use that (it never ever looks anything up in
icache), dropping a cargo-culted make_bad_inode() in jfs ialloc failure
path.
	new helper: inode_fake_hash()
	adfs: don't put inodes into icache
	jfs: don't bother with make_bad_inode() in ialloc()
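
For readers who want the I_CREATING rules from 2) in one place, here is a
userspace toy model in C.  It is only a sketch under stated assumptions, not
the kernel code: every toy_* name, the fixed-size array standing in for the
inode hash, and toy_evict() are hypothetical; locking and refcounting are left
out; a lookup miss returns -ENOENT where the real iget_locked() would go on to
allocate a new inode; and any existing match in toy_insert_inode_locked() is
treated as instant -EBUSY, which is a simplification.

```c
#include <assert.h>	/* for callers that want to assert on the results */
#include <errno.h>
#include <stddef.h>

/* Userspace toy model of the I_CREATING rules; NOT kernel code.
 * All toy_* / TOY_* names are hypothetical stand-ins. */
enum { TOY_I_NEW = 1u << 0, TOY_I_CREATING = 1u << 1 };

struct toy_inode {
	unsigned long ino;
	unsigned int state;
};

#define TOY_SLOTS 8
static struct toy_inode *toy_icache[TOY_SLOTS];	/* stand-in for the hash */

/* Raw search by inode number, ignoring state flags. */
static struct toy_inode *toy_find(unsigned long ino)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (toy_icache[i] && toy_icache[i]->ino == ino)
			return toy_icache[i];
	return NULL;
}

/* Models insert_inode_locked(): hash the inode with both flags set. */
static int toy_insert_inode_locked(struct toy_inode *inode)
{
	if (toy_find(inode->ino))
		return -EBUSY;	/* simplification: any match fails at once */
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_icache[i]) {
			inode->state |= TOY_I_NEW | TOY_I_CREATING;
			toy_icache[i] = inode;
			return 0;
		}
	}
	return -ENOMEM;		/* toy table is full */
}

/* Models unlock_new_inode(): setup succeeded, both flags are removed. */
static void toy_unlock_new_inode(struct toy_inode *inode)
{
	inode->state &= ~(TOY_I_NEW | TOY_I_CREATING);
}

/* Models discard_new_inode(): unlock, but leave I_CREATING in place. */
static void toy_discard_new_inode(struct toy_inode *inode)
{
	inode->state &= ~TOY_I_NEW;
}

/* Hypothetical stand-in for the final eviction that unhashes the inode. */
static void toy_evict(struct toy_inode *inode)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (toy_icache[i] == inode)
			toy_icache[i] = NULL;
}

/* Models the lookup half of iget_locked(): I_CREATING means -ESTALE. */
static int toy_iget(unsigned long ino, struct toy_inode **res)
{
	struct toy_inode *inode = toy_find(ino);

	if (!inode)
		return -ENOENT;	/* the real iget_locked() would allocate here */
	if (inode->state & TOY_I_CREATING)
		return -ESTALE;
	*res = inode;
	return 0;
}

/* Models ilookup(): an I_CREATING inode is reported as a cache miss. */
static struct toy_inode *toy_ilookup(unsigned long ino)
{
	struct toy_inode *inode = toy_find(ino);

	if (inode && (inode->state & TOY_I_CREATING))
		return NULL;
	return inode;
}
```

In this model, inserting inode 42 and then calling toy_discard_new_inode() on
it makes toy_iget(42, ...) return -ESTALE and toy_ilookup(42) return NULL
until toy_evict() drops it; a second toy_insert_inode_locked() for 42 in that
window gets -EBUSY.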