Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp3644814img; Mon, 25 Mar 2019 14:46:27 -0700 (PDT) X-Google-Smtp-Source: APXvYqy76ASieTNMjmCC4aMXyuViaEl4/V9iL/doOL5a/VQX5aHBcKQK/dFDBsp+PP5F/lCsB1Zu X-Received: by 2002:aa7:9090:: with SMTP id i16mr25402711pfa.85.1553550387082; Mon, 25 Mar 2019 14:46:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553550387; cv=none; d=google.com; s=arc-20160816; b=VdZW3w2QSfbwxlqMHfP4CJxPZuYyIXsC/KQKb8edG9LxqCq13Q3idoBrJU20wWjq2+ OuSvJkHBz1kp8y5ob0oVdhMj6WknBOSX0o26TM6KIFWkahWSgscZpIW//8p6i0c+gVIh Z+J4VUP0MYEvkRdr21+mtjTPHQMP5Mv/FW6cMmN1S6hYLHQAJaSMcpcVWjEIT80Rabud K2axi40VczanGk0pVS7bnIpggGACFeJKuaTxC0FwyH94YDgXadHgKkheyWDhJmbUmhc8 iYWvgD1SS7NfBKcBUspKPJYzDjh56SVln1pPy5vzg0LXakjDtp1cmfPgl2BtipySxs/1 +mjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=RyrGMeVONhvfWoKfN08XEzZDL45w36KigWKOTw1mc3I=; b=mYUyq15wxXcWUZLzbUZFpB1XOej60JdedMrxVJ86+V1T8CH9mgWQlohRZHI2hsMr02 aZUiJsuuaiNsY2Zkl25liryd2CncX6gOQIm9Q4jrlW5phmCGcieQoircvGyfCI8y/pnN WQrx4IhJHFpxgtlruv9OwcB1uCsVs9IqlUTjrIIBrZz0bW9ZV7lEtUVTU7BlBlVonyVV slZUdLyWOGOl98yMASS7Rkv7zHsm0rDftjm/7sFKRbzBbUpXwSDNeO/NwMaGS3FZTN7t qLNH0Suvd3FJqA3t6A8wqIATRKQWNl4J4rYUCqaziK6RKmWKh2w/2lYVSUrOPB1nwlWY GeFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=Uz0aIzxx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w61si2573491plb.160.2019.03.25.14.46.12; Mon, 25 Mar 2019 14:46:27 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=Uz0aIzxx; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730061AbfCYVpW (ORCPT + 99 others); Mon, 25 Mar 2019 17:45:22 -0400 Received: from mail-lf1-f68.google.com ([209.85.167.68]:36554 "EHLO mail-lf1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729106AbfCYVpV (ORCPT ); Mon, 25 Mar 2019 17:45:21 -0400 Received: by mail-lf1-f68.google.com with SMTP id d18so7138637lfn.3 for ; Mon, 25 Mar 2019 14:45:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=RyrGMeVONhvfWoKfN08XEzZDL45w36KigWKOTw1mc3I=; b=Uz0aIzxx/3EgWDjsNP/gnfIPI2LVbLni/Kmq7pLXtN7Ip3sVQMgxHR3LDF4zfNwEbJ eOAKhAIb+QUbOmu4vwn04d+2RgFG7hT04bLNwtW22w9kkJKfQ46cFKG2SwvQnIOV1Xyf /homDzyLih6emk2h3yM48/WKn/oFK/YFsuKek= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=RyrGMeVONhvfWoKfN08XEzZDL45w36KigWKOTw1mc3I=; b=iQe1BZ/RTTbX+zg1etI+1c3+Ni9bWPeIz0E/atdW2cdQL12n/Gr7CC1eyL2pKGVIdy DJmXtXMtZxTBWd+eFqF19F4KvoNlMTsd+HYZloBJlCPjpwv9nwVxBN3ExhFPDzWvDlj/ b7tMBT/h7eKr7B44TfSbYWsHhnMRdfXPZRxvoQus49slZv0HhWFz8ipDBBP/Ss2mLnrQ KcHKfvr2CY/Lab/6xk5MHyPBH7B/p5u8EcZu5pbKpGGbz0JmX5T7YnH9g6EyYdKu6zBZ n8lAzLydKLXmx0BByFZXWbDGtEGJ8muauu3HcWWSqGtbch5D8uvkW1UkHDaTkw0PkAQD Rw3A== X-Gm-Message-State: APjAAAVbgxru1mgyfVcdJExS2qFhk40L+K5DIPcvJ8nGDmtZ9d0SisGE vfakMy2dqodH2Ptv8oezWlK5WWt9faE= X-Received: by 2002:a19:9209:: with SMTP id u9mr14256463lfd.71.1553550317937; Mon, 25 Mar 2019 14:45:17 -0700 (PDT) Received: from mail-lj1-f180.google.com (mail-lj1-f180.google.com. [209.85.208.180]) by smtp.gmail.com with ESMTPSA id c7sm3931498lja.90.2019.03.25.14.45.16 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Mar 2019 14:45:16 -0700 (PDT) Received: by mail-lj1-f180.google.com with SMTP id l7so9249329ljg.6 for ; Mon, 25 Mar 2019 14:45:16 -0700 (PDT) X-Received: by 2002:a2e:960b:: with SMTP id v11mr4509575ljh.135.1553550316377; Mon, 25 Mar 2019 14:45:16 -0700 (PDT) MIME-Version: 1.0 References: <0000000000006946d2057bbd0eef@google.com> <20190325045744.GK2217@ZenIV.linux.org.uk> <20190325211405.GP2217@ZenIV.linux.org.uk> In-Reply-To: <20190325211405.GP2217@ZenIV.linux.org.uk> From: Linus Torvalds Date: Mon, 25 Mar 2019 14:45:00 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: KASAN: use-after-free Read in path_lookupat To: Al Viro Cc: syzbot , Alexei Starovoitov , Daniel Borkmann , linux-fsdevel , Linux List Kernel Mailing , syzkaller-bugs Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 25, 2019 at 2:14 PM Al Viro wrote: > > Maybe, but we really need to come up with sane documentation on the > entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode > group ;-/ Yeah. I actually think the "destroy_inode/rcu_destroy_inode" part is the simplest one to understand: it's just tearing down the inode, and the RCU part is (obviously, as shown by this thread) somewhat subtle, but at the same time not really all that hard to explain: "inodes that can be looked up through RCU lookup need to be destroyed with RCU delay". I think drop_inode->evict_inode is arguably the much more subtle stuff. That's the "we're not destroying things, we're making decisions about the state" kind of thing. And we actually don't have any documentation at all for those two. Well, there's the "porting" thing, but.. > And I want to understand the writeback-related issues > in ocfs2 and f2fs - the current kludges in those smell fishy. I'm assuming you're talking about exactly that drop_inode() kind of subtlety. At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out would simplify things a lot. *All* that the ocfs2_destroy_inode() function does is to schedule the inode freeing for RCU delay. Assuming my patch is fine (as mentioned, it was entirely untested), that patch would actually simplify that a lot. Get rid of ocfs2_destroy_inode() entirely, and just make ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up (we have that ocfs2_rcu_destroy_inode() currently as ocfs2_i_callback(), but with the ugly rcu-head interface). > > if (unlikely(inode_init_always(sb, inode))) { > > - if (inode->i_sb->s_op->destroy_inode) > > + if (inode->i_sb->s_op->destroy_inode) { > > inode->i_sb->s_op->destroy_inode(inode); > > - else > > + if (!inode->i_sb->s_op->rcu_destroy_inode) > > + return NULL; > > + } > > + if (!inode->i_sb->s_op->rcu_destroy_inode || > > + !inode->i_sb->s_op->rcu_destroy_inode(inode)) > > kmem_cache_free(inode_cachep, inode); > > ITYM i_callback(inode); here, possibly suitably renamed. Yeah, I guess we could just call that i_callback() directly. I wrote it that way because it was how the code was organized (it didn't call i_callback() before either), but yes, it woudl avoid some duplicate code to do it that way. And yes, at that point we'd probably want to rename it too. On the whole, looking at a few filesystems, I really think that 'rcu_destroy_inode()" would simplify several of them. That ocfs2 case I already mentioned, but looking around the exact same is trye in v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs, openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs, gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs) And in other filesystems that actually *do* have a real destroy_inode() that does other things too, they still want the rcu delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs, ext4, jfs, organgefs, ovl, ubifs). In fact, every single filesystem I looked at (and I looked at most) would be simplified by that patch. And for some it would make it easier to fix bugs in the process (ie bpf) because they don't currently have that rcu delay and they need it. So looking at all the existing users of destroy_inode(), I really do think we should have that rcu_destroy_inode() callback. Because that 3 files changed, 28 insertions(+), 6 deletions(-) will allow quite a lot of lines to be removed from low-level filesystems with all the call_rcu() and callback container_of(head, struct inode, i_rcu) noise just going away. But yes, it would be good to have more documentation fo the drop/evict phase. As mentioned, I think the destroy phase is actually the easy one, with the RCU part being the most complex one that this would actually simplify. Linus