From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, stable@kernel.org, Dave Chinner, Al Viro, Aaron Lu
Subject: [PATCH 4.9 29/30] fs: dont scan the inode cache before SB_BORN is set
Date: Mon, 4 Feb 2019 11:37:07 +0100
Message-Id: <20190204103610.652434682@linuxfoundation.org>
4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Dave Chinner

commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.

We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so
the m_perag_tree lookup walked into lala land. It produces an oops down
this path during the failed mount:

  radix_tree_gang_lookup_tag+0xc4/0x130
  xfs_perag_get_tag+0x37/0xf0
  xfs_reclaim_inodes_count+0x32/0x40
  xfs_fs_nr_cached_objects+0x11/0x20
  super_cache_count+0x35/0xc0
  shrink_slab.part.66+0xb1/0x370
  shrink_node+0x7e/0x1a0
  try_to_free_pages+0x199/0x470
  __alloc_pages_slowpath+0x3a1/0xd20
  __alloc_pages_nodemask+0x1c3/0x200
  cache_grow_begin+0x20b/0x2e0
  fallback_alloc+0x160/0x200
  kmem_cache_alloc+0x111/0x4e0

The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. That is, the
shrinker is registered in sget(), before ->fill_super() has been called,
and the shrinker can call into the filesystem before fill_super() does
its setup work. Essentially we are exposed to both use-after-free and
use-before-initialisation bugs here.

To fix this, add a check for the SB_BORN flag (MS_BORN in this 4.9 tree)
in super_cache_count(). In general, this flag is not set until the
filesystem's ->mount() method has returned successfully to mount_fs(), so
we know that it is set after the filesystem setup has completed.
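A minimal userspace sketch of the publish/check pattern this fix relies on may help. It assumes C11 atomics as stand-ins for the kernel primitives: a release fence plays the role of smp_wmb() and an acquire fence the role of smp_rmb() (the C11 fences are somewhat stronger). The names model_sb, model_fs_info, model_sb_init(), model_cache_count(), model_mount_fs() and the MODEL_SB_BORN bit value are all hypothetical, not kernel APIs. The shrinker side returns 0 until the mount side has finished its setup and set the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bit; the real MS_BORN/SB_BORN value is not relied on. */
#define MODEL_SB_BORN (1UL << 29)

struct model_fs_info {
	unsigned long nr_cached; /* what ->nr_cached_objects() would report */
};

struct model_sb {
	atomic_ulong s_flags;            /* only the BORN bit is used here */
	struct model_fs_info *s_fs_info; /* plain pointer, published via s_flags */
};

/* State right after sget(): shrinker registered, fill_super() not yet run. */
static void model_sb_init(struct model_sb *sb)
{
	atomic_init(&sb->s_flags, 0UL);
	sb->s_fs_info = NULL;
}

/* Shrinker side: mirrors the check added to super_cache_count(). */
static unsigned long model_cache_count(struct model_sb *sb)
{
	unsigned long flags =
		atomic_load_explicit(&sb->s_flags, memory_order_relaxed);

	if (!(flags & MODEL_SB_BORN))
		return 0; /* mid-mount or failed mount: don't touch fs state */
	/* smp_rmb() analogue: pairs with the release fence in model_mount_fs(). */
	atomic_thread_fence(memory_order_acquire);
	return sb->s_fs_info->nr_cached;
}

/* Mount side: fill_super()-style setup, then publish the BORN bit. */
static void model_mount_fs(struct model_sb *sb, struct model_fs_info *fsi,
			   unsigned long nr_cached)
{
	fsi->nr_cached = nr_cached;
	sb->s_fs_info = fsi;
	/* smp_wmb() analogue: setup stores are ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&sb->s_flags, MODEL_SB_BORN,
				 memory_order_relaxed);
}
```

For example, after model_sb_init(&sb), model_cache_count(&sb) returns 0 without dereferencing the NULL s_fs_info; after model_mount_fs(&sb, &fsi, 42) it returns 42. The flag is read with a relaxed load followed by a separate acquire fence, which mirrors the check-then-smp_rmb() order in the patch rather than folding the barrier into the load.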
The SB_BORN check matches the trylock_super() behaviour, which will not
let super_cache_scan() run if SB_BORN is not set. Together the two checks
keep the superblock shrinker from entering the filesystem while it is
being set up or after it has failed setup and is being torn down.

Cc: stable@kernel.org
Signed-off-by: Dave Chinner
Signed-off-by: Al Viro
Signed-off-by: Aaron Lu
Signed-off-by: Greg Kroah-Hartman

---
 fs/super.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

--- a/fs/super.c
+++ b/fs/super.c
@@ -119,13 +119,23 @@ static unsigned long super_cache_count(s
 	sb = container_of(shrink, struct super_block, s_shrink);
 
 	/*
-	 * Don't call trylock_super as it is a potential
-	 * scalability bottleneck. The counts could get updated
-	 * between super_cache_count and super_cache_scan anyway.
-	 * Call to super_cache_count with shrinker_rwsem held
-	 * ensures the safety of call to list_lru_shrink_count() and
-	 * s_op->nr_cached_objects().
+	 * We don't call trylock_super() here as it is a scalability bottleneck,
+	 * so we're exposed to partial setup state. The shrinker rwsem does not
+	 * protect filesystem operations backing list_lru_shrink_count() or
+	 * s_op->nr_cached_objects(). Counts can change between
+	 * super_cache_count and super_cache_scan, so we really don't need locks
+	 * here.
+	 *
+	 * However, if we are currently mounting the superblock, the underlying
+	 * filesystem might be in a state of partial construction and hence it
+	 * is dangerous to access it. trylock_super() uses a MS_BORN check to
+	 * avoid this situation, so do the same here. The memory barrier is
+	 * matched with the one in mount_fs() as we don't hold locks here.
 	 */
+	if (!(sb->s_flags & MS_BORN))
+		return 0;
+	smp_rmb();
+
 	if (sb->s_op && sb->s_op->nr_cached_objects)
 		total_objects = sb->s_op->nr_cached_objects(sb, sc);
 
@@ -1193,6 +1203,14 @@ mount_fs(struct file_system_type *type,
 	sb = root->d_sb;
 	BUG_ON(!sb);
 	WARN_ON(!sb->s_bdi);
+
+	/*
+	 * Write barrier is for super_cache_count(). We place it before setting
+	 * MS_BORN as the data dependency between the two functions is the
+	 * superblock structure contents that we just set up, not the MS_BORN
+	 * flag.
+	 */
+	smp_wmb();
 	sb->s_flags |= MS_BORN;
 
 	error = security_sb_kern_mount(sb, flags, secdata);