From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Boris Burkov, David Sterba
Subject: [PATCH 5.7 039/179] btrfs: fix mount failure caused by race with umount
Date: Mon, 27 Jul 2020 16:03:34 +0200
Message-Id: <20200727134934.579218640@linuxfoundation.org>
In-Reply-To: <20200727134932.659499757@linuxfoundation.org>
References: <20200727134932.659499757@linuxfoundation.org>

From: Boris Burkov

commit 48cfa61b58a1fee0bc49eef04f8ccf31493b7cdd upstream.

It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before up-ing the
semaphore, umount will delete the super block, resulting in mount's sget
reliably allocating a new one, which causes the mount path to dutifully
fill it out, and increment total_rw_bytes a second time, which causes
the mount to fail, as we see double the expected bytes.
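
For illustration only, the accounting error described above can be
modelled in userspace C. This is a sketch under stated assumptions, not
the btrfs code: the toy_* names, the simulate_racing_mount() helper and
the 1 GiB device size are all hypothetical, and locking is left out
entirely. Only the opened / total_rw_bytes bookkeeping is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fs_devices fields the race touches. */
struct toy_fs_devices {
	int opened;              /* open count shared by all mounts */
	uint64_t total_rw_bytes; /* running sum of writable device sizes */
};

/* Models the open path: state is reset only on the first open. */
static void toy_open_devices(struct toy_fs_devices *fsd)
{
	if (fsd->opened) {
		fsd->opened++;   /* already open: total_rw_bytes is kept */
		return;
	}
	fsd->opened = 1;
	fsd->total_rw_bytes = 0;
}

/* Models umount's put_super dropping its reference on the devices. */
static void toy_close_devices(struct toy_fs_devices *fsd)
{
	fsd->opened--;
}

/*
 * Models reading the device items: adds the device size to the total
 * and compares it with the size recorded in the super block.
 * With apply_fix set, the total is cleared first, as in this patch.
 * Returns 0 on success, -1 if the total exceeds super_total_bytes.
 */
static int toy_read_chunk_tree(struct toy_fs_devices *fsd, uint64_t dev_bytes,
			       uint64_t super_total_bytes, int apply_fix)
{
	if (apply_fix)
		fsd->total_rw_bytes = 0;
	fsd->total_rw_bytes += dev_bytes;
	return super_total_bytes < fsd->total_rw_bytes ? -1 : 0;
}

/* Replays the interleaving: mount #2 opens before umount's put_super. */
int simulate_racing_mount(int apply_fix)
{
	const uint64_t dev_bytes = 1ULL << 30; /* assumed single 1 GiB device */
	struct toy_fs_devices fsd = { 0, 0 };
	int rc;

	/* First mount: clean open, total matches the super block. */
	toy_open_devices(&fsd);
	rc = toy_read_chunk_tree(&fsd, dev_bytes, dev_bytes, apply_fix);
	assert(rc == 0);

	toy_open_devices(&fsd);   /* racing mount: opened becomes 2 */
	toy_close_devices(&fsd);  /* slow umount finishes: opened back to 1 */

	/* Racing mount fills its new super block. */
	return toy_read_chunk_tree(&fsd, dev_bytes, dev_bytes, apply_fix);
}
```

In this model simulate_racing_mount(0) returns -1 because the device is
counted twice, while simulate_racing_mount(1) returns 0.
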
Here is the sequence laid out in greater detail:

CPU0                                          CPU1
down_write sb->s_umount
btrfs_kill_super
  kill_anon_super(sb)
    generic_shutdown_super(sb);
      shrink_dcache_for_umount(sb);
      sync_filesystem(sb);
      evict_inodes(sb); // SLOW

                                              btrfs_mount_root
                                                btrfs_scan_one_device
                                                fs_devices = device->fs_devices
                                                fs_info->fs_devices = fs_devices
                                                // fs_devices-opened makes this a no-op
                                                btrfs_open_devices(fs_devices, mode, fs_type)
                                                s = sget(fs_type, test, set, flags, fs_info);
                                                  find sb in s_instances
                                                  grab_super(sb);
                                                    down_write(&s->s_umount); // blocks

      sop->put_super(sb)
        // sb->fs_devices->opened == 2; no-op
      spin_lock(&sb_lock);
      hlist_del_init(&sb->s_instances);
      spin_unlock(&sb_lock);
      up_write(&sb->s_umount);
                                                    return 0;
                                                  retry lookup
                                                  don't find sb in s_instances (deleted by CPU0)
                                                  s = alloc_super
                                                  return s;
                                                btrfs_fill_super(s, fs_devices, data)
                                                  open_ctree // fs_devices total_rw_bytes improperly set!
                                                    btrfs_read_chunk_tree
                                                      read_one_dev // increment total_rw_bytes again!!
                                                      super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.

To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.

  for i in $(seq 0 500)
  do
    dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
  done
  umount /mnt/foo&
  mount /mnt/foo

does the trick for me.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Boris Burkov
Reviewed-by: David Sterba
Signed-off-by: David Sterba
Signed-off-by: Greg Kroah-Hartman
---
 fs/btrfs/volumes.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7056,6 +7056,14 @@ int btrfs_read_chunk_tree(struct btrfs_f
 	mutex_lock(&fs_info->chunk_mutex);
 
 	/*
+	 * It is possible for mount and umount to race in such a way that
+	 * we execute this code path, but open_fs_devices failed to clear
+	 * total_rw_bytes. We certainly want it cleared before reading the
+	 * device items, so clear it here.
+	 */
+	fs_info->fs_devices->total_rw_bytes = 0;
+
+	/*
 	 * Read all device items, and then all the chunk items. All
 	 * device items are found before any chunk item (their object id
 	 * is smaller than the lowest possible object id for a chunk