Received: by 2002:a17:90b:8d0:0:0:0:0 with SMTP id ds16csp201794pjb; Thu, 16 Jul 2020 11:52:39 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzIkNvZxpmEpNogJNRrskXdGQTx3Qub/zRg6TwhafL0nq/BaDUITVqIkZrR0OQk1r2eSsl8 X-Received: by 2002:a17:906:4c42:: with SMTP id d2mr4983442ejw.345.1594925559104; Thu, 16 Jul 2020 11:52:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594925559; cv=none; d=google.com; s=arc-20160816; b=o+D7Qk31Atds8fFBkYJKU5JQJZPPAaJ+ZuRuLj4ky3x8avi9QAz60y1kHcyY85v//Z h8eUsKvStoKYtFxB4eTuFDradBWq0sk3Ch+phIQswddVykoGGL6qiD/0Hc3ypakNir1n 0y/6+dTKt6Dd2IEI/puKWwnFhIEqntXWUWF8eGRUcvvdmpRyotb8xlnm2QVfX8GhFA8E 6co7el+H4f2HvhHgSJSbn+CAecdCZTnyGjRn2vNnxSB/V9MQYI/OicrR9Ou7oXP2osPh oJ+6lVhDXvIycA8axHOeJr3TbhaVBwmxbKNTEkYH2aZSVV0g+1W1TljEFOk7vl+RwX1w hj6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:mail-followup-to :reply-to:message-id:subject:cc:to:from:date; bh=j1mRqj4NP4uBXpIVXZrcj+rQer1XJKVJQCOsd/klCVk=; b=RLQzaf+cG2Qr802VyfxAHTIuTcIPrhyaLE2cYhxinzioU92D4Mmgya7WGpNuSO0sI4 tpee9KaSGlOozDcvrWvIQRJwjEkL3Iu2IVpgVfkqWzluOLzuvSfzBz7tOnmrWpMHlPkK 614vx1Dug1hYP9PkWRS55+Nf5EqhXXu67vqCsXN6r5rer9LFTxrDHeajael2VtxhKPsY hBgizaBQlbx4/nWqxFcq4pxqua+g6mdsRBDfysHFd1Q4Ul83bH1P3NIKzRkVT9F0/y5u MNVfjq0IWe+ZlJl+NVdrquXppULII6CJlhtxg6dwEvYe70ogZb0YkZ79Wtf5+Yg83N38 SD/g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id d21si3940409edj.417.2020.07.16.11.52.16; Thu, 16 Jul 2020 11:52:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728560AbgGPSvi (ORCPT + 99 others); Thu, 16 Jul 2020 14:51:38 -0400 Received: from mx2.suse.de ([195.135.220.15]:40704 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726986AbgGPSvh (ORCPT ); Thu, 16 Jul 2020 14:51:37 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 446E8AB76; Thu, 16 Jul 2020 18:51:39 +0000 (UTC) Received: by ds.suse.cz (Postfix, from userid 10065) id 468A6DA790; Thu, 16 Jul 2020 20:51:10 +0200 (CEST) Date: Thu, 16 Jul 2020 20:51:10 +0200 From: David Sterba To: Boris Burkov Cc: Chris Mason , Josef Bacik , David Sterba , linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH v2] btrfs: fix mount failure caused by race with umount Message-ID: <20200716185110.GB3703@twin.jikos.cz> Reply-To: dsterba@suse.cz Mail-Followup-To: dsterba@suse.cz, Boris Burkov , Chris Mason , Josef Bacik , David Sterba , linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com References: <20200710172304.139763-1-boris@bur.io> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200710172304.139763-1-boris@bur.io> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jul 10, 2020 at 10:23:04AM -0700, Boris Burkov wrote: > Here is the sequence laid out in greater detail: > > CPU0 CPU1 > down_write sb->s_umount > btrfs_kill_super > kill_anon_super(sb) > generic_shutdown_super(sb); > shrink_dcache_for_umount(sb); > sync_filesystem(sb); > evict_inodes(sb); // SLOW > > btrfs_mount_root > btrfs_scan_one_device > fs_devices = device->fs_devices > fs_info->fs_devices = fs_devices > // fs_devices-opened makes this a no-op > btrfs_open_devices(fs_devices, mode, fs_type) > s = sget(fs_type, test, set, flags, fs_info); > find sb in s_instances > grab_super(sb); > down_write(&s->s_umount); // blocks > > sop->put_super(sb) > // sb->fs_devices->opened == 2; no-op > spin_lock(&sb_lock); > hlist_del_init(&sb->s_instances); > spin_unlock(&sb_lock); > up_write(&sb->s_umount); > return 0; > retry lookup > don't find sb in s_instances (deleted by CPU0) > s = alloc_super > return s; > btrfs_fill_super(s, fs_devices, data) > open_ctree // fs_devices total_rw_bytes improperly set! > btrfs_read_chunk_tree > read_one_dev // increment total_rw_bytes again!! > super_total_bytes < fs_devices->total_rw_bytes // ERROR!!! It seems weird that umount and mount can be mixed in such way but with the VFS locks and structures it's valid, so the devices managed by btrfs slipped through. With the suggested fix, the bit BTRFS_DEV_STATE_IN_FS_METADATA becomes quite important and the synchronization of the device related data. The semantics seems quite subtle and inconsistent regarding other uses of set_bit or clear_bit and the total_rw_bytes. I'm thinkig about unconditional setting of IN_FS_METADATA as it is now, but recalculating total_rw_size outside of read_one_dev in btrfs_read_chunk_tree. There it should not matter if the bit was set by the unmounted or the mounted filesystem, as long as the locking rules for updating fs_devices hold. For that we have uuid_mutex and fs_devices::device_list_mutex, this is used elsewhere so fixing it using existing mechanisms is IMHO better way than relying on subtle undocumented semantics of the state bit.