Received: by 2002:a05:6a10:8a4d:0:0:0:0 with SMTP id dn13csp711073pxb; Fri, 13 Aug 2021 04:48:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyDjSIN0YqnNQ0KmSbmOHzPKiW/Kyv0f5N/YnP2Ppsfqifyo8GHlAa3X4/vDVexXVYLwEj/ X-Received: by 2002:a17:906:4fd6:: with SMTP id i22mr2048883ejw.92.1628855324887; Fri, 13 Aug 2021 04:48:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1628855324; cv=none; d=google.com; s=arc-20160816; b=ZMUI3KXDtWAOm6bQdQiWLQly9yUXLTRo1982PBwjI4icmnCtGJIWCYTlitzMWfhkJo YJCzfpsfJGVvqtSKgC+b/1witmtvdOquUsiJ1LCVNfA4PqVzNAty0U4ZKXcIS8oMlX/Z X6oEf9hM+W6lE/BHXr/ZZwCG6zlGBvesOdKiDC1Qk2XzOd6JcUrMzKG9yNCWsS2SmBDW 5SboT1uN/+MIapM+HDnciZAJ7qvuSR4ugKo5uU2Nb9K8ZNh90FCIjuFBJr8aADl9tkOU RpMLa2ObZp3JyzEIvJ1tcO/sj9/zMAT6P8GBw/Ei05kiCpwUG+oa8nVlVj4PczLMvxzG 7OKA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:mail-followup-to:reply-to:message-id :subject:cc:to:from:date:dkim-signature:dkim-signature; bh=THaFvJ0mMbfDh3kdowKax6bW6A4Pv8w9QwsEiqD14UI=; b=ZHFIvykatEsHoaBue3B20aKBmbFw+QkCTz5yk4haoF7+ZQirgC4+kRfNu9YxGqt8L5 J3UqasHo4ncNnGFhtxnCClcO+VEmjVAMyp8Ve97ZUUgGnXa/rW7Yo+0tGUoPfgTvR0f4 eHfiKNjtrVX8yKEgxVbSVNzPM5RuNGaWS4Wr6YF8BMKLXG0Kgpki6w+ffGjGCg7bqC7N +8q3AGZ2hxyZD+CeXzYiHsuIhHI8ilUQEykmrLc7wrCzZYBKfthFF1PzFaZ8krjl3o1l U8swpkuBziJxCX4zSAqOkLbFVoGid7aO2CLuOjNztjT/EHIzxppFh5Pr4AqUZ463J5jI b/kw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=1lCDKFU5; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b=9x9HZVG+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bl4si1756480ejb.689.2021.08.13.04.48.21; Fri, 13 Aug 2021 04:48:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=1lCDKFU5; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b=9x9HZVG+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239557AbhHMKdz (ORCPT + 99 others); Fri, 13 Aug 2021 06:33:55 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:48354 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238615AbhHMKdy (ORCPT ); Fri, 13 Aug 2021 06:33:54 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id EDD4F201B2; Fri, 13 Aug 2021 10:33:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1628850806; h=from:from:reply-to:reply-to:date:date:message-id:message-id:to:to: cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=THaFvJ0mMbfDh3kdowKax6bW6A4Pv8w9QwsEiqD14UI=; b=1lCDKFU5NzKjrckU12DycGARob7wjYTUQRJnEcmQbXa+uNz1PDy+dg5JZXjsCNQl2B3ET6 q6G1J+sd2kWCoS9+3LSqZ3vpcXzPd4yWYx1LEQnI6NKie4un45Odrkbg2jTBbafs0MVWvW dofC/Xvs5vjDdKgYEs05zGOvGvb+4Vw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1628850806; h=from:from:reply-to:reply-to:date:date:message-id:message-id:to:to: cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=THaFvJ0mMbfDh3kdowKax6bW6A4Pv8w9QwsEiqD14UI=; b=9x9HZVG+Iu2H2NF5yuSqyumy5ZIO8zBpSeb150vsVzVEHMbNnfR4FSTS8UgoS5kggkzFx+ xTIv/3oBOWJt4yAA== Received: from ds.suse.cz (ds.suse.cz [10.100.12.205]) by relay2.suse.de (Postfix) with ESMTP id D9FA5A3B84; Fri, 13 Aug 2021 10:33:26 +0000 (UTC) Received: by ds.suse.cz (Postfix, from userid 10065) id DBCD1DB25F; Fri, 13 Aug 2021 12:30:32 +0200 (CEST) Date: Fri, 13 Aug 2021 12:30:32 +0200 From: David Sterba To: Desmond Cheong Zhi Xi Cc: dsterba@suse.cz, clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, anand.jain@oracle.com, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, skhan@linuxfoundation.org, gregkh@linuxfoundation.org, linux-kernel-mentees@lists.linuxfoundation.org, syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com Subject: Re: [PATCH v2] btrfs: fix rw device counting in __btrfs_free_extra_devids Message-ID: <20210813103032.GR5047@twin.jikos.cz> Reply-To: dsterba@suse.cz Mail-Followup-To: dsterba@suse.cz, Desmond Cheong Zhi Xi , clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, anand.jain@oracle.com, linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, skhan@linuxfoundation.org, gregkh@linuxfoundation.org, linux-kernel-mentees@lists.linuxfoundation.org, syzbot+a70e2ad0879f160b9217@syzkaller.appspotmail.com References: <20210727071303.113876-1-desmondcheongzx@gmail.com> <20210812103851.GC5047@twin.jikos.cz> <3c48eec9-590c-4974-4026-f74cafa5ac48@gmail.com> <20210812155032.GL5047@twin.jikos.cz> <1e0aafb2-9e55-5f64-d347-1765de0560c5@gmail.com> <20210813085137.GQ5047@twin.jikos.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 13, 2021 at 05:57:26PM +0800, Desmond Cheong Zhi Xi wrote: > On 13/8/21 4:51 pm, David Sterba wrote: > > On Fri, Aug 13, 2021 at 01:31:25AM +0800, Desmond Cheong Zhi Xi wrote: > >> On 12/8/21 11:50 pm, David Sterba wrote: > >>> On Thu, Aug 12, 2021 at 11:43:16PM +0800, Desmond Cheong Zhi Xi wrote: > >>>> On 12/8/21 6:38 pm, David Sterba wrote: > >>>>> On Tue, Jul 27, 2021 at 03:13:03PM +0800, Desmond Cheong Zhi Xi wrote: > >>>>>> --- a/fs/btrfs/volumes.c > >>>>>> +++ b/fs/btrfs/volumes.c > >>>>>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, > >>>>>> if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) { > >>>>>> list_del_init(&device->dev_alloc_list); > >>>>>> clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); > >>>>>> + fs_devices->rw_devices--; > >>>>>> } > >>>>>> list_del_init(&device->dev_list); > >>>>>> fs_devices->num_devices--; > >>>>> > >>>>> I've hit a crash on master branch with stacktrace very similar to one > >>>>> this bug was supposed to fix. It's a failed assertion on device close. > >>>>> This patch was the last one to touch it and it matches some of the > >>>>> keywords, namely the BTRFS_DEV_STATE_REPLACE_TGT bit that used to be in > >>>>> the original patch but was not reinstated in your fix. > >>>>> > >>>>> I'm not sure how reproducible it is, right now I have only one instance > >>>>> and am hunting another strange problem. They could be related. > >>>>> > >>>>> assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150 > >>>>> > >>>>> https://susepaste.org/view/raw/18223056 full log with other stacktraces, > >>>>> possibly relatedg > >>>>> > >>>> > >>>> Looking at the logs, it seems that a dev_replace was started, then > >>>> suspended. But it wasn't canceled or resumed before the fs devices were > >>>> closed. > >>>> > >>>> I'll investigate further, just throwing some observations out there. > >>> > >>> Thanks. I'm testing the patch revert, no crash after first loop, I'll > >>> run a few more to be sure as it's not entirely reliable. > >>> > >>> Sending the revert is option of last resort as we're approaching end of > >>> 5.14 dev cycle and the crash prevents testing (unlike the fuzzer > >>> warning). > >>> > >> > >> I might be missing something, so any thoughts would be appreciated. But > >> I don't think the assertion in btrfs_close_one_device is correct. > >> > >> From what I see, this crash happens when close_ctree is called while a > >> dev_replace hasn't completed. In close_ctree, we suspend the > >> dev_replace, but keep the replace target around so that we can resume > >> the dev_replace procedure when we mount the root again. This is the call > >> trace: > >> > >> close_ctree(): > >> btrfs_dev_replace_suspend_for_unmount(); > >> btrfs_close_devices(): > >> btrfs_close_fs_devices(): > >> btrfs_close_one_device(): > >> ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, > >> &device->dev_state)); > >> > >> However, since the replace target sticks around, there is a device with > >> BTRFS_DEV_STATE_REPLACE_TGT set, and we fail the assertion in > >> btrfs_close_one_device. > >> > >> Two options I can think of: > >> > >> - We could remove the assertion. > >> > >> - Or we could clear the BTRFS_DEV_STATE_REPLACE_TGT bit in > >> btrfs_dev_replace_suspend_for_unmount. This is fine since the bit is set > >> again in btrfs_init_dev_replace if the dev_replace->replace_state is > >> BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED. But this approach strikes me as > >> a little odd because the device is still the replace target when > >> mounting in the future. > > > > The option #2 does not sound safe because the TGT bit is checked in > > several places where device list is queried for various reasons, even > > without a mounted filesystem. > > > > Removing the assertion makes more sense but I'm still not convinced that > > the this is expected/allowed state of a closed device. > > > > Would it be better if we cleared the REPLACE_TGT bit only when closing > the device where device->devid == BTRFS_DEV_REPLACE_DEVID? > > The first conditional in btrfs_close_one_device assumes that we can come > across such a device. If we come across it, we should properly reset it. > > If other devices has this bit set, the ASSERT will still catch it and > let us know something is wrong. That sounds great. > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 70f94b75f25a..a5afebb78ecf 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -1130,6 +1130,9 @@ static void btrfs_close_one_device(struct btrfs_device *device) > fs_devices->rw_devices--; > } > > + if (device->devid == BTRFS_DEV_REPLACE_DEVID) > + clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state); > + > if (test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state)) > fs_devices->missing_devices--; I'll do a few test rounds, thanks.