Subject: Re: [PATCH v2] ext4: flush s_error_work before journal destroy in ext4_fill_super
From: yangerkun
Date: Mon, 13 Sep 2021 14:22:27 +0800
Message-ID: <7843daee-6de9-c684-9d11-1b4e1a90fbec@huawei.com>
In-Reply-To: <20210831120449.2910005-1-yangerkun@huawei.com>
References: <20210831120449.2910005-1-yangerkun@huawei.com>
X-Mailing-List: linux-ext4@vger.kernel.org

ping...

On 2021/8/31 20:04, yangerkun wrote:
> The error path in ext4_fill_super forgets to flush s_error_work before
> journal destroy, and it may trigger the following bug, since
> flush_stashed_error_work can run concurrently with journal destroy
> without any protection for sbi->s_journal.
>
> [32031.740193] EXT4-fs (loop66): get root inode failed
> [32031.740484] EXT4-fs (loop66): mount failed
> [32031.759805] ------------[ cut here ]------------
> [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
> [32031.760075] invalid opcode: 0000 [#1] SMP PTI
> [32031.760336] CPU: 5 PID: 1029268 Comm: kworker/5:1 Kdump: loaded
> 4.18.0
> [32031.765112] Call Trace:
> [32031.765375]  ? __switch_to_asm+0x35/0x70
> [32031.765635]  ? __switch_to_asm+0x41/0x70
> [32031.765893]  ? __switch_to_asm+0x35/0x70
> [32031.766148]  ? __switch_to_asm+0x41/0x70
> [32031.766405]  ? _cond_resched+0x15/0x40
> [32031.766665]  jbd2__journal_start+0xf1/0x1f0 [jbd2]
> [32031.766934]  jbd2_journal_start+0x19/0x20 [jbd2]
> [32031.767218]  flush_stashed_error_work+0x30/0x90 [ext4]
> [32031.767487]  process_one_work+0x195/0x390
> [32031.767747]  worker_thread+0x30/0x390
> [32031.768007]  ? process_one_work+0x390/0x390
> [32031.768265]  kthread+0x10d/0x130
> [32031.768521]  ? kthread_flush_work_fn+0x10/0x10
> [32031.768778]  ret_from_fork+0x35/0x40
>
> static int start_this_handle(...)
>     BUG_ON(journal->j_flags & JBD2_UNMOUNT); <---- Trigger this
>
> Besides, after we enable fast commit, ext4_fc_replay can add work to
> s_error_work but return success, so the later journal destroy in
> ext4_load_journal can trigger this problem too.
>
> Fix this problem in two steps:
> 1. Call ext4_commit_super directly in ext4_handle_error when it is
>    called from ext4_fc_replay
> 2. Since it's hard to pair the init and flush for s_error_work, we'd
>    better add an extra flush_work before journal destroy in
>    ext4_fill_super
>
> Fixes: c92dc856848f ("ext4: defer saving error info from atomic context")
> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> Signed-off-by: yangerkun
> ---
>  fs/ext4/super.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index d6df62fc810c..06b5ad34d892 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -659,7 +659,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>  		 * constraints, it may not be safe to do it right here so we
>  		 * defer superblock flushing to a workqueue.
>  		 */
> -		if (continue_fs)
> +		if (continue_fs && journal)
>  			schedule_work(&EXT4_SB(sb)->s_error_work);
>  		else
>  			ext4_commit_super(sb);
> @@ -5172,12 +5172,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>  	sbi->s_ea_block_cache = NULL;
>  
>  	if (sbi->s_journal) {
> +		/* flush s_error_work before journal destroy. */
> +		flush_work(&sbi->s_error_work);
>  		jbd2_journal_destroy(sbi->s_journal);
>  		sbi->s_journal = NULL;
>  	}
>  failed_mount3a:
>  	ext4_es_unregister_shrinker(sbi);
>  failed_mount3:
> +	/* flush s_error_work before sbi destroy */
>  	flush_work(&sbi->s_error_work);
>  	del_timer_sync(&sbi->s_err_report);
>  	ext4_stop_mmpd(sbi);
>
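
For anyone skimming the thread, the ordering problem described in the quoted patch can be sketched as a small single-threaded user-space model. This is only an illustration under stated assumptions, not kernel code: every `sim_*` name and `SIM_JOURNAL_UNMOUNT` is hypothetical, the real concurrency is replaced by one worst-case interleaving, and setting the flag merely stands in for the start of journal destroy.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_JOURNAL_UNMOUNT 0x1u  /* hypothetical stand-in for JBD2_UNMOUNT */

struct sim_journal {
    unsigned int flags;
};

struct sim_sb_info {
    struct sim_journal *journal;  /* models sbi->s_journal */
    bool error_work_pending;      /* models a queued s_error_work */
    bool hit_unmount_check;       /* set where the kernel would hit BUG_ON */
    int direct_commits;           /* superblock writes done without the journal */
};

/* Models the deferred work: it checks the journal it is about to use. */
static void sim_error_work(struct sim_sb_info *sbi)
{
    if (sbi->journal && (sbi->journal->flags & SIM_JOURNAL_UNMOUNT))
        sbi->hit_unmount_check = true;
}

/* Models flush_work(): run the work now if it is still queued. */
static void sim_flush_work(struct sim_sb_info *sbi)
{
    if (sbi->error_work_pending) {
        sbi->error_work_pending = false;
        sim_error_work(sbi);
    }
}

/* Models the patched branch in ext4_handle_error(). */
static void sim_handle_error(struct sim_sb_info *sbi, bool continue_fs)
{
    if (continue_fs && sbi->journal)
        sbi->error_work_pending = true;   /* schedule_work() */
    else
        sbi->direct_commits++;            /* ext4_commit_super() */
}

/* Models the failed-mount path; returns true if the bad ordering was hit. */
static bool sim_failed_mount(bool flush_before_destroy)
{
    struct sim_journal journal = { 0 };
    struct sim_sb_info sbi = { .journal = &journal };

    sim_handle_error(&sbi, true);         /* an error queues the work */

    if (flush_before_destroy)
        sim_flush_work(&sbi);             /* the flush added by the patch */
    journal.flags |= SIM_JOURNAL_UNMOUNT; /* journal destroy begins */

    sim_flush_work(&sbi);                 /* the flush at failed_mount3 */
    return sbi.hit_unmount_check;
}
```

In this model, `sim_failed_mount(false)` lets the queued work run after the flag is set, which corresponds to the BUG_ON in the trace, while `sim_failed_mount(true)` drains the work first. With no journal attached, `sim_handle_error()` takes the direct-commit branch and queues nothing.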