Received: by 2002:a05:6a10:c7c6:0:0:0:0 with SMTP id h6csp2342229pxy; Tue, 3 Aug 2021 04:13:47 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy2VCZosvfotoktXw8Itarep9gtEOvBhAKO53BSc2VaWHTHKwZbp/m4DxXiDzsZnGirBsFh X-Received: by 2002:a17:906:1ec9:: with SMTP id m9mr3133728ejj.115.1627989227333; Tue, 03 Aug 2021 04:13:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1627989227; cv=none; d=google.com; s=arc-20160816; b=yztAjzkUWCNjTwqZESjcPwyHR/GyrtWFV8XhBO4LNXDDgZa09blEzcXPr2x8whpxBV QQ3bwOSfTNln/hwRj0qX6nJLNsYmwC6hh6qfPOP4Wk86Mm9S0kpzW4KAuykX9dP3u5np r8KMJV/eaghMMoss1n4JxVK3qtqvtyO0SMGsL467zqMZpjPG+jcJxM839uDWLGstbCLS uOgu5PqyXidcLk+f3pqhHnoY8Q46yL37lRDrkFOEf1IQZkmHj/9uXUewqZ1b6zjLAfzH eW4dikRKrbJ9jxbHW4sO03+DkiOvd2GKr2kQFx/9DYD6EBJpHTtwCaeA8OZb5ZsETBVv m3Yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=0/aSr0nk5k51I7wLrDupMjILJzKWUgOJHZKg9sJRkJQ=; b=CYuvddS3A1GpUw/T1pfw9PVo1/UYxq5ZcqCX09Gq0ka+ifldpci6GgPOPMI57DbvAf 9MLEAqEAnOaL6S8r+kwb/dPoAE2k1wKQOxr1RyUU4tgSDccz5/h8s3U/gV+nz5Z2WvTw OZXoc87em7XmrZ2ISyBjLA0ShK12ajjK5My+uD9fYmwWwm4iEknyzXI1kV/2SHk8EfMT za5rGgL4zuAoi9laHF8HIh1RaDu5RKNiuokIvCRisQqAL23DMob0lmxHGAzoaUqCTCtv xwU55C86oMZfL5rICfEf8zn+1qaNkvgVw9PPnfnWBThVXefbPTCr4/+fnPxx0IlGC/vX v3Dg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id o17si12411496eja.545.2021.08.03.04.13.11; Tue, 03 Aug 2021 04:13:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235306AbhHCLNU (ORCPT + 99 others); Tue, 3 Aug 2021 07:13:20 -0400 Received: from szxga02-in.huawei.com ([45.249.212.188]:7918 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234156AbhHCLNU (ORCPT ); Tue, 3 Aug 2021 07:13:20 -0400 Received: from dggemv703-chm.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4GfBvx4zjcz835C; Tue, 3 Aug 2021 19:09:17 +0800 (CST) Received: from dggema766-chm.china.huawei.com (10.1.198.208) by dggemv703-chm.china.huawei.com (10.3.19.46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Tue, 3 Aug 2021 19:13:07 +0800 Received: from [10.174.177.210] (10.174.177.210) by dggema766-chm.china.huawei.com (10.1.198.208) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Tue, 3 Aug 2021 19:13:07 +0800 Subject: Re: [PATCH] ext4: flush s_error_work before journal destroy in ext4_fill_super To: Jan Kara CC: Theodore Ts'o , , References: <20210726132613.GC20621@quack2.suse.cz> From: yangerkun Message-ID: <897e53a5-8fc5-1c0a-4734-60ab4fb0fc65@huawei.com> Date: Tue, 3 Aug 2021 19:13:07 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.2 MIME-Version: 1.0 In-Reply-To: <20210726132613.GC20621@quack2.suse.cz> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.177.210] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To dggema766-chm.china.huawei.com (10.1.198.208) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org 在 2021/7/26 21:26, Jan Kara 写道: > On Mon 26-07-21 15:13:34, yangerkun wrote: >> >> >> 在 2021/7/24 3:11, Theodore Ts'o 写道: >>> On Fri, Jul 23, 2021 at 09:11:08PM +0800, yangerkun wrote: >>>> >>>> For example, before wo goto failed_mount_wq, we may meet some error and will >>>> goto ext4_handle_error which can call >>>> schedule_work(&EXT4_SB(sb)->s_error_work). So the work may start concurrent >>>> with ext4_fill_super goto failed_mount_wq. There does not have any lock to >>>> protect the concurrent read and modifies for sbi->s_journal. >>> >>> Yes, and I'm asking *how* is this actually happening in practice? >>> I've been going through the code paths and I don't see any place where >>> ext4_error*() would be called. That's why I wanted to see your test >>> case which was reproducing it. (Not just where you added the msleep, >>> but how the error was getting triggered in the first place.) >> >> Hi Ted, >> >> >> The problem only happened once early with parallel ltp testcase(but we >> cannot reproduce it again with same case...). And dmesg with latter: >> >> >> [32031.739678] EXT4-fs error (device loop66): ext4_fill_super:4672: comm >> chdir01: inode #2: comm chdir01: iget: illegal inode # >> [32031.740193] EXT4-fs (loop66): get root inode failed >> [32031.740484] EXT4-fs (loop66): mount failed > > Oh, OK. I guess s_inodes_count was <= 1 which made a check in __ext4_iget() > trip. That is theoretically possible if s_inodes_per_block == 1, > s_inodes_per_group == 1 and s_groups_count == 1. I guess ext4_fill_super() > needs to check s_inodes_count is large enough at least for all reserved > inodes to fit. But it's still unclear to me how we could have succeeded in > loading the journal which apparently has inode number 8 from the error > message below. > > In parallel LTP checks I've sometimes noticed strange errors in the past > that looked very much like filesystem being modified through the block > device while being in use by the kernel (maybe some bug in the test > framework). And this is something we just don't support and the kernel can > crash in this case. > > In either case I agree it makes sense to make sure error work cannot race > with journal destruction either by flushing it before destroying the > journal or some other means... Hi Ted, Should we apply this patch first, or do you have some better solution... Thanks, Kun. > > Honza > >> [32031.758811] EXT4-fs error (device loop66): ext4_map_blocks:595: inode #8: >> block 532: comm chdir01: lblock 1 mapped to illegal pblock 532 (length 1) >> [32031.759293] jbd2_journal_bmap: journal block not found at offset 1 on >> loop66-8 >> [32031.759805] ------------[ cut here ]------------ >> [32031.759807] kernel BUG at fs/jbd2/transaction.c:373! >> >> >> ext4_fill_super >> ext4_load_journal >> EXT4_SB(sb)->s_journal = journal >> root = ext4_iget(sb, EXT4_ROOT_INO, EXT4_IGET_SPECIAL) >> // will failed and goto failed_mount4 >> __ext4_iget >> __ext4_error >> ext4_handle_error >> schedule_work(&EXT4_SB(sb)->s_error_work) >> >> >> And this trigger the concurrent read and modifies for sbi->s_journal... >> >> Thanks, >> Kun. >> >> >>> >>> >>> On Fri, Jul 23, 2021 at 09:25:12PM +0800, yangerkun wrote: >>>> >>>>> Can you share with me your test case? Your patch will result in the >>>>> shrinker potentially not getting released in some error paths (which >>>>> will cause other kernel panics), and in any case, once the journal is >>>> >>>> The only logic we have changed is that we move the flush_work before we call >>>> jbd2_journal_destory. I have not seen the problem you describe... Can you >>>> help to explain more... >>> >>> Sorry, I was mistaken. I thought you were moving the >>> ext4_es_unregister_shrinker() and flush_work() before the label for >>> failed_mount_wq; that was a misreading of your patch. >>> >>> The other way we could fix this might be something like this: >>> >>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c >>> index dfa09a277b56..d663d11fa0de 100644 >>> --- a/fs/ext4/super.c >>> +++ b/fs/ext4/super.c >>> @@ -693,7 +693,7 @@ static void flush_stashed_error_work(struct work_struct *work) >>> { >>> struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info, >>> s_error_work); >>> - journal_t *journal = sbi->s_journal; >>> + journal_t *journal = READ_ONCE(sbi->s_journal); >>> handle_t *handle; >>> /* >>> @@ -1184,9 +1184,11 @@ static void ext4_put_super(struct super_block *sb) >>> ext4_unregister_sysfs(sb); >>> if (sbi->s_journal) { >>> - aborted = is_journal_aborted(sbi->s_journal); >>> - err = jbd2_journal_destroy(sbi->s_journal); >>> - sbi->s_journal = NULL; >>> + journal_t *journal = sbi->s_journal; >>> + >>> + WRITE_ONCE(sbi->s_journal, NULL); >>> + aborted = is_journal_aborted(journal); >>> + err = jbd2_journal_destroy(journal); >>> if ((err < 0) && !aborted) { >>> ext4_abort(sb, -err, "Couldn't clean up the journal"); >>> } >>> @@ -5175,8 +5177,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) >>> sbi->s_ea_block_cache = NULL; >>> if (sbi->s_journal) { >>> - jbd2_journal_destroy(sbi->s_journal); >>> - sbi->s_journal = NULL; >>> + journal_t *journal = sbi->s_journal; >>> + >>> + WRITE_ONCE(sbi->s_journal, NULL); >>> + jbd2_journal_destroy(journal); >>> } >>> failed_mount3a: >>> ext4_es_unregister_shrinker(sbi); >>> @@ -5487,7 +5491,7 @@ static int ext4_load_journal(struct super_block *sb, >>> EXT4_SB(sb)->s_journal = journal; >>> err = ext4_clear_journal_err(sb, es); >>> if (err) { >>> - EXT4_SB(sb)->s_journal = NULL; >>> + WRITE_ONCE(EXT4_SB(sb)->s_journal, NULL); >>> jbd2_journal_destroy(journal); >>> return err; >>> } >>> >>> ... and here's another possible fix: >>> >>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c >>> index dfa09a277b56..e9e122e52ce8 100644 >>> --- a/fs/ext4/super.c >>> +++ b/fs/ext4/super.c >>> @@ -704,7 +704,8 @@ static void flush_stashed_error_work(struct work_struct *work) >>> * We use directly jbd2 functions here to avoid recursing back into >>> * ext4 error handling code during handling of previous errors. >>> */ >>> - if (!sb_rdonly(sbi->s_sb) && journal) { >>> + if (!sb_rdonly(sbi->s_sb) && journal && >>> + !(journal->j_flags & JBD2_UNMOUNT)) { >>> struct buffer_head *sbh = sbi->s_sbh; >>> handle = jbd2_journal_start(journal, 1); >>> if (IS_ERR(handle)) >>> >>> >>> >>> But I would be interested in understanding how we could be triggering >>> this problem in the first place before deciding what's the best fix. >>> >>> Cheers, >>> >>> - Ted >>> . >>>