Received: by 2002:a05:6520:3645:b029:c0:f950:43e0 with SMTP id l5csp6296293lki; Thu, 4 Mar 2021 09:29:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJweE4USkunZZS2yGACorSaerUPi/Wwci0lao0hv9Sxi/BG+hbPIpLa6bB/1D6EI8usI1lD6 X-Received: by 2002:a17:906:1e4f:: with SMTP id i15mr5331069ejj.349.1614878946877; Thu, 04 Mar 2021 09:29:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614878946; cv=none; d=google.com; s=arc-20160816; b=y6VuVp7B1wgUdpYxR/XUA/B528RshhzeAKxLwZlubVk0fkzJRzqUcMNjgowRUzgxcg Z/PlvVJHBAz3/RY9NoXBl5LWfU1fs9ek0ntqAjTtXCLlA0F/vGTKi7Q9WJckiNwTkpLQ IcBqE4AnoXlwG2g7f1RR1hNTW1y0x2WJCJcjaCElV3cJobE52FMnjUPYc90/iH5/EBSr Q3+b8x63iVbmI3SXd4Fhi4GPe5prh1ROk5ywMmyFH+uFY5SF/lqqR2uXk3dad3ISxYKb jEczajIJzsErvL1JD4L/Bi+P25HVokm5ch7agvl58nVI7vFauxweDJfHlrZ/0M/ILi+h nrfg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=BuVOAGL2ilT7C3X8btVLErtBf6ys9GZ9b1JlEUlDG8o=; b=sDRiwt1gBlLQGjs7lAInWMkyRny75RZw+84yte1svyIbhWpfCTdWEWjuPO4CRtvjiS DFBCc78sKpoG0lp8guDVpezlSrPFaU5nVbHPDiAIhmQvt217fumHjWYtulZjDxrBFHy4 mUnCqNxTiAn2NoBEcawGezuC3knqil5s8+MSMRTFRfA4UOl/113wo63TACurgRMMeXOz DFI/CwtYqqNEClAirHDVT1nF2oQnlpre71JscgcelUfemMReW6A0hkJPVuz/X2c3IisQ BgOjGeZ7zabctsNZYSBGbSK6zqIoqyPOXIi7xR/hDU5dDcJNanQbEB/N7icIQ8scZDkw nl1A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a10si9897568ejr.227.2021.03.04.09.28.42; Thu, 04 Mar 2021 09:29:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241321AbhCDNi7 (ORCPT + 99 others); Thu, 4 Mar 2021 08:38:59 -0500 Received: from szxga06-in.huawei.com ([45.249.212.32]:13432 "EHLO szxga06-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241065AbhCDNid (ORCPT ); Thu, 4 Mar 2021 08:38:33 -0500 Received: from DGGEMS401-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga06-in.huawei.com (SkyGuard) with ESMTP id 4DrsMt2L4JzjVKJ; Thu, 4 Mar 2021 21:36:26 +0800 (CST) Received: from [10.174.176.202] (10.174.176.202) by DGGEMS401-HUB.china.huawei.com (10.3.19.201) with Microsoft SMTP Server id 14.3.498.0; Thu, 4 Mar 2021 21:37:43 +0800 Subject: Re: [PATCH] block_dump: don't put the last refcount when marking inode dirty To: Jan Kara CC: , , , , , Ye Bin References: <20210226103103.3048803-1-yi.zhang@huawei.com> <20210301112102.GD25026@quack2.suse.cz> From: "zhangyi (F)" Message-ID: <5f72dc70-9fb0-0d3b-dc31-f60d35929991@huawei.com> Date: Thu, 4 Mar 2021 21:37:42 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1 MIME-Version: 1.0 In-Reply-To: <20210301112102.GD25026@quack2.suse.cz> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.176.202] X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 2021/3/1 19:21, Jan Kara wrote: > On Fri 26-02-21 18:31:03, zhangyi (F) wrote: >> There is an AA deadlock problem when using block_dump on ext4 file >> system with data=journal mode. >> >> watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [jbd2/pmem0-8:1002] >> CPU: 19 PID: 1002 Comm: jbd2/pmem0-8 >> RIP: 0010:queued_spin_lock_slowpath+0x60/0x3b0 >> ... >> Call Trace: >> _raw_spin_lock+0x57/0x70 >> jbd2_journal_invalidatepage+0x166/0x680 >> __ext4_journalled_invalidatepage+0x8c/0x120 >> ext4_journalled_invalidatepage+0x12/0x40 >> truncate_cleanup_page+0x10e/0x1c0 >> truncate_inode_pages_range+0x2c8/0xec0 >> truncate_inode_pages_final+0x41/0x90 >> ext4_evict_inode+0x254/0xac0 >> evict+0x11c/0x2f0 >> iput+0x20e/0x3a0 >> dentry_unlink_inode+0x1bf/0x1d0 >> __dentry_kill+0x14c/0x2c0 >> dput+0x2bc/0x630 >> block_dump___mark_inode_dirty.cold+0x5c/0x111 >> __mark_inode_dirty+0x678/0x6b0 >> mark_buffer_dirty+0x16e/0x1d0 >> __jbd2_journal_temp_unlink_buffer+0x127/0x1f0 >> __jbd2_journal_unfile_buffer+0x24/0x80 >> __jbd2_journal_refile_buffer+0x12f/0x1b0 >> jbd2_journal_commit_transaction+0x244b/0x3030 >> >> The problem is a race between jbd2 committing data buffer and user >> unlink the file concurrently. The jbd2 will get jh->b_state_lock and >> redirty the inode's data buffer and inode itself. If block_dump is >> enabled, it will try to find inode's dentry and invoke the last dput() >> after the inode was unlinked. Then the evict procedure will unmap >> buffer and get jh->b_state_lock again in journal_unmap_buffer(), and >> finally lead to deadlock. It works fine if block_dump is not enabled >> because the last evict procedure is not invoked in jbd2 progress and >> the jh->b_state_lock will also prevent inode use after free. >> >> jbd2 xxx >> vfs_unlink >> ext4_unlink >> jbd2_journal_commit_transaction >> **get jh->b_state_lock** >> jbd2_journal_refile_buffer >> mark_buffer_dirty >> __mark_inode_dirty >> block_dump___mark_inode_dirty >> d_find_alias >> d_delete >> unhash >> dput //put the last refcount >> evict >> journal_unmap_buffer >> **get jh->b_state_lock again** >> >> In most cases of where invoking mark_inode_dirty() will get inode's >> refcount and the last iput may not happen, but it's not safe. After >> checking the block_dump code, it only want to dump the file name of the >> dirty inode, so there is no need to get and put denrty, and dump an >> unhashed dentry is also fine. This patch remove the dget() && dput(), >> print the dentry name directly. >> >> Signed-off-by: zhangyi (F) >> Signed-off-by: yebin (H) > > Hrm, ok. Honestly, I wanted to just delete that code for a long time. IMO > tracepoints (and we have one in __mark_inode_dirty) are much more useful > for tracing anyway. This code exists only because it was there much before > tracepoints existed... Do you have a strong reason why are you using > block_dump instead of tracepoint trace_writeback_mark_inode_dirty() for > your monitoring? > Hi, Jan. We just do some stress tests and find this issue, I'm not sure who are still using this old debug interface and gather it may need time. Could we firstly fix this issue, and then delete this code if no opposed? Thanks, Yi.