Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1892435pxb; Fri, 5 Mar 2021 02:11:24 -0800 (PST) X-Google-Smtp-Source: ABdhPJwMHXTc9CBBufiPD2YIa5l8J6/P8tYDxXCAtzOJRHoCxDxNgCWF29UEXFF3CWsZqEwGAABt X-Received: by 2002:a17:907:77d4:: with SMTP id kz20mr1598514ejc.93.1614939084621; Fri, 05 Mar 2021 02:11:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614939084; cv=none; d=google.com; s=arc-20160816; b=jEjUxoCEm9KZd85xPaQ82SoBB2h43s1te5n6FRUp0mWH7oPpyyOD3kW1GTIJTikuMy ilkO4yG3xO17u9z2e2OgedULXDXZHUIGnhVl7raKzpYpA2ruliKTnIrcux3+/+CpUXOn O5eVOBZxWLAfxjsxp/8fzdwx30BcrqkvL7U8TnBg6t2z7MqTG1mPrEFt/wAwVEl4aTFh 3g79ubi59T3XOXZk/5U99nBxgUL+HvRb7kDknbwegdYcTSnC3GR5+7rbl6rFqt/u6k4+ w2gkYxDO2XGFLzW0BvY0tH/z8mvSazk72yqnPv03sE41z79Ies/tR5xeO1bXNmM8bbTP aJDA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=5XEWj/QGA0CL+0reavIq/yuBtkPvkMsD05b90FgAMPM=; b=szfnOiFPpy8IryyuowvGSuLBH0ninovrGkIDCxFTzTS2gP3mqr9d6as0jA9RZ9Dl/3 /x5IA5JUmA0/dBo4BnBJ3pSrYev8QxGxIF/M935cfPePIaLreJ+BoMwgwBrzhK/BPi0A N7oT5MiB3N04rFZusYjukkWqW1eYbRxKuOSmpF/5xyeusv/W7Dn0zXV9x124cnOD4H+B Gp8aoHMm3tAAkZdL1ooyfj8ly9wg/NNtka6yHW8I8Rsln+UlgQQWUQWFortJtIZL86DS v2gAHOJXTSFl+4lnJOGjSCnWQX1QZPFAfLVGEdaLBf45d1+qJgknA7pjCLLD1+PVfHTL YMAQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l3si1161910ejd.83.2021.03.05.02.10.58; Fri, 05 Mar 2021 02:11:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229976AbhCEKK0 (ORCPT + 99 others); Fri, 5 Mar 2021 05:10:26 -0500 Received: from mx2.suse.de ([195.135.220.15]:54096 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229948AbhCEKKR (ORCPT ); Fri, 5 Mar 2021 05:10:17 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 89BB0AD2B; Fri, 5 Mar 2021 10:10:16 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id D49211F2B2F; Fri, 5 Mar 2021 11:10:05 +0100 (CET) Date: Fri, 5 Mar 2021 11:10:05 +0100 From: Jan Kara To: "zhangyi (F)" Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, tytso@mit.edu, viro@zeniv.linux.org.uk, linfeilong@huawei.com, Ye Bin Subject: Re: [PATCH] block_dump: don't put the last refcount when marking inode dirty Message-ID: <20210305101005.GA14142@quack2.suse.cz> References: <20210226103103.3048803-1-yi.zhang@huawei.com> <20210301112102.GD25026@quack2.suse.cz> <5f72dc70-9fb0-0d3b-dc31-f60d35929991@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5f72dc70-9fb0-0d3b-dc31-f60d35929991@huawei.com> User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Thu 04-03-21 21:37:42, zhangyi (F) wrote: > On 2021/3/1 19:21, Jan Kara wrote: > > On Fri 26-02-21 18:31:03, zhangyi (F) wrote: > >> There is an AA deadlock problem when using block_dump on ext4 file > >> system with data=journal mode. > >> > >> watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [jbd2/pmem0-8:1002] > >> CPU: 19 PID: 1002 Comm: jbd2/pmem0-8 > >> RIP: 0010:queued_spin_lock_slowpath+0x60/0x3b0 > >> ... > >> Call Trace: > >> _raw_spin_lock+0x57/0x70 > >> jbd2_journal_invalidatepage+0x166/0x680 > >> __ext4_journalled_invalidatepage+0x8c/0x120 > >> ext4_journalled_invalidatepage+0x12/0x40 > >> truncate_cleanup_page+0x10e/0x1c0 > >> truncate_inode_pages_range+0x2c8/0xec0 > >> truncate_inode_pages_final+0x41/0x90 > >> ext4_evict_inode+0x254/0xac0 > >> evict+0x11c/0x2f0 > >> iput+0x20e/0x3a0 > >> dentry_unlink_inode+0x1bf/0x1d0 > >> __dentry_kill+0x14c/0x2c0 > >> dput+0x2bc/0x630 > >> block_dump___mark_inode_dirty.cold+0x5c/0x111 > >> __mark_inode_dirty+0x678/0x6b0 > >> mark_buffer_dirty+0x16e/0x1d0 > >> __jbd2_journal_temp_unlink_buffer+0x127/0x1f0 > >> __jbd2_journal_unfile_buffer+0x24/0x80 > >> __jbd2_journal_refile_buffer+0x12f/0x1b0 > >> jbd2_journal_commit_transaction+0x244b/0x3030 > >> > >> The problem is a race between jbd2 committing data buffer and user > >> unlink the file concurrently. The jbd2 will get jh->b_state_lock and > >> redirty the inode's data buffer and inode itself. If block_dump is > >> enabled, it will try to find inode's dentry and invoke the last dput() > >> after the inode was unlinked. Then the evict procedure will unmap > >> buffer and get jh->b_state_lock again in journal_unmap_buffer(), and > >> finally lead to deadlock. It works fine if block_dump is not enabled > >> because the last evict procedure is not invoked in jbd2 progress and > >> the jh->b_state_lock will also prevent inode use after free. > >> > >> jbd2 xxx > >> vfs_unlink > >> ext4_unlink > >> jbd2_journal_commit_transaction > >> **get jh->b_state_lock** > >> jbd2_journal_refile_buffer > >> mark_buffer_dirty > >> __mark_inode_dirty > >> block_dump___mark_inode_dirty > >> d_find_alias > >> d_delete > >> unhash > >> dput //put the last refcount > >> evict > >> journal_unmap_buffer > >> **get jh->b_state_lock again** > >> > >> In most cases of where invoking mark_inode_dirty() will get inode's > >> refcount and the last iput may not happen, but it's not safe. After > >> checking the block_dump code, it only want to dump the file name of the > >> dirty inode, so there is no need to get and put denrty, and dump an > >> unhashed dentry is also fine. This patch remove the dget() && dput(), > >> print the dentry name directly. > >> > >> Signed-off-by: zhangyi (F) > >> Signed-off-by: yebin (H) > > > > Hrm, ok. Honestly, I wanted to just delete that code for a long time. IMO > > tracepoints (and we have one in __mark_inode_dirty) are much more useful > > for tracing anyway. This code exists only because it was there much before > > tracepoints existed... Do you have a strong reason why are you using > > block_dump instead of tracepoint trace_writeback_mark_inode_dirty() for > > your monitoring? > > > > Hi, Jan. We just do some stress tests and find this issue, I'm not sure who > are still using this old debug interface and gather it may need time. Could > we firstly fix this issue, and then delete this code if no opposed? I'd do it the other way around :) Delete the code and only fix it if someone complains that the feature is still used and so we should not delete it. Will you send a patch or should I do it? Honza -- Jan Kara SUSE Labs, CR