Subject: Re: [PATCH v3] md: fix duplicate filename for rdev
From: Yu Kuai
To: Song Liu, Yu Kuai
Cc: pmenzel@molgen.mpg.de, akpm@linux-foundation.org, neilb@suse.de,
    linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
    yi.zhang@huawei.com, yangerkun@huawei.com, "yukuai (C)"
References: <20230523012727.3042247-1-yukuai1@huaweicloud.com>
Message-ID: <85c760bb-cb44-ed27-2ad3-f0ee2923226e@huaweicloud.com>
Date: Wed, 21 Jun 2023 10:25:06 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2023/06/21 5:38, Song Liu wrote:
> On Tue, May 23, 2023 at 6:33 PM Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2023/05/24 2:05, Song Liu wrote:
>>> On Mon, May 22, 2023 at 6:30 PM Yu Kuai wrote:
>>>>
>>>> From: Yu Kuai
>>>>
>>>> Commit 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device
>>>> from an md array via sysfs") delays the deletion of the rdev; however, this
>>>> introduces a window in which the rdev can be added again while the deletion
>>>> is still pending, and sysfs will complain about a duplicate filename.
>>>>
>>>> Follow-up patches tried to fix this problem by flushing the workqueue;
>>>> however, flush_rdev_wq() is just dead code. The sequence in
>>>> md_kick_rdev_from_array() is:
>>>>
>>>> 1) list_del_rcu(&rdev->same_set);
>>>> 2) synchronize_rcu();
>>>> 3) queue_work(md_rdev_misc_wq, &rdev->del_work);
>>>>
>>>> So in flush_rdev_wq(), if the rdev is found in the list, work_pending() can
>>>> never pass; conversely, once the work is queued, the rdev can no longer be
>>>> found in the list.
>>>>
>>>> flush_rdev_wq() could be replaced by flush_workqueue() directly; however,
>>>> this approach is not good:
>>>> - the workqueue is global, so synchronizing across all raid disks is not
>>>>   necessary.
>>>> - flush_workqueue() can't be called under 'reconfig_mutex', so there is
>>>>   still a small window between flush_workqueue() and mddev_lock() in which
>>>>   other contexts can queue new work; hence the problem is not solved
>>>>   completely.
>>>>
>>>> sysfs already has APIs that allow an attribute to delete itself from its
>>>> own writer, and these APIs, specifically
>>>> sysfs_break/unbreak_active_protection(), are used to delete the rdev
>>>> synchronously. Therefore, the above commit can be reverted, and the sysfs
>>>> duplicate filename can be avoided.
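For reference, the sysfs_break/unbreak_active_protection() pattern mentioned
above looks roughly like the sketch below. This is illustrative only: the
attribute callback and the my_teardown() helper are made-up names, not the
actual md code.

#include <linux/kobject.h>
#include <linux/sysfs.h>

static void my_teardown(struct kobject *kobj);	/* hypothetical, ends up calling kobject_del() */

/*
 * Minimal sketch of a sysfs ->store() that tears down its own object.
 * sysfs_break_active_protection() pins the kernfs node and drops the
 * active reference held for this writer, so a later kobject_del() from
 * this context does not wait on the writer itself.
 */
static ssize_t remove_store(struct kobject *kobj, struct kobj_attribute *attr,
			    const char *buf, size_t len)
{
	struct kernfs_node *kn;

	/* Pin the node and release the active protection for this writer. */
	kn = sysfs_break_active_protection(kobj, &attr->attr);

	my_teardown(kobj);

	if (kn)
		sysfs_unbreak_active_protection(kn);
	return len;
}

The point is that the writer pins the kernfs node, drops its own active
reference, performs the removal, and only then releases the node.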
>>>>
>>>> A new mdadm regression test is proposed as well ([1]).
>>>>
>>>> [1] https://lore.kernel.org/linux-raid/20230428062845.1975462-1-yukuai1@huaweicloud.com/
>>>> Fixes: 5792a2856a63 ("[PATCH] md: avoid a deadlock when removing a device from an md array via sysfs")
>>>> Signed-off-by: Yu Kuai
>>>
>>> Thanks for the fix! I made the following changes and applied it
>>> to md-next:
>>>
>>> 1. remove md_rdev->del_work, which is not used any more;
>>> 2. change list_empty_safe to list_empty protected by the mutex, as
>>>    list_empty_safe doesn't seem safe here.
>
> Hmm.. it appears that I missed a circular locking dependency with mdadm
> test 21raid5cache (delete_mutex and open_mutex).
>
> Please take a look at this.
>

Thanks. I think the following patch can fix this, but the test keeps
failing in my VM. Song, does it pass for you?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1be64ab76f5c..62b7d2d0e873 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -766,17 +766,29 @@
 static void md_free_rdev(struct mddev *mddev)
 {
 	struct md_rdev *rdev;
 	struct md_rdev *tmp;
+	LIST_HEAD(delete);
 
-	mutex_lock(&mddev->delete_mutex);
-	if (list_empty(&mddev->deleting))
-		goto out;
+retry:
+	if (list_empty_careful(&mddev->deleting))
+		return;
+
+	if (!mutex_trylock(&mddev->delete_mutex))
+		goto retry;
 
-	list_for_each_entry_safe(rdev, tmp, &mddev->deleting, same_set) {
+	/*
+	 * Clear 'mddev->deleting' first, because export_rdev() can trigger
+	 * mddev_unlock(), and grabbing 'delete_mutex' again would deadlock.
+	 * This also makes sure md_free_rdev() exits directly until all
+	 * rdevs are deleted.
+	 */
+	list_splice_init(&mddev->deleting, &delete);
+
+	list_for_each_entry_safe(rdev, tmp, &delete, same_set) {
 		list_del_init(&rdev->same_set);
 		kobject_del(&rdev->kobj);
 		export_rdev(rdev, mddev);
 	}
-out:
+
 	mutex_unlock(&mddev->delete_mutex);
 }

> Thanks,
> Song
>
> [ 239.802277] ======================================================
> [ 239.803650] WARNING: possible circular locking dependency detected
> [ 239.804929] 6.4.0-rc2+ #772 Not tainted
> [ 239.805569] ------------------------------------------------------
> [ 239.806568] kworker/20:1/222 is trying to acquire lock:
> [ 239.807406] ffff88815335b3f0 (&mddev->delete_mutex){+.+.}-{3:3}, at: mddev_unlock+0xe0/0x2d0
> [ 239.808653]
> but task is already holding lock:
> [ 239.809481] ffffc9000246fe00 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
> [ 239.811049]
> which lock already depends on the new lock.
>
> [ 239.812187]
> the existing dependency chain (in reverse order) is:
> [ 239.813230]
> -> #3 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}:
> [ 239.814468] __flush_work+0xdb/0x690
> [ 239.815068] r5l_exit_log+0x59/0xc0
> [ 239.815649] free_conf+0x34/0x320
> [ 239.816243] raid5_free+0x11/0x40
> [ 239.816788] __md_stop+0x9f/0x140
> [ 239.817336] do_md_stop+0x2af/0xaf0
> [ 239.817901] md_ioctl+0xb34/0x1e30
> [ 239.818469] blkdev_ioctl+0x2bf/0x3d0
> [ 239.819079] __x64_sys_ioctl+0xbe/0x100
> [ 239.819701] do_syscall_64+0x3a/0x90
> [ 239.820294] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 239.821076]
> -> #2 (&mddev->open_mutex){+.+.}-{3:3}:
> [ 239.821971] __mutex_lock+0x11d/0x13f0
> [ 239.822592] md_open+0xad/0x180
> [ 239.822937] kobject: 'md0' (ffff88811c8e5498): kobject_uevent_env
> [ 239.823113] blkdev_get_whole+0x50/0x120
> [ 239.824216] kobject: 'md0' (ffff88811c8e5498): fill_kobj_path: path = '/devices/virtual/block/md0'
> [ 239.824725] blkdev_get_by_dev+0x309/0x4f0
> [ 239.826747] blkdev_open+0x8a/0x110
> [ 239.827319] do_dentry_open+0x2a5/0x7b0
> [ 239.827939] path_openat+0xcee/0x1070
> [ 239.828545] do_filp_open+0x148/0x1d0
> [ 239.829137] do_sys_openat2+0x2ec/0x470
> [ 239.829791] do_sys_open+0x8a/0xd0
> [ 239.830352] do_syscall_64+0x3a/0x90
> [ 239.830935] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 239.831728]
> -> #1 (&disk->open_mutex){+.+.}-{3:3}:
> [ 239.832615] __mutex_lock+0x11d/0x13f0
> [ 239.833231] blkdev_put+0x65/0x350
> [ 239.833785] export_rdev.isra.63+0x72/0xe0
> [ 239.834456] mddev_unlock+0x1b1/0x2d0
> [ 239.835043] md_ioctl+0x96c/0x1e30
> [ 239.835616] blkdev_ioctl+0x2bf/0x3d0
> [ 239.836213] __x64_sys_ioctl+0xbe/0x100
> [ 239.836828] do_syscall_64+0x3a/0x90
> [ 239.837414] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 239.838225]
> -> #0 (&mddev->delete_mutex){+.+.}-{3:3}:
> [ 239.839145] __lock_acquire+0x1e42/0x34b0
> [ 239.839797] lock_acquire+0x161/0x3d0
> [ 239.840395] __mutex_lock+0x11d/0x13f0
> [ 239.840999] mddev_unlock+0xe0/0x2d0
> [ 239.841582] r5c_disable_writeback_async+0x261/0x270
> [ 239.842354] process_one_work+0x55f/0xa50
> [ 239.842996] worker_thread+0x69/0x660
> [ 239.843595] kthread+0x1a1/0x1e0
> [ 239.844131] ret_from_fork+0x2c/0x50
> [ 239.844712]
> other info that might help us debug this:
>
> [ 239.845813] Chain exists of:
>   &mddev->delete_mutex --> &mddev->open_mutex --> (work_completion)(&log->disable_writeback_work)
>
> [ 239.847776] Possible unsafe locking scenario:
>
> [ 239.848623]        CPU0                    CPU1
> [ 239.849262]        ----                    ----
> [ 239.849927]   lock((work_completion)(&log->disable_writeback_work));
> [ 239.850831]                                lock(&mddev->open_mutex);
> [ 239.851719]                                lock((work_completion)(&log->disable_writeback_work));
> [ 239.852945]   lock(&mddev->delete_mutex);
> [ 239.853517]
> *** DEADLOCK ***
>
> [ 239.854345] 2 locks held by kworker/20:1/222:
> [ 239.854960] #0: ffff8881000b2748 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
> [ 239.856287] #1: ffffc9000246fe00 ((work_completion)(&log->disable_writeback_work)){+.+.}-{0:0}, at: process_one_work+0x462/0xa50
> [ 239.857939]
> stack backtrace:
> [ 239.858567] CPU: 20 PID: 222 Comm: kworker/20:1 Not tainted 6.4.0-rc2+ #772
> [ 239.859533] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [ 239.861135] Workqueue: events r5c_disable_writeback_async
> [ 239.861903] Call Trace:
> [ 239.862281]  <TASK>
> [ 239.862606]  dump_stack_lvl+0x46/0x80
> [ 239.863144]  check_noncircular+0x1ff/0x240
> [ 239.863737]  ? __pfx_check_noncircular+0x10/0x10
> [ 239.864401]  ? mark_lock.part.45+0x11a/0x1350
> [ 239.865019]  ? add_chain_block+0x23b/0x310
> [ 239.865624]  __lock_acquire+0x1e42/0x34b0
> [ 239.866202]  ? select_task_rq_fair+0x2b0/0x1e50
> [ 239.866845]  ? __pfx___lock_acquire+0x10/0x10
> [ 239.867473]  ? ttwu_queue_wakelist+0x1cc/0x1f0
> [ 239.868107]  ? __smp_call_single_queue+0x137/0x2a0
> [ 239.868816]  ? __default_send_IPI_dest_field+0x2b/0xa0
> [ 239.869548]  ? __lock_acquire+0xa5d/0x34b0
> [ 239.870131]  lock_acquire+0x161/0x3d0
> [ 239.870660]  ? mddev_unlock+0xe0/0x2d0
> [ 239.871216]  ? __pfx_lock_acquire+0x10/0x10
> [ 239.871808]  ? lock_is_held_type+0xd8/0x130
> [ 239.872405]  __mutex_lock+0x11d/0x13f0
> [ 239.872936]  ? mddev_unlock+0xe0/0x2d0
> [ 239.873474]  ? mddev_unlock+0xe0/0x2d0
> [ 239.874006]  ? __mutex_unlock_slowpath+0x12c/0x410
> [ 239.874695]  ? __pfx___mutex_lock+0x10/0x10
> [ 239.875297]  ? __pfx_rcu_read_lock_held+0x10/0x10
> [ 239.875986]  ? mddev_unlock+0xe0/0x2d0
> [ 239.876530]  mddev_unlock+0xe0/0x2d0
> [ 239.877044]  r5c_disable_writeback_async+0x261/0x270
> [ 239.877753]  ? __pfx_r5c_disable_writeback_async+0x10/0x10
> [ 239.878531]  ? __switch_to+0x2d8/0x770
> [ 239.879066]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 239.879813]  process_one_work+0x55f/0xa50
> [ 239.880398]  ? __pfx_process_one_work+0x10/0x10
> [ 239.881073]  ? _raw_spin_lock_irq+0x5c/0x90
> [ 239.881675]  worker_thread+0x69/0x660
> [ 239.882206]  ? __kthread_parkme+0xe4/0x100
> [ 239.882786]  ? __pfx_worker_thread+0x10/0x10
> [ 239.883395]  kthread+0x1a1/0x1e0
> [ 239.883863]  ? __pfx_kthread+0x10/0x10
> [ 239.884406]  ret_from_fork+0x2c/0x50
> [ 239.884925]  </TASK>
> .
>
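As a side note, the md_free_rdev() rework above is a variant of the common
"splice to a private list, then release" idiom. Below is a minimal sketch of
that idiom; struct pending_item and release_item() are made up for
illustration, and unlike the patch above (which keeps 'delete_mutex' held and
relies on re-entrant callers seeing an empty 'deleting' list), this sketch
drops the lock before releasing the entries.

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

/* Illustrative item type; not an actual md structure. */
struct pending_item {
	struct list_head node;
};

static void release_item(struct pending_item *item)
{
	/* hypothetical teardown; may take other locks */
	kfree(item);
}

/*
 * Drain a shared list: move everything onto a private list while holding
 * the lock, then release the entries with the lock dropped, so teardown
 * code that re-enters this path only sees an empty shared list.
 */
static void drain_pending(struct mutex *lock, struct list_head *pending)
{
	LIST_HEAD(local);
	struct pending_item *item, *tmp;

	mutex_lock(lock);
	list_splice_init(pending, &local);
	mutex_unlock(lock);

	list_for_each_entry_safe(item, tmp, &local, node) {
		list_del_init(&item->node);
		release_item(item);
	}
}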