Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp527989pxv; Thu, 22 Jul 2021 06:13:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzGCS66I4GlQQ/+4yXEl9b4/oGKCRvAVf/3HcUvPsVz9cQAvb4RyTsUVk9FiKGUqofBbk3r X-Received: by 2002:a17:906:7151:: with SMTP id z17mr44612223ejj.166.1626959616764; Thu, 22 Jul 2021 06:13:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626959616; cv=none; d=google.com; s=arc-20160816; b=W/wAzKQgC3QS2z5c2EdBR43KLDVPcLQrXW8D8kFwBakcdGtOio6IWBS/Dd/IKquUhc 0ckYEezSG94mxytzO3lPa9eOAdB1R4HQojwDW1Bc+ZAk1F2exTz7LaI0nO9dwYlGuneD Va3oSuLcgyOJp+dyxns+Ci2797x1QqlVSRVE8nNA0QWncvubAdL16MJFyKGMucQbiTNt C8Ij1w+RAmGGqTtHUyT7gfPoi7LKXkGBgyLLVEXLrSJhjRUe004l7dF+8IgItzLgAHa7 VQ70ceRGFi0yxvjjjYS5wpxwv9pJQ85Ua13zDGDcz7lsj+oaLLaxP5Ek/aGF1InEOCv9 ScMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:to:subject; bh=eLt5FJONmzoWu3wq33s6aXnRv2nLklwHhYolyTZvI8k=; b=zBf5rePJ0iRfZeuae2POUQiVvJJmbxDfxfAx6fHS3SkKhi8NXShowCmy5rp3oLYyLe KFYEMb3F8LS13ucsYU4r/gYtj89yPd9lVgpjDYZRepVomRWUPiHCEJP+nFSKaCTYO8yF zfBsVu/WSBQU/TgTAxFfaUvEV0ureiVspL7wmqbUpYak3EqRlXF1yaJfSqMppenoqgA2 fDQWxuIrNebEQNmPkaBT5yQjYO6V4i4S5LQmXUQrb23+AATdU9/Kk2DsrlQw7zv9b9XY tWjw6oEopNQ6sDE6w+D4PTCNSXAHooSzBAZUuW3Naflg5qW2D1Mu41zOvRdqBcTpkQJ8 fYEg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bo10si21590850edb.577.2021.07.22.06.13.12; Thu, 22 Jul 2021 06:13:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232072AbhGVM3L (ORCPT + 99 others); Thu, 22 Jul 2021 08:29:11 -0400 Received: from szxga01-in.huawei.com ([45.249.212.187]:15048 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230418AbhGVM3L (ORCPT ); Thu, 22 Jul 2021 08:29:11 -0400 Received: from dggeme754-chm.china.huawei.com (unknown [172.30.72.54]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4GVt4X25DBzZrfs; Thu, 22 Jul 2021 21:06:20 +0800 (CST) Received: from [10.174.178.185] (10.174.178.185) by dggeme754-chm.china.huawei.com (10.3.19.100) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.2176.2; Thu, 22 Jul 2021 21:09:43 +0800 Subject: Re: [PATH v2] scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach To: , , , References: <20210113063103.2698953-1-yebin10@huawei.com> From: yebin Message-ID: <60F96E17.6030306@huawei.com> Date: Thu, 22 Jul 2021 21:09:43 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20210113063103.2698953-1-yebin10@huawei.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.174.178.185] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggeme754-chm.china.huawei.com (10.3.19.100) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/1/13 14:31, Ye Bin wrote: > We get follow BUG_ON when rdac scan: > [595952.944297] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427! > [595952.951143] Internal error: Oops - BUG: 0 [#1] SMP > ...... > [595953.251065] Call trace: > [595953.259054] check_ownership+0xb0/0x118 > [595953.269794] rdac_bus_attach+0x1f0/0x4b0 > [595953.273787] scsi_dh_handler_attach+0x3c/0xe8 > [595953.278211] scsi_dh_add_device+0xc4/0xe8 > [595953.282291] scsi_sysfs_add_sdev+0x8c/0x2a8 > [595953.286544] scsi_probe_and_add_lun+0x9fc/0xd00 > [595953.291142] __scsi_scan_target+0x598/0x630 > [595953.295395] scsi_scan_target+0x120/0x130 > [595953.299481] fc_user_scan+0x1a0/0x1c0 [scsi_transport_fc] > [595953.304944] store_scan+0xb0/0x108 > [595953.308420] dev_attr_store+0x44/0x60 > [595953.312160] sysfs_kf_write+0x58/0x80 > [595953.315893] kernfs_fop_write+0xe8/0x1f0 > [595953.319888] __vfs_write+0x60/0x190 > [595953.323448] vfs_write+0xac/0x1c0 > [595953.326836] ksys_write+0x74/0xf0 > [595953.330221] __arm64_sys_write+0x24/0x30 > > BUG_ON code is in check_ownership: > list_for_each_entry_rcu(tmp, &h->ctlr->dh_list, node) { > /* h->sdev should always be valid */ > BUG_ON(!tmp->sdev); > tmp->sdev->access_state = access_state; > } > rdac_bus_attach > initialize_controller > list_add_rcu(&h->node, &h->ctlr->dh_list); > h->sdev = sdev; > rdac_bus_detach > list_del_rcu(&h->node); > h->sdev = NULL; > > Test as follow steps: > (1) Find IO error, remove disk; > (2) Insert disk back; > (3) trigger scan disk; > > There is race between rdac_bus_attach and rdac_bus_detach, maybe access > rdac_dh_data which h->sdev has been set NULL when process rdac attach. And also > find that "h->sdev" set value after add list, this may lead to reference NULL ptr. > > Signed-off-by: Ye Bin > --- > drivers/scsi/device_handler/scsi_dh_rdac.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c > index 5efc959493ec..85a71bafaea7 100644 > --- a/drivers/scsi/device_handler/scsi_dh_rdac.c > +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c > @@ -453,8 +453,8 @@ static int initialize_controller(struct scsi_device *sdev, > if (!h->ctlr) > err = SCSI_DH_RES_TEMP_UNAVAIL; > else { > - list_add_rcu(&h->node, &h->ctlr->dh_list); > h->sdev = sdev; > + list_add_rcu(&h->node, &h->ctlr->dh_list); > } > spin_unlock(&list_lock); > err = SCSI_DH_OK; > @@ -778,11 +778,11 @@ static void rdac_bus_detach( struct scsi_device *sdev ) > spin_lock(&list_lock); > if (h->ctlr) { > list_del_rcu(&h->node); > - h->sdev = NULL; > kref_put(&h->ctlr->kref, release_controller); > } > spin_unlock(&list_lock); > sdev->handler_data = NULL; > + synchronize_rcu(); > kfree(h); > } > ping ...