2023-08-03 12:42:59

by Huang Cun

[permalink] [raw]
Subject: [PATCH] scsi: scsi_dh_rdac: Avoid crash when a disk attach failed

When a disk fails to attach, the struct rdac_dh_data is released,
but it is not removed from the ctlr->dh_list. When attaching another
disk, the released rdac_dh_data will be accessed and the following
BUG_ON() may be observed:

[ 414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
...
[ 423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
[ 423.615731] invalid opcode: 0000 [#1] SMP NOPTI
...
[ 423.623247] Call Trace:
[ 423.623598] rdac_bus_attach+0x203/0x4c0
[ 423.623949] ? scsi_dh_handler_attach+0x2d/0x90
[ 423.624300] scsi_dh_handler_attach+0x2d/0x90
[ 423.624652] scsi_sysfs_add_sdev+0x88/0x270
[ 423.625004] scsi_probe_and_add_lun+0xc47/0xd50
[ 423.625354] scsi_report_lun_scan+0x339/0x3b0
[ 423.625705] __scsi_scan_target+0xe9/0x220
[ 423.626056] scsi_scan_target+0xf6/0x100
[ 423.626404] fc_scsi_scan_rport+0xa5/0xb0
[ 423.626757] process_one_work+0x15e/0x3f0
[ 423.627106] worker_thread+0x4c/0x440
[ 423.627453] ? rescuer_thread+0x350/0x350
[ 423.627804] kthread+0xf8/0x130
[ 423.628153] ? kthread_destroy_worker+0x40/0x40
[ 423.628509] ret_from_fork+0x1f/0x40

Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
Signed-off-by: Huang Cun <[email protected]>
Signed-off-by: Ding Hui <[email protected]>
Cc: Donglin Peng <[email protected]>
---
drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
index c5538645057a..9d487c2b7708 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)

clean_ctlr:
spin_lock(&list_lock);
+ list_del_rcu(&h->node);
kref_put(&h->ctlr->kref, release_controller);
spin_unlock(&list_lock);
+ synchronize_rcu();

failed:
kfree(h);
--
2.27.0



2023-09-06 10:05:42

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH] scsi: scsi_dh_rdac: Avoid crash when a disk attach failed

Friendly ping.

On 2023/8/3 19:28, Huang Cun wrote:
> When a disk fails to attach, the struct rdac_dh_data is released,
> but it is not removed from the ctlr->dh_list. When attaching another
> disk, the released rdac_dh_data will be accessed and the following
> BUG_ON() may be observed:
>
> [ 414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
> ...
> [ 423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
> [ 423.615731] invalid opcode: 0000 [#1] SMP NOPTI
> ...
> [ 423.623247] Call Trace:
> [ 423.623598] rdac_bus_attach+0x203/0x4c0
> [ 423.623949] ? scsi_dh_handler_attach+0x2d/0x90
> [ 423.624300] scsi_dh_handler_attach+0x2d/0x90
> [ 423.624652] scsi_sysfs_add_sdev+0x88/0x270
> [ 423.625004] scsi_probe_and_add_lun+0xc47/0xd50
> [ 423.625354] scsi_report_lun_scan+0x339/0x3b0
> [ 423.625705] __scsi_scan_target+0xe9/0x220
> [ 423.626056] scsi_scan_target+0xf6/0x100
> [ 423.626404] fc_scsi_scan_rport+0xa5/0xb0
> [ 423.626757] process_one_work+0x15e/0x3f0
> [ 423.627106] worker_thread+0x4c/0x440
> [ 423.627453] ? rescuer_thread+0x350/0x350
> [ 423.627804] kthread+0xf8/0x130
> [ 423.628153] ? kthread_destroy_worker+0x40/0x40
> [ 423.628509] ret_from_fork+0x1f/0x40
>
> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
> Signed-off-by: Huang Cun <[email protected]>
> Signed-off-by: Ding Hui <[email protected]>
> Cc: Donglin Peng <[email protected]>
> ---
> drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
> index c5538645057a..9d487c2b7708 100644
> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>
> clean_ctlr:
> spin_lock(&list_lock);
> + list_del_rcu(&h->node);
> kref_put(&h->ctlr->kref, release_controller);
> spin_unlock(&list_lock);
> + synchronize_rcu();
>
> failed:
> kfree(h);

--
Thanks,
- Ding Hui

2023-09-07 05:39:59

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH] scsi: scsi_dh_rdac: Avoid crash when a disk attach failed

On 2023/9/6 23:51, Mike Christie wrote:
> On 8/3/23 6:28 AM, Huang Cun wrote:
>> When a disk fails to attach, the struct rdac_dh_data is released,
>> but it is not removed from the ctlr->dh_list. When attaching another
>> disk, the released rdac_dh_data will be accessed and the following
>> BUG_ON() may be observed:
>>
>> [ 414.696167] scsi 5:0:0:7: rdac: Attach failed (8)
>> ...
>> [ 423.615364] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427!
>> [ 423.615731] invalid opcode: 0000 [#1] SMP NOPTI
>> ...
>> [ 423.623247] Call Trace:
>> [ 423.623598] rdac_bus_attach+0x203/0x4c0
>> [ 423.623949] ? scsi_dh_handler_attach+0x2d/0x90
>> [ 423.624300] scsi_dh_handler_attach+0x2d/0x90
>> [ 423.624652] scsi_sysfs_add_sdev+0x88/0x270
>> [ 423.625004] scsi_probe_and_add_lun+0xc47/0xd50
>> [ 423.625354] scsi_report_lun_scan+0x339/0x3b0
>> [ 423.625705] __scsi_scan_target+0xe9/0x220
>> [ 423.626056] scsi_scan_target+0xf6/0x100
>> [ 423.626404] fc_scsi_scan_rport+0xa5/0xb0
>> [ 423.626757] process_one_work+0x15e/0x3f0
>> [ 423.627106] worker_thread+0x4c/0x440
>> [ 423.627453] ? rescuer_thread+0x350/0x350
>> [ 423.627804] kthread+0xf8/0x130
>> [ 423.628153] ? kthread_destroy_worker+0x40/0x40
>> [ 423.628509] ret_from_fork+0x1f/0x40
>>
>> Fixes: 1a5dc166cd88 ("scsi_dh_rdac: update 'access_state' field")
>> Signed-off-by: Huang Cun <[email protected]>
>> Signed-off-by: Ding Hui <[email protected]>
>> Cc: Donglin Peng <[email protected]>
>> ---
>> drivers/scsi/device_handler/scsi_dh_rdac.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c b/drivers/scsi/device_handler/scsi_dh_rdac.c
>> index c5538645057a..9d487c2b7708 100644
>> --- a/drivers/scsi/device_handler/scsi_dh_rdac.c
>> +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
>> @@ -762,8 +762,10 @@ static int rdac_bus_attach(struct scsi_device *sdev)
>>
>> clean_ctlr:
>> spin_lock(&list_lock);
>> + list_del_rcu(&h->node);
>> kref_put(&h->ctlr->kref, release_controller);
>> spin_unlock(&list_lock);
>> + synchronize_rcu();
>>
>
> Should this be:
>
> spin_lock(&list_lock);
> list_del_rcu(&h->node);
> spin_unlock(&list_lock);
>
> synchronize_rcu();
>
> kref_put(&h->ctlr->kref, release_controller);
>
>
> ?
>
> If you do the synchronize_rcu after the kref_put, then the kref_put
> could free the rdac_dh_data, while check_ownership is still
> accessing the rdac_dh_data, right?
>

You are right.

But I think we should keep the kref_put() and release callback be protected by list_lock, and only free
the ctlr after synchronize_rcu().

So how about the additional modify (not yet tested):

--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -166,6 +166,7 @@ struct rdac_controller {
struct scsi_device *ms_sdev;
struct list_head ms_head;
struct list_head dh_list;
+ struct rcu_head rcu;
};

struct c2_inquiry {
@@ -320,7 +321,7 @@ static void release_controller(struct kref *kref)
ctlr = container_of(kref, struct rdac_controller, kref);

list_del(&ctlr->node);
- kfree(ctlr);
+ kfree_rcu(ctlr, rcu);
}

static struct rdac_controller *get_controller(int index, char *array_name,


--
Thanks,
- Ding Hui