2020-11-28 21:59:35

by Ding Hui

[permalink] [raw]
Subject: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

We can get a crash when disconnecting the iSCSI session,
the call trace like this:

[ffff00002a00fb70] kfree at ffff00000830e224
[ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
[ffff00002a00fbd0] device_del at ffff0000086b6a98
[ffff00002a00fc50] device_unregister at ffff0000086b6d58
[ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
[ffff00002a00fca0] scsi_remove_device at ffff000008706134
[ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
[ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
[ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
[ffff00002a00fdb0] process_one_work at ffff00000810f35c
[ffff00002a00fe00] worker_thread at ffff00000810f648
[ffff00002a00fe70] kthread at ffff000008116e98

In ses_intf_add, components count could be 0, and kcalloc 0 size scomp,
but not saved in edev->component[i].scratch

In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot catch
the invalid pointer

We should not use edev->component[] array when the components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process

Tested-by: Zeng Zhicong <[email protected]>
Cc: stable <[email protected]> # 2.6.25+
Signed-off-by: Ding Hui <[email protected]>
---
drivers/scsi/ses.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..f5ef0a91f0eb 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -477,9 +477,6 @@ static int ses_enclosure_find_by_addr(struct enclosure_device *edev,
int i;
struct ses_component *scomp;

- if (!edev->component[0].scratch)
- return 0;
-
for (i = 0; i < edev->components; i++) {
scomp = edev->component[i].scratch;
if (scomp->addr != efd->addr)
@@ -565,8 +562,10 @@ static void ses_enclosure_data_process(struct enclosure_device *edev,
components++,
type_ptr[0],
name);
- else
+ else if (components < edev->components)
ecomp = &edev->component[components++];
+ else
+ ecomp = ERR_PTR(-EINVAL);

if (!IS_ERR(ecomp)) {
if (addl_desc_ptr)
@@ -731,9 +730,11 @@ static int ses_intf_add(struct device *cdev,
buf = NULL;
}
page2_not_supported:
- scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL);
- if (!scomp)
- goto err_free;
+ if (components > 0) {
+ scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL);
+ if (!scomp)
+ goto err_free;
+ }

edev = enclosure_register(cdev->parent, dev_name(&sdev->sdev_gendev),
components, &ses_enclosure_callbacks);
@@ -813,7 +814,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
kfree(ses_dev->page2);
kfree(ses_dev);

- kfree(edev->component[0].scratch);
+ if (edev->components > 0)
+ kfree(edev->component[0].scratch);

put_device(&edev->edev);
enclosure_unregister(edev);
--
2.17.1


2020-11-28 23:32:11

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
> We can get a crash when disconnecting the iSCSI session,
> the call trace like this:
>
> [ffff00002a00fb70] kfree at ffff00000830e224
> [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
> [ffff00002a00fbd0] device_del at ffff0000086b6a98
> [ffff00002a00fc50] device_unregister at ffff0000086b6d58
> [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
> [ffff00002a00fca0] scsi_remove_device at ffff000008706134
> [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
> [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
> [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
> [ffff00002a00fdb0] process_one_work at ffff00000810f35c
> [ffff00002a00fe00] worker_thread at ffff00000810f648
> [ffff00002a00fe70] kthread at ffff000008116e98
>
> In ses_intf_add, components count could be 0, and kcalloc 0 size
> scomp,
> but not saved in edev->component[i].scratch
>
> In this situation, edev->component[0].scratch is an invalid pointer,
> when kfree it in ses_intf_remove_enclosure, a crash like above would
> happen
> The call trace also could be other random cases when kfree cannot
> catch
> the invalid pointer
>
> We should not use edev->component[] array when the components count
> is 0
> We also need check index when use edev->component[] array in
> ses_enclosure_data_process
>
> Tested-by: Zeng Zhicong <[email protected]>
> Cc: stable <[email protected]> # 2.6.25+
> Signed-off-by: Ding Hui <[email protected]>

This doesn't really look to be the right thing to do: an enclosure
which has no component can't usefully be controlled by the driver since
there's nothing for it to do, so what we should do in this situation is
refuse to attach like the proposed patch below.

It does seem a bit odd that someone would build an enclosure that
doesn't enclose anything, so would you mind running

sg_ses -e

on it and reporting back what it shows? It's possible there's another
type that the enclosure device should be tracking.

Regards,

James

---8>8>8><8<8<8--------
From: James Bottomley <[email protected]>
Subject: [PATCH] scsi: ses: don't attach if enclosure has no components

An enclosure with no components can't usefully be operated by the
driver (since effectively it has nothing to manage), so report the
problem and don't attach. Not attaching also fixes an oops which
could occur if the driver tries to manage a zero component enclosure.

Reported-by: Ding Hui <[email protected]>
Cc: [email protected]
Signed-off-by: James Bottomley <[email protected]>
---
drivers/scsi/ses.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..9624298b9c89 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
components += type_ptr[1];
}
+ if (components == 0) {
+ sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated components\n");
+ goto err_free;
+ }
+
ses_dev->page1 = buf;
ses_dev->page1_len = len;
buf = NULL;
--
2.26.2



2020-11-29 05:18:21

by Douglas Gilbert

[permalink] [raw]
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

On 2020-11-28 6:27 p.m., James Bottomley wrote:
> On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
>> We can get a crash when disconnecting the iSCSI session,
>> the call trace like this:
>>
>> [ffff00002a00fb70] kfree at ffff00000830e224
>> [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
>> [ffff00002a00fbd0] device_del at ffff0000086b6a98
>> [ffff00002a00fc50] device_unregister at ffff0000086b6d58
>> [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
>> [ffff00002a00fca0] scsi_remove_device at ffff000008706134
>> [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
>> [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
>> [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
>> [ffff00002a00fdb0] process_one_work at ffff00000810f35c
>> [ffff00002a00fe00] worker_thread at ffff00000810f648
>> [ffff00002a00fe70] kthread at ffff000008116e98
>>
>> In ses_intf_add, components count could be 0, and kcalloc 0 size
>> scomp,
>> but not saved in edev->component[i].scratch
>>
>> In this situation, edev->component[0].scratch is an invalid pointer,
>> when kfree it in ses_intf_remove_enclosure, a crash like above would
>> happen
>> The call trace also could be other random cases when kfree cannot
>> catch
>> the invalid pointer
>>
>> We should not use edev->component[] array when the components count
>> is 0
>> We also need check index when use edev->component[] array in
>> ses_enclosure_data_process
>>
>> Tested-by: Zeng Zhicong <[email protected]>
>> Cc: stable <[email protected]> # 2.6.25+
>> Signed-off-by: Ding Hui <[email protected]>
>
> This doesn't really look to be the right thing to do: an enclosure
> which has no component can't usefully be controlled by the driver since
> there's nothing for it to do, so what we should do in this situation is
> refuse to attach like the proposed patch below.
>
> It does seem a bit odd that someone would build an enclosure that
> doesn't enclose anything, so would you mind running
>
> sg_ses -e

'-e' is the short form of '--enumerate'. That will report the names
and abbreviations of the diagnostic pages that the utility itself
knows about (and supports). It won't show anything specific about
the environment that sg_ses is executed in.

You probably meant:
sg_ses <ses_device>

Examples of the likely forms are:
sg_ses /dev/bsg/1:0:0:0
sg_ses /dev/sg2
sg_ses /dev/ses0

This from a nearby machine:

$ lsscsi -gs
[3:0:0:0] disk ATA Samsung SSD 850 1B6Q /dev/sda /dev/sg0 120GB
[4:0:0:0] disk IBM-207x HUSMM8020ASS20 J4B6 /dev/sdc /dev/sg2 200GB
[4:0:1:0] disk ATA INTEL SSDSC2KW25 003C /dev/sdd /dev/sg3 256GB
[4:0:2:0] disk SEAGATE ST10000NM0096 E005 /dev/sde /dev/sg4 10.0TB
[4:0:3:0] enclosu Areca Te ARC-802801.37.69 0137 - /dev/sg5 -
[4:0:4:0] enclosu Intel RES2SV240 0d00 - /dev/sg6 -
[7:0:0:0] disk Kingston DataTravelerMini PMAP /dev/sdb /dev/sg1 1.03GB
[N:0:0:1] disk WDC WDS256G1X0C-00ENX0__1 /dev/nvme0n1 - 256GB

# sg_ses /dev/sg5
Areca Te ARC-802801.37.69 0137
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Enclosure Status/Control (SES) [ec,es] [0x2]
String In/Out (SES) [str] [0x4]
Threshold In/Out (SES) [th] [0x5]
Element Descriptor (SES) [ed] [0x7]
Additional Element Status (SES-2) [aes] [0xa]
Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
Download Microcode (SES-2) [dm] [0xe]
Subenclosure Nickname (SES-2) [snic] [0xf]
Protocol Specific (SAS transport) [] [0x3f]

# sg_ses -p cf /dev/sg5
Areca Te ARC-802801.37.69 0137
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x0
enclosure descriptor list
Subenclosure identifier: 0 [primary]
relative ES process id: 1, number of ES processes: 1
number of type descriptor headers: 9
enclosure logical identifier (hex): d5b401503fc0ec16
enclosure vendor: Areca Te product: ARC-802801.37.69 rev: 0137
vendor-specific data:
11 22 33 44 55 00 00 00 ."3DU...

type descriptor header and text list
Element type: Array device slot, subenclosure id: 0
number of possible elements: 24
text: ArrayDevicesInSubEnclsr0
Element type: Enclosure, subenclosure id: 0
number of possible elements: 1
text: EnclosureElementInSubEnclsr0
Element type: SAS expander, subenclosure id: 0
number of possible elements: 1
text: SAS Expander
Element type: Cooling, subenclosure id: 0
number of possible elements: 5
text: CoolingElementInSubEnclsr0
Element type: Temperature sensor, subenclosure id: 0
number of possible elements: 2
text: TempSensorsInSubEnclsr0
Element type: Voltage sensor, subenclosure id: 0
number of possible elements: 2
text: VoltageSensorsInSubEnclsr0
Element type: SAS connector, subenclosure id: 0
number of possible elements: 3
text: ConnectorsInSubEnclsr0
Element type: Power supply, subenclosure id: 0
number of possible elements: 2
text: PowerSupplyInSubEnclsr0
Element type: Audible alarm, subenclosure id: 0
number of possible elements: 1
text: AudibleAlarmInSubEnclsr0

Doug Gilbert

> on it and reporting back what it shows? It's possible there's another
> type that the enclosure device should be tracking.
>
> Regards,
>
> James
>
> ---8>8>8><8<8<8--------
> From: James Bottomley <[email protected]>
> Subject: [PATCH] scsi: ses: don't attach if enclosure has no components
>
> An enclosure with no components can't usefully be operated by the
> driver (since effectively it has nothing to manage), so report the
> problem and don't attach. Not attaching also fixes an oops which
> could occur if the driver tries to manage a zero component enclosure.
>
> Reported-by: Ding Hui <[email protected]>
> Cc: [email protected]
> Signed-off-by: James Bottomley <[email protected]>
> ---
> drivers/scsi/ses.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index c2afba2a5414..9624298b9c89 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
> type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
> components += type_ptr[1];
> }
> + if (components == 0) {
> + sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated components\n");
> + goto err_free;
> + }
> +
> ses_dev->page1 = buf;
> ses_dev->page1_len = len;
> buf = NULL;
>

2020-11-30 02:32:10

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

On 2020/11/29 13:12, Douglas Gilbert wrote:
> On 2020-11-28 6:27 p.m., James Bottomley wrote:
>> On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
>>> We can get a crash when disconnecting the iSCSI session,
>>> the call trace like this:
>>>
>>>    [ffff00002a00fb70] kfree at ffff00000830e224
>>>    [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
>>>    [ffff00002a00fbd0] device_del at ffff0000086b6a98
>>>    [ffff00002a00fc50] device_unregister at ffff0000086b6d58
>>>    [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
>>>    [ffff00002a00fca0] scsi_remove_device at ffff000008706134
>>>    [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
>>>    [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
>>>    [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
>>>    [ffff00002a00fdb0] process_one_work at ffff00000810f35c
>>>    [ffff00002a00fe00] worker_thread at ffff00000810f648
>>>    [ffff00002a00fe70] kthread at ffff000008116e98
>>>
>>> In ses_intf_add, components count could be 0, and kcalloc 0 size
>>> scomp,
>>> but not saved in edev->component[i].scratch
>>>
>>> In this situation, edev->component[0].scratch is an invalid pointer,
>>> when kfree it in ses_intf_remove_enclosure, a crash like above would
>>> happen
>>> The call trace also could be other random cases when kfree cannot
>>> catch
>>> the invalid pointer
>>>
>>> We should not use edev->component[] array when the components count
>>> is 0
>>> We also need check index when use edev->component[] array in
>>> ses_enclosure_data_process
>>>
>>> Tested-by: Zeng Zhicong <[email protected]>
>>> Cc: stable <[email protected]> # 2.6.25+
>>> Signed-off-by: Ding Hui <[email protected]>
>>
>> This doesn't really look to be the right thing to do: an enclosure
>> which has no component can't usefully be controlled by the driver since
>> there's nothing for it to do, so what we should do in this situation is
>> refuse to attach like the proposed patch below.
>>
>> It does seem a bit odd that someone would build an enclosure that
>> doesn't enclose anything, so would you mind running
>>
>> sg_ses -e
>
> '-e' is the short form of '--enumerate'. That will report the names
> and abbreviations of the diagnostic pages that the utility itself
> knows about (and supports). It won't show anything specific about
> the environment that sg_ses is executed in.
>
> You probably meant:
>   sg_ses <ses_device>
>
> Examples of the likely forms are:
>   sg_ses /dev/bsg/1:0:0:0
>   sg_ses /dev/sg2
>   sg_ses /dev/ses0
>
> This from a nearby machine:
>
> $ lsscsi -gs
> [3:0:0:0]  disk  ATA      Samsung SSD 850  1B6Q  /dev/sda   /dev/sg0
> 120GB
> [4:0:0:0]  disk  IBM-207x HUSMM8020ASS20   J4B6  /dev/sdc   /dev/sg2
> 200GB
> [4:0:1:0]  disk  ATA      INTEL SSDSC2KW25 003C  /dev/sdd   /dev/sg3
> 256GB
> [4:0:2:0]  disk  SEAGATE  ST10000NM0096    E005  /dev/sde   /dev/sg4
> 10.0TB
> [4:0:3:0]  enclosu Areca Te ARC-802801.37.69 0137  -
> /dev/sg5        -
> [4:0:4:0]  enclosu Intel    RES2SV240        0d00  -
> /dev/sg6        -
> [7:0:0:0]  disk    Kingston DataTravelerMini PMAP  /dev/sdb /dev/sg1
> 1.03GB
> [N:0:0:1]  disk    WDC WDS256G1X0C-00ENX0__1       /dev/nvme0n1  -
> 256GB
>
> # sg_ses /dev/sg5
>   Areca Te  ARC-802801.37.69  0137
> Supported diagnostic pages:
>   Supported Diagnostic Pages [sdp] [0x0]
>   Configuration (SES) [cf] [0x1]
>   Enclosure Status/Control (SES) [ec,es] [0x2]
>   String In/Out (SES) [str] [0x4]
>   Threshold In/Out (SES) [th] [0x5]
>   Element Descriptor (SES) [ed] [0x7]
>   Additional Element Status (SES-2) [aes] [0xa]
>   Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
>   Download Microcode (SES-2) [dm] [0xe]
>   Subenclosure Nickname (SES-2) [snic] [0xf]
>   Protocol Specific (SAS transport) [] [0x3f]
>
> # sg_ses -p cf /dev/sg5
>   Areca Te  ARC-802801.37.69  0137
> Configuration diagnostic page:
>   number of secondary subenclosures: 0
>   generation code: 0x0
>   enclosure descriptor list
>     Subenclosure identifier: 0 [primary]
>       relative ES process id: 1, number of ES processes: 1
>       number of type descriptor headers: 9
>       enclosure logical identifier (hex): d5b401503fc0ec16
>       enclosure vendor: Areca Te  product: ARC-802801.37.69  rev: 0137
>       vendor-specific data:
>         11 22 33 44 55 00 00 00                             ."3DU...
>
>   type descriptor header and text list
>     Element type: Array device slot, subenclosure id: 0
>       number of possible elements: 24
>       text: ArrayDevicesInSubEnclsr0
>     Element type: Enclosure, subenclosure id: 0
>       number of possible elements: 1
>       text: EnclosureElementInSubEnclsr0
>     Element type: SAS expander, subenclosure id: 0
>       number of possible elements: 1
>       text: SAS Expander
>     Element type: Cooling, subenclosure id: 0
>       number of possible elements: 5
>       text: CoolingElementInSubEnclsr0
>     Element type: Temperature sensor, subenclosure id: 0
>       number of possible elements: 2
>       text: TempSensorsInSubEnclsr0
>     Element type: Voltage sensor, subenclosure id: 0
>       number of possible elements: 2
>       text: VoltageSensorsInSubEnclsr0
>     Element type: SAS connector, subenclosure id: 0
>       number of possible elements: 3
>       text: ConnectorsInSubEnclsr0
>     Element type: Power supply, subenclosure id: 0
>       number of possible elements: 2
>       text: PowerSupplyInSubEnclsr0
>     Element type: Audible alarm, subenclosure id: 0
>       number of possible elements: 1
>       text: AudibleAlarmInSubEnclsr0
>
> Doug Gilbert
>
>> on it and reporting back what it shows?  It's possible there's another
>> type that the enclosure device should be tracking.

kernel log:

2020-11-30 09:29:44.228339 info [kernel:] [425726.567579] scsi host18:
iSCSI Initiator over TCP/IP
2020-11-30 09:29:44.476319 notice [kernel:] [425726.817417] scsi
18:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.480314 notice [kernel:] [425726.820591] scsi
18:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:29:44.480319 notice [kernel:] [425726.820810] sd 18:0:0:0:
Attached scsi generic sg30 type 0
2020-11-30 09:29:44.480320 notice [kernel:] [425726.820812] sd 18:0:0:0:
Embedded Enclosure Device
2020-11-30 09:29:44.480321 warning [kernel:] [425726.821119] sd
18:0:0:0: Mode parameters changed
2020-11-30 09:29:44.492316 info [kernel:] [425726.831444] sd 18:0:0:0:
[ses_intf_add]:777 sdev: ffff8027ca170000 edev:ffff80271459cc00 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.492326 notice [kernel:] [425726.832326] sd 18:0:0:0:
[sdt] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:29:44.496308 notice [kernel:] [425726.836252] sd 18:0:0:0:
[sdt] Write Protect is off
2020-11-30 09:29:44.496313 debug [kernel:] [425726.836255] sd 18:0:0:0:
[sdt] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.500310 notice [kernel:] [425726.838436] sd 18:0:0:0:
[sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.500315 notice [kernel:] [425726.838573] scsi
18:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.501437 notice [kernel:] [425726.842314] scsi
18:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842501] sd 18:0:0:1:
Attached scsi generic sg31 type 0
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842502] sd 18:0:0:1:
Embedded Enclosure Device
2020-11-30 09:29:44.501443 warning [kernel:] [425726.842688] sd
18:0:0:1: Mode parameters changed
2020-11-30 09:29:44.512325 info [kernel:] [425726.852905] sd 18:0:0:1:
[ses_intf_add]:777 sdev: ffff8027ca162000 edev:ffff80271459d400 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.512335 notice [kernel:] [425726.853249] sd 18:0:0:1:
[sdu] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:29:44.514333 notice [kernel:] [425726.853778] sd 18:0:0:1:
[sdu] Write Protect is off
2020-11-30 09:29:44.514338 debug [kernel:] [425726.853780] sd 18:0:0:1:
[sdu] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.514339 notice [kernel:] [425726.853908] scsi
18:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.514340 notice [kernel:] [425726.854524] sd 18:0:0:1:
[sdu] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.514340 notice [kernel:] [425726.855140] sd 18:0:0:0:
[sdt] Attached SCSI disk
2020-11-30 09:29:44.514341 notice [kernel:] [425726.855221] scsi
18:0:0:31: Attached scsi generic sg32 type 0
2020-11-30 09:29:44.514342 notice [kernel:] [425726.855223] scsi
18:0:0:31: Embedded Enclosure Device
2020-11-30 09:29:44.514383 warning [kernel:] [425726.855571] scsi
18:0:0:31: Power-on or device reset occurred
2020-11-30 09:29:44.514383 err [kernel:] [425726.855587] scsi 18:0:0:31:
Failed to get diagnostic page 0x1
2020-11-30 09:29:44.514384 err [kernel:] [425726.855594] scsi 18:0:0:31:
Failed to bind enclosure -19
2020-11-30 09:29:44.514388 info [kernel:] [425726.855599] scsi
18:0:0:31: sdev: ffff8027ca173000
2020-11-30 09:29:44.524329 notice [kernel:] [425726.863747] sd 18:0:0:1:
[sdu] Attached SCSI disk
2020-11-30 09:29:44.764335 err [kernel:] [425727.103405] I/O scheduler
mq-deadline not found
2020-11-30 09:29:44.888323 err [kernel:] [425727.228994] I/O scheduler
mq-deadline not found
2020-11-30 09:29:45.001274 warning [kernel:] [425727.342284]
device-mapper: multipath round-robin: repeat_count > 1 is deprecated,
using 1 instead
2020-11-30 09:31:41.116334 info [kernel:] [425843.460697] scsi host19:
iSCSI Initiator over TCP/IP
2020-11-30 09:31:41.356325 notice [kernel:] [425843.700556] scsi
19:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.360325 notice [kernel:] [425843.704270] scsi
19:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704478] sd 19:0:0:0:
Attached scsi generic sg33 type 0
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704480] sd 19:0:0:0:
Embedded Enclosure Device
2020-11-30 09:31:41.360335 warning [kernel:] [425843.704732] sd
19:0:0:0: Mode parameters changed
2020-11-30 09:31:41.380310 info [kernel:] [425843.723464] sd 19:0:0:0:
[ses_intf_add]:777 sdev: ffff8027c8984000 edev:ffff802723f68400 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.380317 notice [kernel:] [425843.724232] sd 19:0:0:0:
[sdv] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:31:41.384308 notice [kernel:] [425843.729742] sd 19:0:0:0:
[sdv] Write Protect is off
2020-11-30 09:31:41.384313 debug [kernel:] [425843.729745] sd 19:0:0:0:
[sdv] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.388310 notice [kernel:] [425843.730436] sd 19:0:0:0:
[sdv] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.388316 notice [kernel:] [425843.730557] scsi
19:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.392312 notice [kernel:] [425843.736948] scsi
19:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737146] sd 19:0:0:1:
Attached scsi generic sg34 type 0
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737148] sd 19:0:0:1:
Embedded Enclosure Device
2020-11-30 09:31:41.392318 warning [kernel:] [425843.737829] sd
19:0:0:1: Mode parameters changed
2020-11-30 09:31:41.404316 info [kernel:] [425843.749482] sd 19:0:0:1:
[ses_intf_add]:777 sdev: ffff8027c899e000 edev:ffff802723f6e800 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.404326 notice [kernel:] [425843.749862] sd 19:0:0:1:
[sdw] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:31:41.408331 notice [kernel:] [425843.750597] sd 19:0:0:1:
[sdw] Write Protect is off
2020-11-30 09:31:41.408346 debug [kernel:] [425843.750598] sd 19:0:0:1:
[sdw] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751325] scsi
19:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751979] sd 19:0:0:1:
[sdw] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.408348 notice [kernel:] [425843.753105] scsi
19:0:0:31: Attached scsi generic sg35 type 0
2020-11-30 09:31:41.408349 notice [kernel:] [425843.753109] scsi
19:0:0:31: Embedded Enclosure Device
2020-11-30 09:31:41.408350 notice [kernel:] [425843.753165] sd 19:0:0:0:
[sdv] Attached SCSI disk
2020-11-30 09:31:41.408351 warning [kernel:] [425843.753377] scsi
19:0:0:31: Power-on or device reset occurred
2020-11-30 09:31:41.408352 err [kernel:] [425843.753388] scsi 19:0:0:31:
Failed to get diagnostic page 0x1
2020-11-30 09:31:41.408352 err [kernel:] [425843.753390] scsi 19:0:0:31:
Failed to bind enclosure -19
2020-11-30 09:31:41.408353 info [kernel:] [425843.753391] scsi
19:0:0:31: sdev: ffff8027c899c000
2020-11-30 09:31:41.416332 notice [kernel:] [425843.759795] sd 19:0:0:1:
[sdw] Attached SCSI disk


# cat /sys/class/enclosure/18\:0\:0\:0/components
0
# cat /sys/class/enclosure/18\:0\:0\:1/components
0
# cat /sys/class/enclosure/19\:0\:0\:0/components
0
# cat /sys/class/enclosure/19\:0\:0\:1/components
0

# sg_ses -e
Diagnostic pages, followed by abbreviation(s) then page code:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Enclosure Status/Control (SES) [ec,es] [0x2]
Help Text (SES) [ht] [0x3]
String In/Out (SES) [str] [0x4]
Threshold In/Out (SES) [th] [0x5]
Array Status/Control (SES, obsolete) [ac,as] [0x6]
Element Descriptor (SES) [ed] [0x7]
Short Enclosure Status (SES) [ses] [0x8]
Enclosure Busy (SES-2) [eb] [0x9]
Additional Element Status (SES-2) [aes] [0xa]
Subenclosure Help Text (SES-2) [sht] [0xb]
Subenclosure String In/Out (SES-2) [sstr] [0xc]
Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
Download Microcode (SES-2) [dm] [0xe]
Subenclosure Nickname (SES-2) [snic] [0xf]
Protocol Specific (SAS transport) [] [0x3f]
Translate Address (SBC) [] [0x40]
Device Status (SBC) [] [0x41]
Rebuild Assist (SBC) [] [0x42]

SES element type names, followed by abbreviation and element type code:
Unspecified [un] [0x0]
Device slot [dev] [0x1]
Power supply [ps] [0x2]
Cooling [coo] [0x3]
Temperature sensor [ts] [0x4]
Door [do] [0x5]
Audible alarm [aa] [0x6]
Enclosure services controller electronics [esc] [0x7]
SCC controller electronics [sce] [0x8]
Nonvolatile cache [nc] [0x9]
Invalid operation reason [ior] [0xa]
Uninterruptible power supply [ups] [0xb]
Display [dis] [0xc]
Key pad entry [kpe] [0xd]
Enclosure [enc] [0xe]
SCSI port/transceiver [sp] [0xf]
Language [lan] [0x10]
Communication port [cp] [0x11]
Voltage sensor [vs] [0x12]
Current sensor [cs] [0x13]
SCSI target port [stp] [0x14]
SCSI initiator port [sip] [0x15]
Simple subenclosure [ss] [0x16]
Array device slot [arr] [0x17]
SAS expander [sse] [0x18]
SAS connector [ssc] [0x19]

# sg_ses /dev/sdu
DELL MD32xxi 0784
disk device has EncServ bit set
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Enclosure Status/Control (SES) [ec,es] [0x2]
Array Status/Control (SES, obsolete) [ac,as] [0x6]

# sg_ses -p cf /dev/sdu
DELL MD32xxi 0784
disk device has EncServ bit set
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x12c
enclosure descriptor list
Subenclosure identifier: 0 (primary)
relative ES process id: 0, number of ES processes: 0
number of type descriptor headers: 5
enclosure logical identifier (hex): 0000000000000000
enclosure vendor: DELL product: MD32xxi rev: 0784
vendor-specific data:
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
00 00 00 00
type descriptor header/text list
Element type: Device slot, subenclosure id: 0
number of possible elements: 12
Element type: Power supply, subenclosure id: 0
number of possible elements: 2
Element type: Cooling, subenclosure id: 0
number of possible elements: 4
Element type: Temperature sensor, subenclosure id: 0
number of possible elements: 4
Element type: SCC controller electronics, subenclosure id: 0
number of possible elements: 1

I'm not goot at ses, but it seems that ses does not get the right
component count

>>
>> Regards,
>>
>> James
>>
>> ---8>8>8><8<8<8--------
>> From: James Bottomley <[email protected]>
>> Subject: [PATCH] scsi: ses: don't attach if enclosure has no components
>>
>> An enclosure with no components can't usefully be operated by the
>> driver (since effectively it has nothing to manage), so report the
>> problem and don't attach.  Not attaching also fixes an oops which
>> could occur if the driver tries to manage a zero component enclosure.
>>
>> Reported-by: Ding Hui <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: James Bottomley <[email protected]>
>> ---
>>   drivers/scsi/ses.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
>> index c2afba2a5414..9624298b9c89 100644
>> --- a/drivers/scsi/ses.c
>> +++ b/drivers/scsi/ses.c
>> @@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
>>               type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
>>               components += type_ptr[1];
>>       }
>> +    if (components == 0) {
>> +        sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated
>> components\n");
>> +        goto err_free;
>> +    }
>> +
>>       ses_dev->page1 = buf;
>>       ses_dev->page1_len = len;
>>       buf = NULL;
>>
>


--
Thanks,
- Ding Hui

2020-12-01 01:53:58

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

On Mon, 2020-11-30 at 10:26 +0800, Ding Hui wrote:
[...]
> sg_ses -e
> Diagnostic pages, followed by abbreviation(s) then page code:
> Supported Diagnostic Pages [sdp] [0x0]
> Configuration (SES) [cf] [0x1]
> Enclosure Status/Control (SES) [ec,es] [0x2]
> Help Text (SES) [ht] [0x3]
>
> # sg_ses -p cf /dev/sdu
> DELL MD32xxi 0784
> disk device has EncServ bit set
> Configuration diagnostic page:
> number of secondary subenclosures: 0
> generation code: 0x12c
> enclosure descriptor list
> Subenclosure identifier: 0 (primary)
> relative ES process id: 0, number of ES processes: 0
> number of type descriptor headers: 5
> enclosure logical identifier (hex): 0000000000000000
> enclosure vendor: DELL product: MD32xxi rev:
> 0784
> vendor-specific data:
> 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
> 00 00 00 00
> type descriptor header/text list
> Element type: Device slot, subenclosure id: 0
> number of possible elements: 12
> Element type: Power supply, subenclosure id: 0
> number of possible elements: 2
> Element type: Cooling, subenclosure id: 0
> number of possible elements: 4
> Element type: Temperature sensor, subenclosure id: 0
> number of possible elements: 4
> Element type: SCC controller electronics, subenclosure id: 0
> number of possible elements: 1
>
> I'm not goot at ses, but it seems that ses does not get the right
> component count

Actually there is a possible explanation. Your kernel log has this in
the middle:

> 2020-11-30 09:31:41.360334 notice [kernel:] [425843.704480] sd
> 19:0:0:0: Embedded Enclosure Device
> 2020-11-30 09:31:41.360335 warning [kernel:] [425843.704732] sd
> 19:0:0:0: Mode parameters changed

That "Mode parameters changed" implies that what the kernel read first
time around may not be the actual configuration of the enclosure. In
particular, the generation code being 0x12c is also an indicator things
may have changed. My theory is when the kernel boots the device is
returning 0 for most of the possible elements, but it changes this at a
later stage. One way to verify would be to compile ses as modular but
blacklist it so it can't be inserted, then do sg_ses -p to get the
enclosure and then insert the ses module to see if the two agree on the
components. Unfortunately, even if that's successful, figuring out
what we have to do to the enclosure to get it to finish its internal
scanning may not be so easy.

James


2021-03-18 12:34:13

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer

On 2020/11/29 7:27, James Bottomley wrote:
> ---8>8>8><8<8<8--------
> From: James Bottomley <[email protected]>
> Subject: [PATCH] scsi: ses: don't attach if enclosure has no components
>
> An enclosure with no components can't usefully be operated by the
> driver (since effectively it has nothing to manage), so report the
> problem and don't attach. Not attaching also fixes an oops which
> could occur if the driver tries to manage a zero component enclosure.
>
> Reported-by: Ding Hui <[email protected]>
> Cc: [email protected]
> Signed-off-by: James Bottomley <[email protected]>
> ---
> drivers/scsi/ses.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index c2afba2a5414..9624298b9c89 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
> type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
> components += type_ptr[1];
> }
> + if (components == 0) {
> + sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated components\n");
> + goto err_free;
> + }
> +
> ses_dev->page1 = buf;
> ses_dev->page1_len = len;
> buf = NULL;
>
Can I ask you to resubmit your patch ("scsi: ses: don't attach if
enclosure has no components") to kernel, thanks

--
Thanks,
- Ding Hui