2017-06-11 19:44:04

by Jérôme Carretero

Subject: arcmsr: abort device command message weirdness (LUN mismatch)

Hi Ching,


Context: when a drive finally failed in my JBOD array, I discovered that
the whole ARC1880X controller would time out, disabling access to every
drive, which is kind of sad.
I've performed a firmware upgrade and put the failing drive back to see
what happens with the newer firmware (to be continued).

While running "cat /dev/${FAILING_DRIVE}", the command eventually fails
(as expected). Looking at the system logs, I observed reports of 2 abort
sequences being initiated and then completed, but one completion message
mentions a LUN that is not the failing drive's (the failing drive is
LUN 4, sd 0:0:0:4), which is curious.

[ 959.065760] arcmsr0: abort device command of scsi id = 0 lun = 4
[ 961.804842] arcmsr0: abort device command of scsi id = 0 lun = 4

...

[ 991.834471] arcmsr0: abort device command of scsi id = 0 lun = 0
[ 991.840503] arcmsr0: scsi id = 0 lun = 0 ccb = '0xffff8808594a6b80' poll command abort successfully
[ 991.849675] arcmsr0: scsi id = 0 lun = 4 ccb = '0xffff880859424600' poll command abort successfully
[ 991.858869] arcmsr: executing bus reset eh.....num_resets = 0, num_aborts = 3
[ 991.866199] arcmsr0: executing hw bus reset .....
[ 1005.135825] Areca RAID Controller0: Model ARC-1880, F/W V1.54 2016-11-23
[ 1005.229790] arcmsr: scsi bus reset eh returns with success
[ 1019.145652] sd 0:0:0:4: [sdac] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 1019.154095] sd 0:0:0:4: [sdac] tag#0 Sense Key : Medium Error [current]
[ 1019.160876] sd 0:0:0:4: [sdac] tag#0 Add. Sense: Unrecovered read error
[ 1019.167512] sd 0:0:0:4: [sdac] tag#0 CDB: Read(10) 28 00 00 04 36 00 00 02 00 00
[ 1019.174920] blk_update_request: I/O error, dev sdac, sector 275968

(kernel 4.12.0-rc4-00310-g6b7ed4588ce6).
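
To compare abort requests with abort completions per LUN on a longer log,
something like the following might help. It is only a minimal sketch: the
names (tally_aborts, REQUEST, DONE) are made up for illustration, and the
patterns assume the two arcmsr message formats exactly as quoted above,
which may differ in other driver versions.

```python
import re
from collections import Counter

# Assumed formats, copied from the log excerpt above (not from the driver source).
REQUEST = re.compile(r"abort device command of scsi id = (\d+) lun = (\d+)")
DONE = re.compile(
    r"scsi id = (\d+) lun = (\d+) ccb = '0x[0-9a-f]+' "
    r"poll command abort successfully"
)


def tally_aborts(lines):
    """Count abort requests and abort completions per (scsi id, lun)."""
    requested, completed = Counter(), Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m:
            requested[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = DONE.search(line)
        if m:
            completed[(int(m.group(1)), int(m.group(2)))] += 1
    return requested, completed
```

Feeding it the saved kernel log line by line (for instance
tally_aborts(open("dmesg.txt")), with dmesg.txt being a hypothetical dump
file) returns two counters keyed by (id, lun). Any key that shows up in
one counter but not the other, or with different counts, is a candidate
for the kind of mismatch described here.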


Regards,

--
Jérôme