I was playing around with Kurt Garloff's rescan-scsi-bus.sh, more
precisely, I moved an entire bus from my Tekram DC-390U (with sym53c8xx
driver) to my Tekram DC-390 (with tmscsim driver). Then, I ran that
script which effectively sends a lot of stuff to /proc/scsi/scsi (scsi
add-single-device).
I ran that script several times since it did not collect all devices,
and at one time, I got two oopsen that I decoded.
The script is available at: http://www.garloff.de/kurt/linux/
The Kernel is a patched 2.2.17, but I think it has no SCSI-related
patches apart from an updated (2.0e3) tmscsim driver. Here's the full patch
list:
* ide.2.2.17.all.20000904
* i?c 2.5.4
* lm_sensors 2.5.4
* dc390 2.0e3
* serial 5.05
* NFSv3: linux-2.2.17-nfsv3-0.23.1.dif dhiggen-over-0.23.1 restore_rsize
* ReiserFS v3.5.27
These are the oopses:
ksymoops 2.3.5 on i586 2.2.17ma3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.2.17ma3/ (default)
-m /boot/System.map-2.2.17ma3 (specified)
Warning (compare_maps): ksyms_base symbol module_list_R__ver_module_list not found in System.map. Ignoring ksyms_base entry
Unable to handle kernel NULL pointer dereference at virtual address 00000052
current->tss.cr3 = 05463000, %%cr3 = 05463000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[sr_finish+112/388]
EFLAGS: 00010206
eax: 00000000 ebx: 00000054 ecx: 00000005 edx: c7f9cc00
esi: c267bd50 edi: c7f9cc40 ebp: 00000001 esp: c267bd3c
ds: 0018 es: 0018 ss: 0018
Process rescan-scsi-bus (pid: 18301, process nr: 122, stackpage=c267b000)
Stack: 00000000 c7e3c000 c02a9980 c267bd4c 00307273 c01fedaf c33a0820 00000008
c010a058 c01fedf3 00000000 00000000 c7e3c000 00000001 c5902140 c02caee0
c5902148 c0132793 c5902140 c267bda4 c267bda4 c0001a00 00020000 c5780280
Call Trace: [scan_scsis+491/1076] [common_interrupt+24/32] [scan_scsis+559/1076] [get_new_inode+147/312] [vsprintf+720/764] [wake_up_process+64/76] [__wake_up+59/68]
Code: 80 48 52 20 a1 ac 99 2a c0 80 4c 18 16 08 a1 ac 99 2a c0 80
Using defaults from ksymoops -t elf32-i386 -a i386
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 80 48 52 20 orb $0x20,0x52(%eax)
Code; 00000004 Before first symbol
4: a1 ac 99 2a c0 mov 0xc02a99ac,%eax
Code; 00000009 Before first symbol
9: 80 4c 18 16 08 orb $0x8,0x16(%eax,%ebx,1)
Code; 0000000e Before first symbol
e: a1 ac 99 2a c0 mov 0xc02a99ac,%eax
Code; 00000013 Before first symbol
13: 80 00 00 addb $0x0,(%eax)
Unable to handle kernel NULL pointer dereference at virtual address 00000032
current->tss.cr3 = 02394000, %%cr3 = 02394000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[proc_print_scsidevice+35/504]
EFLAGS: 00010246
eax: 00000000 ebx: c2577d40 ecx: c154f328 edx: c2577d40
esi: 00000329 edi: c7e3c000 ebp: c154f000 esp: c0551e78
ds: 0018 es: 0018 ss: 0018
Process cat (pid: 18466, process nr: 140, stackpage=c0551000)
Stack: 00000000 00000002 00000000 c2577d40 00000329 c7e3c000 c154f000 c154f000
c0271d8a c33a086a c33a0852 c4ae0ac0 00000212 c1f564c0 c0200e21 c2577d40
c154f000 c0551ef0 00000329 c11462a0 00000100 00000000 00000c00 c1f56380
Call Trace: [scsi_device_types+3946/5792] [scsi_proc_info+121/1876] [do_anonymous_page+45/128] [do_no_page+54/212] [dispatch_scsi_info+64/188] [proc_readscsi+150/268] [proc_readscsi+45/268]
Code: 66 8b 40 32 25 ff ff 00 00 50 68 20 5c 27 c0 8b 5c 24 50 03
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 66 8b 40 32 mov 0x32(%eax),%ax
Code; 00000004 Before first symbol
4: 25 ff ff 00 00 and $0xffff,%eax
Code; 00000009 Before first symbol
9: 50 push %eax
Code; 0000000a Before first symbol
a: 68 20 5c 27 c0 push $0xc0275c20
Code; 0000000f Before first symbol
f: 8b 5c 24 50 mov 0x50(%esp,1),%ebx
Code; 00000013 Before first symbol
13: 03 00 add (%eax),%eax
1 warning issued. Results may not be reliable.
--
Matthias Andree
Hi Matthias,
On Wed, Nov 22, 2000 at 02:06:18AM +0100, Matthias Andree wrote:
> I ran that script several times since it did not collect all devices,
Strange.
> and at one time, I got two oopsen that I decoded.
> * dc390 2.0e3
> Warning (compare_maps): ksyms_base symbol module_list_R__ver_module_list not found in System.map. Ignoring ksyms_base entry
> Unable to handle kernel NULL pointer dereference at virtual address 00000052
> current->tss.cr3 = 05463000, %%cr3 = 05463000
> EIP: 0010:[sr_finish+112/388]
> Process rescan-scsi-bus (pid: 18301, process nr: 122, stackpage=c267b000)
> Call Trace: [scan_scsis+491/1076] [common_interrupt+24/32] [scan_scsis+559/1076] [get_new_inode+147/312] [vsprintf+720/764] [wake_up_process+64/76] [__wake_up+59/68]
> Code: 80 48 52 20 a1 ac 99 2a c0 80 4c 18 16 08 a1 ac 99 2a c0 80
> Using defaults from ksymoops -t elf32-i386 -a i386
>
> Code; 00000000 Before first symbol
> 00000000 <_EIP>:
???
While 2.0e3 contains a bug that can cause an OOps inside the driver (just
use the echo "INQUIRY 0" >/proc/scsi/tmscsim/?), the normal bus rescanning
should not be able to trigger it. The above looks like the bug is occuring
somewhere else.
Having said this, I'd like to ask you to try 2.0e6 of the tmscsim driver and
check whether you are able to reproduce the bug.
Thanks!
Regards,
--
Kurt Garloff <[email protected]> Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE GmbH, Nuernberg, FRG SCSI, Security
On Thu, 23 Nov 2000, Kurt Garloff wrote:
> While 2.0e3 contains a bug that can cause an OOps inside the driver (just
> use the echo "INQUIRY 0" >/proc/scsi/tmscsim/?), the normal bus rescanning
> should not be able to trigger it. The above looks like the bug is occuring
> somewhere else.
> Having said this, I'd like to ask you to try 2.0e6 of the tmscsim driver and
> check whether you are able to reproduce the bug.
My previous mail was a tad too early. After I moved the bus back to the
Tekram DC-390U (sym53c8xx driver), I caught an oops again, and
rescan-scsi-bus.sh caught a SIGSEGV. There seems to be a kernel bug somewhere
in that SCSI handling stuff:
Before the oops:
----------------
scsi singledevice 0 0 0 0
scsi singledevice 0 0 1 0
scsi singledevice 0 0 2 0
Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.02
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr3 at scsi0, channel 0, id 2, lun 0
scsi singledevice 0 0 3 0
Vendor: YAMAHA Model: CRW4416S Rev: 1.0j
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr4 at scsi0, channel 0, id 3, lun 0
sr4: scsi3-mmc drive: 16x/16x writer cd/rw xa/form2 cdda tray
scsi singledevice 0 0 4 0
sym53c875-0-<4,*>: FAST-5 SCSI 5.0 MB/s (200 ns, offset 8)
Vendor: HP Model: HP35480A Rev: 1209
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi tape st2 at scsi0, channel 0, id 4, lun 0
scsi singledevice 0 0 5 0
Vendor: PLEXTOR Model: CD-ROM PX-20TS Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr5 at scsi0, channel 0, id 5, lun 0
scsi singledevice 0 0 6 0
sym53c875-0-<6,*>: FAST-5 SCSI 4.0 MB/s (250 ns, offset 8)
Vendor: TANDBERG Model: TDC 4222 Rev: =07:
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi tape st3 at scsi0, channel 0, id 6, lun 0
scsi singledevice 0 0 7 0
scsi singledevice 1 0 0 0
scsi singledevice 1 0 1 0
scsi singledevice 1 0 2 0
scsi singledevice 1 0 2 0
scsi singledevice 1 0 3 0
scsi singledevice 1 0 3 0
scsi singledevice 1 0 4 0
scsi singledevice 1 0 4 0
scsi singledevice 1 0 5 0
scsi singledevice 1 0 5 0
scsi singledevice 1 0 6 0
scsi singledevice 1 0 6 0
scsi singledevice 1 0 7 0
scsi singledevice 2 0 0 0
scsi singledevice 2 0 1 0
scsi singledevice 2 0 2 0
scsi singledevice 2 0 3 0
scsi singledevice 2 0 4 0
scsi singledevice 2 0 5 0
scsi singledevice 2 0 6 0
scsi singledevice 2 0 7 0
scsi singledevice 0 0 0 0
scsi singledevice 0 0 1 0
scsi singledevice 0 0 2 0
Vendor: PLEXTOR Model: CD-ROM PX-32TS Rev: 1.02
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr2 at scsi0, channel 0, id 2, lun 0
Oops ran through ksymoops:
--------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000052
current->tss.cr3 = 07106000, %%cr3 = 07106000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[sr_finish+112/388]
EFLAGS: 00010206
eax: 00000000 ebx: 00000054 ecx: 00000005 edx: c7f9cc00
esi: c629bd50 edi: c7f9cc40 ebp: 00000001 esp: c629bd3c
ds: 0018 es: 0018 ss: 0018
Process rescan-scsi-bus (pid: 2846, process nr: 23, stackpage=c629b000)
Stack: 00000000 c7f88340 c02a9960 c629bd4c 00307273 c01fedaf c7f88700 c7555de0
c629becc c01fedf3 00000000 00000000 c7f88340 00000000 c6e132c0 c02cb0e0
c6e132c8 c0132793 c6e132c0 c629bda4 c629bda4 c0001a00 00020000 c63532c0
Call Trace: [scan_scsis+491/1076] [scan_scsis+559/1076] [get_new_inode+147/312] [iget4+121/132] [vsprintf+720/764] [wake_up_process+64/76] [__wake_up+59/68]
Code: 80 48 52 20 a1 8c 99 2a c0 80 4c 18 16 08 a1 8c 99 2a c0 80
Using defaults from ksymoops -t elf32-i386 -a i386
Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 80 48 52 20 orb $0x20,0x52(%eax)
Code; 00000004 Before first symbol
4: a1 8c 99 2a c0 mov 0xc02a998c,%eax
Code; 00000009 Before first symbol
9: 80 4c 18 16 08 orb $0x8,0x16(%eax,%ebx,1)
Code; 0000000e Before first symbol
e: a1 8c 99 2a c0 mov 0xc02a998c,%eax
Code; 00000013 Before first symbol
13: 80 00 00 addb $0x0,(%eax)
Kurt Garloff schrieb am Donnerstag, den 23. November 2000:
> On Wed, Nov 22, 2000 at 02:06:18AM +0100, Matthias Andree wrote:
> > I ran that script several times since it did not collect all devices,
>
> Strange.
>
> While 2.0e3 contains a bug that can cause an OOps inside the driver (just
> use the echo "INQUIRY 0" >/proc/scsi/tmscsim/?), the normal bus rescanning
> should not be able to trigger it. The above looks like the bug is occuring
> somewhere else.
> Having said this, I'd like to ask you to try 2.0e6 of the tmscsim driver and
> check whether you are able to reproduce the bug.
Good thing first: I can't trigger the oopses with 2.0e6. I tried hard, didn't
manage to.
However, trying to trigger a race condition, I found that your script suffers
from too small a kernel API in this place, since it uses the
remove-single-device and all-single-devices in a non-locked manner, so if you
run several instances of your script in parallel (simple rescan-scsi-bus.sh -r
& will do) it will give a mess and may leave you with missing devices.
I don't currently recall if your web site states that your script may only be
run once at the same time; lock files would be nice :-]
To recall my devices:
#2 Plextor PX-32TS 1.02
#3 Yamaha CRW4416S 1.0j
#4 HP 35480A 12.09
#5 Plextor PX-20TS 1.00
#6 Tandberg TDC4222
The first run of your rescan-scsi-script showed that the "scsi
singledevice 1 0 2 0" command was aborted due to timeout, and the
Plextor PX-32TS is known to require at least 5 s settling time after a
bus reset in my Tekram DC-390U (SYM53C875) BIOS, so it may actually be the
drive itself that gives us headaches.
Result of first rescan-scsi-bus.sh -r:
all 5 devices removed, and only #3 ... #6 added (#2, Plextor PX-32
missing)
During scanning, I got these (the first two aborts and the SCSI bus reset look
right since I hotplugged the entire bus to the other adaptor)
scsi singledevice 1 0 2 0
scsi : aborting command due to timeout : pid 46, scsi1, channel 0, id 2, lun 0 Inquiry 00 00 00 ff 00
DC390: Abort command (pid 46, Device 02-00)
DC390: SRB: Xferred 00000000, Remain 00000100, State 00000100, Phase 01
DC390: AdpaterStatus: 00, SRB Status 00
DC390: Status of last IRQ (DMA/SC/Int/IRQ): 00818418
DC390: Register dump: SCSI block:
DC390: XferCnt Cmd Stat IntS IRQS FFIS Ctl1 Ctl2 Ctl3 Ctl4
DC390: 0000ff 90 01 c4 00 81 17 48 08 84
DC390: FIFO: 20
DC390: Register dump: DMA engine:
DC390: Cmd STrCnt SBusA WrkBC WrkAC Stat SBusCtrl
DC390: 83 00000100 071c5da4 00000100 071c5da4 00 03184200
DC390: Register dump: PCI Status: c200
DC390: In case of driver trouble read linux/drivers/scsi/README.tmscsim
DC390: Abort current command (pid 46, SRB c7e3c13c)
DC390: Aborted pid 46 with status 3
scsi : aborting command due to timeout : pid 46, scsi1, channel 0, id 2, lun 0 Inquiry 00 00 00 ff 00
DC390: Abort command (pid 46, Device 02-00)
DC390: SRB: Xferred 00000000, Remain 00000100, State 00000100, Phase 01
DC390: AdpaterStatus: 00, SRB Status 00
DC390: Status of last IRQ (DMA/SC/Int/IRQ): 00818418
DC390: Register dump: SCSI block:
DC390: XferCnt Cmd Stat IntS IRQS FFIS Ctl1 Ctl2 Ctl3 Ctl4
DC390: 0000ff 90 01 c4 00 80 17 48 08 84
DC390: Register dump: DMA engine:
DC390: Cmd STrCnt SBusA WrkBC WrkAC Stat SBusCtrl
DC390: 83 00000100 071c5da4 00000100 071c5da4 00 03184200
DC390: Register dump: PCI Status: c200
DC390: In case of driver trouble read linux/drivers/scsi/README.tmscsim
DC390: Abort current command (pid 46, SRB c7e3c13c)
DC390: Aborted pid 46 with status 3
SCSI host 1 abort (pid 46) timed out - resetting
SCSI bus is being reset for host 1 channel 0.
DC390: RESET ... done
scsi singledevice 1 0 3 0
DC390: Illegal Operation detected (00c38418)!
DC390: SRB: Xferred 00000000, Remain 00000000, State 00000040, Phase 05
DC390: AdpaterStatus: 00, SRB Status 00
DC390: Status of last IRQ (DMA/SC/Int/IRQ): 00c38418
DC390: Register dump: SCSI block:
DC390: XferCnt Cmd Stat IntS IRQS FFIS Ctl1 Ctl2 Ctl3 Ctl4
DC390: 0000ff 42 03 84 00 80 17 48 08 84
DC390: Register dump: DMA engine:
DC390: Cmd STrCnt SBusA WrkBC WrkAC Stat SBusCtrl
DC390: 00 00000100 071c5da4 00000100 071c5da4 00 031a4602
DC390: Register dump: PCI Status: c200
DC390: In case of driver trouble read linux/drivers/scsi/README.tmscsim
A subsequent rescan-scsi-bus.sh will find the PX-32TS and add it, this time,
without bus resets and aborts.
Should the reset->inquiry delay be applied after ANY reset? Is it actually
applied but too short for my Plextor?
On Mon, Nov 27, 2000 at 08:57:03AM +0100, Matthias Andree wrote:
> Good thing first: I can't trigger the oopses with 2.0e6. I tried hard, didn't
> manage to.
Nice to learn.
> However, trying to trigger a race condition, I found that your script suffers
> from too small a kernel API in this place, since it uses the
> remove-single-device and all-single-devices in a non-locked manner, so if you
> run several instances of your script in parallel (simple rescan-scsi-bus.sh -r
> & will do) it will give a mess and may leave you with missing devices.
>
> I don't currently recall if your web site states that your script may only be
> run once at the same time; lock files would be nice :-]
Well, it's one of these root-only admin tools.
I assume, root knows about what (s)he's doing.
So, I'll add some line to the documentation that tells you not to run the
scripts in parallel. Locking is really overkill, IMHO.
> A subsequent rescan-scsi-bus.sh will find the PX-32TS and add it, this time,
> without bus resets and aborts.
>
> Should the reset->inquiry delay be applied after ANY reset? Is it actually
> applied but too short for my Plextor?
Well, the default for tmscsim is 1.5s time after a reset before it start
command processing again. This can be changed by echoing DelayReset values tp
the proc file. If your Plextor requires 5s, just do
echo "delayreset 5" >/proc/scsi/tmscsim/? and the driver will wait for 5.5s
...
Regards,
--
Kurt Garloff <[email protected]> Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE GmbH, Nuernberg, FRG SCSI, Security
On Mon, Nov 27, 2000 at 09:10:05AM +0100, Matthias Andree wrote:
> My previous mail was a tad too early. After I moved the bus back to the
> Tekram DC-390U (sym53c8xx driver), I caught an oops again, and
> rescan-scsi-bus.sh caught a SIGSEGV. There seems to be a kernel bug somewhere
> in that SCSI handling stuff:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000052
> current->tss.cr3 = 07106000, %%cr3 = 07106000
> *pde = 00000000
> Oops: 0002
> CPU: 0
> EIP: 0010:[sr_finish+112/388]
> EFLAGS: 00010206
> eax: 00000000 ebx: 00000054 ecx: 00000005 edx: c7f9cc00
> esi: c629bd50 edi: c7f9cc40 ebp: 00000001 esp: c629bd3c
> ds: 0018 es: 0018 ss: 0018
> Process rescan-scsi-bus (pid: 2846, process nr: 23, stackpage=c629b000)
> Stack: 00000000 c7f88340 c02a9960 c629bd4c 00307273 c01fedaf c7f88700 c7555de0
> c629becc c01fedf3 00000000 00000000 c7f88340 00000000 c6e132c0 c02cb0e0
> c6e132c8 c0132793 c6e132c0 c629bda4 c629bda4 c0001a00 00020000 c63532c0
> Call Trace: [scan_scsis+491/1076] [scan_scsis+559/1076] [get_new_inode+147/312] [iget4+121/132] [vsprintf+720/764] [wake_up_process+64/76] [__wake_up+59/68]
> Code: 80 48 52 20 a1 8c 99 2a c0 80 4c 18 16 08 a1 8c 99 2a c0 80
> Using defaults from ksymoops -t elf32-i386 -a i386
This time, I do not suspect the low-level driver. It mey very well be, that
there's a subtile bug WRT add/remove-single-device housekeeping in the SCSI
layer.
You told that you did parallel rescan-scsi-bus.sh calls. That may have
confused the kernel. Did the above oops happen without such abuse?
Regards,
--
Kurt Garloff <[email protected]> [Eindhoven, NL]
Physics: Plasma simulations <[email protected]> [TU Eindhoven, NL]
Linux: SCSI, Security <[email protected]> [SuSE Nuernberg, FRG]
(See mail header or public key servers for PGP2 and GPG public keys.)