2010-08-27 13:12:09

by Justin Piszcz

[permalink] [raw]
Subject: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

Hello,

I sometimes see this on one of my two 3ware cards in my system.

I have opened a case previously with LSI/3ware and they are not sure what
is going on, all disks are good, etc.

I notice this sometimes occurs when there is high I/O & network activity,
but I am not able to reproduce it at will. I do remember it happening
before though just running a postmap on some files and rsyncing a few
configuration files around.

Originally I thought it might be something to do with having smartd
running, but I disabled it and the problem persists.

The next thing I am looking at is the 'nobarrier' option, I've always used
it in the past, I have now mounted all FS without it and I will see if the
error recurs.

Does anyone know what the root cause of this problem may be?

Aug 24 09:14:04 p34 kernel: [247295.662461] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

Latest kernel (2.6.35.x) and latest firmware on the card, etc.

Justin.


2010-08-27 16:29:46

by adam radford

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

On Fri, Aug 27, 2010 at 6:12 AM, Justin Piszcz <[email protected]> wrote:

> Aug 24 09:14:04 p34 kernel: [247295.662461] 3w-9xxx: scsi0: WARNING:
> (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

Justin,

This just means the firmware didn't complete an ioctl in 60 seconds.

Can try changing this define in 3w-9xxx.h:

#define TW_IOCTL_CHRDEV_TIMEOUT 60 /* 60 seconds */

to 90 or 120 seconds to see if that fixes your issue?

Thanks,

-Adam

2010-08-27 17:10:44

by Justin Piszcz

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.



On Fri, 27 Aug 2010, adam radford wrote:

> Justin,
>
> This just means the firmware didn't complete an ioctl in 60 seconds.
>
> Can try changing this define in 3w-9xxx.h:
>
> #define TW_IOCTL_CHRDEV_TIMEOUT 60 /* 60 seconds */
>
> to 90 or 120 seconds to see if that fixes your issue?
>
> Thanks,
>
> -Adam
>

Hi Adam,

Thanks for responding!

LSI just got back to me on a second case, their latest beta firmware:
http://www.lsi.com/DistributionSystem/AssetDocument/readme.txt

SCR 2196: Unexpected controller soft resets
Fixed an issue with regards to deferral of write and read commands to
help eliminate unexpected soft resets.

Well well!

Plan of action:

1. See if the error recurs without 'nobarrier' - only because there are
some threads about problems when this is enabled on 3ware controllers.

2. LSI recommended I disconnect/replace my BBU module, I already did once
before, ~$120, so the next step I will apply their latest Beta firmware.

3. If it STILL does it, I can increase the timeout, HOWEVER, when I see
this, I/O on the system, everything locks up for 60-120-180-360 seconds
and then it comes back after it resets the controller. The weird thing is
this controller is only for a /data volume and not the root filesystem.
The root filesystem is on a separate RAID-1 4 port card (also w/BBU).

If you google that exact error, there is very little information, if I
find something that works, I'll update the list so others have something
to point to when they see this problem.

Justin.

2010-08-29 15:49:12

by Justin Piszcz

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.



On Fri, 27 Aug 2010, Justin Piszcz wrote:

Hi,

Current theory:

Mobo: Intel DP55KG

I had this:
#append="3w-9xxx.use_msi=1 reboot=a snd_hda_intel.enable_msi=1"

Now I am no longer using MSI for the 3ware driver, so far, the 'lag' /
controller reset problem has not occurred yet:
append="reboot=a snd_hda_intel.enable_msi=1"

I wish both cards did not use the same IRQ, however, if this solves the problem
then I'll be happy.

>From /proc/interrupts:

16: 44899 0 0 0 0 0 12418 0 IO-APIC-fasteoi 3w-9xxx, 3w-9xxx, ehci_hcd:usb1

So far I'm typing this e-mail on the client and I do not see any lag at this
time, it appears this has fixed my problem, I'll give it a few more hours/days
before I consider it 'fixed' but after changing all of the settings in 3dm2
and replacing/removing the BBUs, noticing no change, it appears to be an
interrupt problem when the 3ware cards are used with MSI on my motherboard.

Are there cases in which MSI should/should not be used? Generally I have it on
for all of my devices but perhaps the 3ware cards don't play well with it
on the Intel motherboard?

I am guessing the 'lag' issue causes the controller to reset when there is
too much I/O launched via cron (backups and such)..

We'll see if it happens again, I did briefly try the newest Beta firmware
as LSI recommended made no difference in terms of the lag issue.

I am still running with 'nobarrier' removed and have not gotten any controller
resets yet, but I had the lag issue as mentioned above until I removed the
MSI option from the 3w-9xxx driver.

Before removal of MSI option:

Drive Performance Monitor Configuration for /c1 ...
Performance Monitor: ON
Version: 1
Max commands for averaging: 100
Max latency commands to save: 10
Requested data: Instantaneous Drive Statistics

Queue Xfer Resp
Port Status Unit Depth IOPs Rate(MB/s) Time(ms)
------------------------------------------------------------------------
p0 OK u0 1 123 1.202 730148
p1 OK u0 1 123 1.203 644249
p2 OK u1 1 7 0.000 132
p3 NOT-PRESENT - - - - -

After removal:

Drive Performance Monitor Configuration for /c1 ...
Performance Monitor: ON
Version: 1
Max commands for averaging: 100
Max latency commands to save: 10
Requested data: Instantaneous Drive Statistics

Queue Xfer Resp
Port Status Unit Depth IOPs Rate(MB/s) Time(ms)
------------------------------------------------------------------------
p0 OK u0 1 53 3.124 85900
p1 OK u0 1 53 3.124 85900
p2 OK u1 0 0 0.000 0
p3 NOT-PRESENT - - - - -

..

This is what the problem looks like (sdb=root) as it happened twice (before
removal of the MSI option, I've not been able to reproduce it with the MSI
option disabled).

$ (output form dstat)
-dsk/total----dsk/sda-----dsk/sdb--
read writ: read writ: read writ
0 6884k: 0 0 : 0 6884k
0 4580k: 0 0 : 0 4580k
0 4728k: 0 0 : 0 4728k
0 3936k: 0 0 : 0 3936k
0 5764k: 0 0 : 0 5764k
0 1292k: 0 0 : 0 1292k
0 7760k: 0 0 : 0 7760k
0 5480k: 0 0 : 0 5480k
0 7408k: 0 0 : 0 7408k
0 5040k: 0 0 : 0 5040k
0 6236k: 0 0 : 0 6236k
0 788k: 0 0 : 0 788k
0 0 : 0 0 : 0 0
0 2900k: 0 0 : 0 2900k
0 8160k: 0 0 : 0 8160k
4096B 7916k: 0 0 :4096B 7916k
0 7812k: 0 0 : 0 7812k
0 4804k: 0 0 : 0 4804k
148k 7268k: 0 0 : 148k 7268k
0 888k: 0 0 : 0 888k
0 7688k: 0 0 : 0 7688k
0 7644k: 0 0 : 0 7644k
0 6352k: 0 0 : 0 6352k
-dsk/total----dsk/sda-----dsk/sdb--
read writ: read writ: read writ
0 5324k: 0 0 : 0 5324k
0 1432k: 0 0 : 0 1432k
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
-dsk/total----dsk/sda-----dsk/sdb--
read writ: read writ: read writ
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0 dddddddddd
0 8700k: 0 0 : 0 8700k
0 2076k: 0 0 : 0 2076k
0 5428k: 0 0 : 0 5428k
0 7984k: 0 0 : 0 7984k
0 8236k: 0 0 : 0 8236k
0 2536k: 0 0 : 0 2536k
0 232k: 0 0 : 0 232k
0 268k: 0 0 : 0 268k
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 508k: 0 0 : 0 508k
0 72k: 0 0 : 0 72k
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
4096B 232k: 0 0 :4096B 232k
344k 2380k: 0 0 : 344k 2380k
0 1172k: 0 0 : 0 1172k
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0 ^C^C^C
$


Justin.

2010-08-31 11:22:00

by Justin Piszcz

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.



On Sun, 29 Aug 2010, Justin Piszcz wrote:

>
>
> On Fri, 27 Aug 2010, Justin Piszcz wrote:


Problem persists during heavy I/O, different error this time:

[152904.636062] 3w-9xxx: scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
[152917.721865] 3w-9xxx: scsi1: AEN: INFO (0x04:0x0001): Controller reset occurred:resets=2.

--

syslog:Aug 6 05:58:19 p34 kernel: [ 1.763002] 3w-9xxx: scsi1: AEN: INFO (0x04:0x0001): Controller reset occurred:resets=1.
syslog:Aug 20 07:30:41 p34 kernel: [1171646.371290] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
syslog:Aug 24 09:04:11 p34 kernel: [246703.526388] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
syslog:Aug 24 09:14:04 p34 kernel: [247295.662461] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
syslog:Aug 26 21:05:22 p34 kernel: [462727.945313] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
syslog:Aug 27 00:39:43 p34 kernel: [475586.575030] 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.
syslog:Aug 31 05:53:03 p34 kernel: [152904.636062] 3w-9xxx: scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
syslog:Aug 31 05:53:03 p34 kernel: [152917.721865] 3w-9xxx: scsi1: AEN: INFO (0x04:0x0001): Controller reset occurred:resets=2.

--

It started happening on August 6th, this is around the same time I upgraded
to the 2.6.35 kernel, the 2.6.35 kernel according to kernelnewbies was
released on August 1, 2010:

http://kernelnewbies.org/LinuxChanges
Linux 2.6.35 has been released on 1 Aug, 2010.

http://marc.info/?l=linux-kernel&m=127419418304483&w=2

Adam Radford (2):
3ware maintainers update
3w-xxxx, 3w-9xxx: force 60 second timeout

I will go back to 2.6.34 and see if the problems persists.

Justin.

2010-08-31 11:36:49

by Justin Piszcz

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.



On Tue, 31 Aug 2010, Justin Piszcz wrote:

>
>
> On Sun, 29 Aug 2010, Justin Piszcz wrote:
>
>> On Fri, 27 Aug 2010, Justin Piszcz wrote:

Per my earlier message, both cards were resetting, scsi0 and scsi1 under
kernel 2.6.35(.x), with 2.6.34, the lag problem is gone and my machine is
back to normal again. It is unlikely that both cards would be at fault,
in addition I spent several hours trying different things, removing the BBU
modules for example, but always staying on the 2.6.35 kernel.

I am now able to run rss2email (which ALWAYS caused the machine to lockup
and freeze until one of the controllers reset) without any problems using
2.6.34.

2.6.34.1 = good
2.6.35.x = has 3ware bug

There appears to be a bug in the commit for the 3w-9xxx updates.

On 2.6.34 now and no problems so far, I am continuing to test.

http://marc.info/?l=linux-kernel&m=127419418304483&w=2

Adam Radford (2):
3ware maintainers update
3w-xxxx, 3w-9xxx: force 60 second timeout


Justin.

2010-08-31 12:23:19

by Justin Piszcz

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.



On Tue, 31 Aug 2010, Justin Piszcz wrote:

>
>
> On Tue, 31 Aug 2010, Justin Piszcz wrote:
>
>>
>>
>> On Sun, 29 Aug 2010, Justin Piszcz wrote:
>>
>>> On Fri, 27 Aug 2010, Justin Piszcz wrote:
>
> Per my earlier message, both cards were resetting, scsi0 and scsi1 under
> kernel 2.6.35(.x), with 2.6.34, the lag problem is gone and my machine is
> back to normal again. It is unlikely that both cards would be at fault,
> in addition I spent several hours trying different things, removing the BBU
> modules for example, but always staying on the 2.6.35 kernel.
>
> I am now able to run rss2email (which ALWAYS caused the machine to lockup
> and freeze until one of the controllers reset) without any problems using
> 2.6.34.
>
> 2.6.34.1 = good
> 2.6.35.x = has 3ware bug
>
> There appears to be a bug in the commit for the 3w-9xxx updates.
>
> On 2.6.34 now and no problems so far, I am continuing to test.
>
> http://marc.info/?l=linux-kernel&m=127419418304483&w=2
>
> Adam Radford (2):
> 3ware maintainers update
> 3w-xxxx, 3w-9xxx: force 60 second timeout
>
>
> Justin.


Latest-- 2.6.34.x seems to fix the issue, no lag whatsoever, the system is
very responsive when running rss2email, again no lag, have run 10-15 tests
without a single incident. I also re-enabled msi and put my optimization
script back in init.d, everything continues to work perfectly.

Will wait for Adam on what went wrong with the 2.6.35 3w-9xxx update.

Justin.

2010-08-31 18:15:26

by adam radford

[permalink] [raw]
Subject: Re: 3w-9xxx: scsi0: WARNING: (0x06:0x0037): Character ioctl (0x108) timed out, resetting card.

Justin,

There were no changes in the driver other than to force a 60 second I/O
timeout. You probably have a udev rule that was already giving you 60
second timeouts for all scsi devices, so this patch did absolutely nothing.

More likely is that there is an I/O pattern change between the kernels
that is causing a 3ware firmware issue (or drive firmware issue) on
your configuration.
I have forwarded your case to 3ware firmware support.

-Adam


On Tue, Aug 31, 2010 at 5:23 AM, Justin Piszcz <[email protected]> wrote:
>
>
> On Tue, 31 Aug 2010, Justin Piszcz wrote:
>
>>
>>
>> On Tue, 31 Aug 2010, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Sun, 29 Aug 2010, Justin Piszcz wrote:
>>>
>>>> On Fri, 27 Aug 2010, Justin Piszcz wrote:
>>
>> Per my earlier message, both cards were resetting, scsi0 and scsi1 under
>> kernel 2.6.35(.x), with 2.6.34, the lag problem is gone and my machine is
>> back to normal again. ?It is unlikely that both cards would be at fault,
>> in addition I spent several hours trying different things, removing the
>> BBU
>> modules for example, but always staying on the 2.6.35 kernel.
>>
>> I am now able to run rss2email (which ALWAYS caused the machine to lockup
>> and freeze until one of the controllers reset) without any problems using
>> 2.6.34.
>>
>> 2.6.34.1 = good
>> 2.6.35.x = has 3ware bug
>>
>> There appears to be a bug in the commit for the 3w-9xxx updates.
>>
>> On 2.6.34 now and no problems so far, I am continuing to test.
>>
>> http://marc.info/?l=linux-kernel&m=127419418304483&w=2
>>
>> Adam Radford (2):
>> ? ? 3ware maintainers update
>> ? ? 3w-xxxx, 3w-9xxx: force 60 second timeout
>>
>>
>> Justin.
>
>
> Latest-- 2.6.34.x seems to fix the issue, no lag whatsoever, the system is
> very responsive when running rss2email, again no lag, have run 10-15 tests
> without a single incident. ?I also re-enabled msi and put my optimization
> script back in init.d, everything continues to work perfectly.
>
> Will wait for Adam on what went wrong with the 2.6.35 3w-9xxx update.
>
> Justin.
>
>