Subject: Probably 2.4 kernel or AIC7xxx module trouble

Hi linuxers,

My Linux box has been halting strangely since I installed the 2.4.20
kernel; this didn't happen when I was running the 2.2 kernel.

The system works fine for hours or days, and then it halts. The keyboard
stops responding and remote access dies, but I can still ping it!

I found some messages in this newsgroup describing the same symptoms, but
the problem was left unsolved.

The system halts easily if I do a large I/O, like reindexing a database,
giving me some messages like: (scsi0:A:1:0): Locking max tag count at 128...

I have Red Hat 9 installed on an Intel STL2 server board, with
2 Pentium III 933 MHz CPUs and 512 MB of RAM, 2 SCSI disks (18 GB) and
2 IDE disks (40 GB), running in multiple RAID 1 and RAID 0 configurations.

The dmesg gives me the following data about the SCSI:

SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

blk: queue c24fd618, I/O limit 4095Mb (mask 0xffffffff)
Vendor: FUJITSU Model: MAJ3182MC Rev: 0114
Type: Direct-Access ANSI SCSI revision: 04
blk: queue c24fd418, I/O limit 4095Mb (mask 0xffffffff)
Vendor: IBM Model: DDYS-T18350M Rev: SA2A
Type: Direct-Access ANSI SCSI revision: 03
blk: queue c24fd218, I/O limit 4095Mb (mask 0xffffffff)
Vendor: ESG-SHV Model: SCA HSBP M14 Rev: 0.03
Type: Processor ANSI SCSI revision: 02
blk: queue dfc58e18, I/O limit 4095Mb (mask 0xffffffff)
scsi0:A:0:0: Tagged Queuing enabled. Depth 253
scsi0:A:1:0: Tagged Queuing enabled. Depth 253
Vendor: NEC Model: CD-ROM DRIVE:466 Rev: 1.26
Type: CD-ROM ANSI SCSI revision: 02
blk: queue dfc58a18, I/O limit 4095Mb (mask 0xffffffff)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
(scsi0:A:0): 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
SCSI device sda: 35694904 512-byte hdwr sectors (18276 MB)
sda: sda1 sda2 sda3
(scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
SCSI device sdb: 35843670 512-byte hdwr sectors (18352 MB)
sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 >
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
Journalled Block Device driver loaded
md: Autodetecting RAID arrays.

The /proc/scsi/aic7xxx/0 gives me:

Adaptec AIC7xxx driver version: 6.2.8
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

Serial EEPROM:
0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a
0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a
0x58e4 0x5c5e 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff
0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0250 0x133f

Channel A Target 0 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Goal: 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
Curr: 160.000MB/s transfers (80.000MHz DT, offset 127, 16bit)
Channel A Target 0 Lun 0 Settings
Commands Queued 488876
Commands Active 0
Command Openings 125
Max Tagged Openings 253
Device Queue Frozen Count 0
Channel A Target 1 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Goal: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
Curr: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
Channel A Target 1 Lun 0 Settings
Commands Queued 465642
Commands Active 0
Command Openings 128
Max Tagged Openings 128
Device Queue Frozen Count 0
Channel A Target 2 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 3 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 4 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 5 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 6 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Goal: 3.300MB/s transfers
Curr: 3.300MB/s transfers
Channel A Target 6 Lun 0 Settings
Commands Queued 1
Commands Active 0
Command Openings 1
Max Tagged Openings 0
Device Queue Frozen Count 0
Channel A Target 7 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 8 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 9 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 10 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 11 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 12 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 13 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 14 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 15 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)

And /proc/scsi/aic7xxx/1 is:

Adaptec AIC7xxx driver version: 6.2.8
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

Serial EEPROM:
0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a
0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a 0xc33a
0x58e4 0x5c5e 0x2807 0x0010 0xffff 0xffff 0xffff 0xffff
0xffff 0xffff 0xffff 0xffff 0xffff 0xffff 0x0250 0x133f

Channel A Target 0 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 1 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 2 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 3 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 4 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 5 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Goal: 20.000MB/s transfers (20.000MHz, offset 16)
Curr: 20.000MB/s transfers (20.000MHz, offset 16)
Channel A Target 5 Lun 0 Settings
Commands Queued 193
Commands Active 0
Command Openings 1
Max Tagged Openings 0
Device Queue Frozen Count 0
Channel A Target 6 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 7 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 8 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 9 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 10 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 11 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 12 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 13 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 14 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)
Channel A Target 15 Negotiation Settings
User: 160.000MB/s transfers (80.000MHz DT, offset 255, 16bit)

How can I tell whether this is a hardware problem or a kernel + module
problem?

Thanks
Slepetys



2003-07-02 14:59:49

by Justin T. Gibbs

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

> The system halts easily if I do a large I/O, like reindexing a database,
> giving me some messages like: (scsi0:A:1:0): Locking max tag count at 128...

The "Locking max tag count" messages are normal. It means the SCSI
driver was able to determine the maximum queue depth of your drive.
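
For anyone who wants to see which depth the driver settled on for each drive, here is a rough Python sketch that pulls "Max Tagged Openings" out of the /proc/scsi/aic7xxx/N text. It assumes the field layout shown in the proc dump earlier in this thread; the function name max_tagged_openings is made up for illustration and is not part of the driver.

```python
import re

def max_tagged_openings(proc_text):
    """Map (channel, target, lun) -> Max Tagged Openings from aic7xxx proc text."""
    result = {}
    current = None  # the (channel, target, lun) block we are currently inside
    for line in proc_text.splitlines():
        line = line.strip()
        m = re.match(r"Channel (\w) Target (\d+) Lun (\d+) Settings", line)
        if m:
            current = (m.group(1), int(m.group(2)), int(m.group(3)))
            continue
        m = re.match(r"Max Tagged Openings (\d+)", line)
        if m and current is not None:
            result[current] = int(m.group(1))
    return result

# Excerpt copied from the /proc/scsi/aic7xxx/0 dump in the first message.
sample = """\
Channel A Target 0 Lun 0 Settings
Commands Queued 488876
Commands Active 0
Command Openings 125
Max Tagged Openings 253
Device Queue Frozen Count 0
Channel A Target 1 Lun 0 Settings
Commands Queued 465642
Commands Active 0
Command Openings 128
Max Tagged Openings 128
Device Queue Frozen Count 0
"""

depths = max_tagged_openings(sample)
for (chan, target, lun), depth in sorted(depths.items()):
    print(f"{chan}:{target}:{lun} max tagged openings = {depth}")
```

On the excerpt above, target 1 reports 128, which matches the "Locking max tag count at 128" message for scsi0:A:1:0.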

6.2.8 is rather old. I don't know that upgrading the aic7xxx driver
will solve your problem, but it might be worth a shot. The latest
is available here:

http://people.FreeBSD.org/~gibbs/linux/SRC/

After upgrading, you should be at 6.2.36.

--
Justin

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Hi,

I upgraded to 6.2.36 using the RPM, and I am running some heavy tests.

So far it's OK, whereas the old configuration had trouble with this kind
of test.

Thanks
Slepetys



Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Oops...
The Linux box halted again, after 12 hours of normal operation.

The strangest part is that there is no message in /var/log/messages;
the system simply stops responding. Another strange thing is that the
top command gives me:

15:10:15 up 33 min, 1 user, load average: 1.06, 1.17, 1.14
91 processes: 90 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.0% user 3.1% system 0.0% nice 0.0% iowait 96.2% idle
CPU1 states: 0.0% user 0.1% system 0.0% nice 0.0% iowait 99.2% idle
Mem: 513172k av, 437740k used, 75432k free, 0k shrd, 17500k buff
258436k actv, 41792k in_d, 76644k in_c
Swap: 1060088k av, 44k used, 1060044k free 339860k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
9 root 16 0 0 0 0 SW 0.8 0.0 0:02 0 kscand/Normal
31 root 15 0 0 0 0 DW 0.5 0.0 0:05 0 raid1syncd
2425 root 15 0 1088 1088 864 R 0.2 0.2 0:00 1 top
1 root 15 0 396 396 344 S 0.0 0.0 0:03 1 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
... others...

Meaning that the load average is inconsistent with the CPU usage.

I really have no idea where to look for clues about what is going on.

Thanks
Roberto Slepetys





Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Hi Jim,
It's exactly the same problem: after about 12-24 hours everything locks up,
but I can still ping (sometimes).

But syslog stops, and I can only reboot with a hardware reset, because
Ctrl+Alt+Del doesn't work, and neither does the terminal.

It's an SMP machine (dual Pentium III) too, but the trouble continues even
after tests with a single CPU and with the noapic parameter passed to the
kernel.

I have no idea what kind of tests I could run to reproduce the trouble, or
which logs or files to look at.

Regards,
Slepetys



> Roberto,
> How does this problem manifest itself? I think it's the same problem
> that I'm having. Let me know what you think. I'm using the megaraid driver
> and aic7xxx driver. After about a 12-20 hour period, everything locks up,
> but there is no error message. The kernel sysrq information does work and
> I'm able to reboot.
>
> top - 12:59:05 up 3:02, 2 users, load average: 4.17, 4.35, 4.38
> Tasks: 122 total, 5 running, 117 sleeping, 0 stopped, 0 zombie
> Cpu0 : 25.0% user, 50.0% system, 25.0% nice, 0.0% idle
> Cpu1 : 62.5% user, 31.2% system, 0.0% nice, 6.2% idle
> Mem: 1033896k total, 823636k used, 210260k free, 160412k buffers
> Swap: 1060280k total, 0k used, 1060280k free, 352848k cached


2003-07-03 21:04:27

by Justin T. Gibbs

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

> I have no idea what kind of tests I could run to reproduce the trouble, or
> which logs or files to look at.

Have you tried running with the NMI watchdog enabled?

--
Justin

2003-07-03 22:35:06

by Matthias Andree

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

On Thu, 03 Jul 2003, Roberto Slepetys Ferreira wrote:

> Meaning that the load average is inconsistent with the CPU usage.

To find the stuck process that pushes your LA up, try: ps ax | grep -w D

2003-07-03 22:51:46

by Jim Gifford

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Justin, I just tried to enable the NMI watchdog. It doesn't seem to work on
my system. I tried both

append="nmi_watchdog=1"
and
append="nmi_watchdog=2"


2003-07-03 22:58:29

by Jim Gifford

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Tried that before. Earlier I thought it was the kswapd problem (see list).
But a few hours after I thought it was fixed, bam, it did it again.

I have monitored ps via this script, but I never see anything out of the
ordinary. I will try again and send a copy to the other guy who is having
the problem to see what results we get.



Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Hi again,

I passed the parameter nmi_watchdog=1 to the kernel at boot.

After about 2 hours it froze again, but on the console I found a lot of
messages like this:

smb_proc_readdir_long: name=\....(some directory)....\*, result=-2, rcls=1,
err=2

In the log, I found the same message:

Jul 4 12:35:21 filitico kernel: smb_proc_readdir_long: name=\Renato(19)\Data Base Nao Utilizados\*, result=-2, rcls=1, err=2

and

Jul 4 12:30:07 filitico kernel: smb_proc_readdir_long: name=\Renato(19)\Data Base Nao Utilizados\*, result=-2, rcls=1, err=2
Jul 4 12:30:54 filitico kernel: smb_proc_readdir_long: name=\Renato(19)\Data Base Nao Utilizados\*, result=-2, rcls=1, err=2
Jul 4 12:33:59 filitico last message repeated 3 times
Jul 4 12:34:24 filitico last message repeated 2 times

Do you know what this could be?

Thanks
Slepetys




Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

Hi again,

I did a ps ax|grep -w D, and I got:

>ps ax|grep -w D
2205 ? S 0:00 smbd -D
2209 ? S 0:00 nmbd -D
2337 pts/0 S 0:00 grep -w D
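
A note on the output above: grep -w D matches any line containing a standalone D, so it picks up the -D arguments of smbd and nmbd and the grep command itself, and all three lines are in state S. Below is a minimal Python sketch that filters on the STAT column instead. It assumes the ps ax column order PID, TTY, STAT, TIME, COMMAND; the function name uninterruptible is hypothetical, and the raid1syncd line in the sample is made up for illustration, borrowing PID 31 and the DW state from the earlier top output.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for `ps ax` lines whose STAT field starts with D."""
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            stuck.append((int(pid), command))
    return stuck

# Sample input: the three S-state lines are from this message; the
# raid1syncd line is an invented example of a D-state entry.
sample = """\
  PID TTY      STAT   TIME COMMAND
   31 ?        DW     0:05 [raid1syncd]
 2205 ?        S      0:00 smbd -D
 2209 ?        S      0:00 nmbd -D
 2337 pts/0    S      0:00 grep -w D
"""

print(uninterruptible(sample))
```

To run it against a live system, pass in the text from subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout.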

And the load average is still inconsistent with the CPU usage:

14:34:53 up 11 min, 1 user, load average: 1.35, 1.38, 0.85
86 processes: 85 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.1% user 1.0% system 0.0% nice 0.0% iowait 97.0% idle
CPU1 states: 0.0% user 1.0% system 0.0% nice 0.0% iowait 98.0% idle
Mem: 513172k av, 340916k used, 172256k free, 0k shrd, 11904k buff
242200k actv, 24552k in_d, 15120k in_c
Swap: 1060088k av, 40k used, 1060048k free 254744k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
29 root 15 0 0 0 0 SW 0.9 0.0 0:03 1 raid1syncd

Regards,
Slepetys




2003-07-05 20:01:02

by Justin T. Gibbs

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

> Justin, I just tried to enable the NMI watchdog. It doesn't seem to work on
> my system. I tried both
>
> append="nmi_watchdog=1"
> and
> append="nmi_watchdog=2"

Is the watchdog enabled in your kernel? The command line only works
if you have compiled in support for the watchdog.

--
Justin

2003-07-05 20:13:17

by Justin T. Gibbs

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

> Hi again,
>
> I passed the parameter nmi_watchdog=1 to the kernel at boot.
>
> After about 2 hours it froze again, but on the console I found a lot of
> messages like this:
>
> smb_proc_readdir_long: name=\....(some directory)....\*, result=-2, rcls=1,
> err=2

Looks like your Samba server is upset about some requests it's getting.
These probably have nothing to do with your hang.

Did you verify that the NMI watchdog was functioning properly on your
system, as outlined by the NMI watchdog FAQ in the kernel source
Documentation directory?
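
As a rough illustration of that kind of check: as far as I understand it, the usual test is to confirm that the NMI counters in /proc/interrupts are nonzero and keep increasing. The Python sketch below compares two snapshots of that file. The function names and the snapshot numbers are invented for the example, and it assumes the NMI line has one numeric column per CPU.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts text, or [] if absent."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at any trailing text description
                counts.append(int(field))
            return counts
    return []

def watchdog_ticking(before, after):
    """True if every CPU's NMI counter grew between the two snapshots."""
    a, b = nmi_counts(before), nmi_counts(after)
    return bool(a) and len(a) == len(b) and all(y > x for x, y in zip(a, b))

# Made-up snapshots for a two-CPU box, taken a few seconds apart.
snap1 = "           CPU0       CPU1\nNMI:       1200       1180\nLOC:      90000      90010\n"
snap2 = "           CPU0       CPU1\nNMI:       1700       1675\nLOC:      95000      95020\n"
snap_dead = "           CPU0       CPU1\nNMI:          0          0\n"

print(watchdog_ticking(snap1, snap2))          # counters grew on both CPUs
print(watchdog_ticking(snap_dead, snap_dead))  # counters stuck at zero
```

On a real system the two snapshots would come from reading /proc/interrupts twice with a short sleep in between. Counters stuck at zero suggest the watchdog is not running in the chosen mode.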

--
Justin

2003-07-05 20:14:05

by Jim Gifford

Subject: Re: Probably 2.4 kernel or AIC7xxx module trouble

I think the problem is elsewhere, please take a look at this message I sent
earlier.

http://marc.theaimsgroup.com/?l=linux-kernel&m=105742280413809&w=2
