2003-02-08 12:06:58

by Przemysław Maciuszko

Subject: Problem with mm in 2.4.19 and 2.4.20

Hello.
I have a problem with one news server (feeder) box running INN.
Under heavy load I get the following error on the console:

filemap.c:2084: bad pmd 2bc001e3
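
The low 12 bits of the value in that message are page-table flag bits, so they can be decoded by hand. Below is a minimal sketch, assuming the standard 32-bit x86 flag layout; the helper name `decode_entry` is made up for illustration and is not kernel code.

```python
# Flag bits in the low 12 bits of a 32-bit x86 page-table entry.
X86_ENTRY_FLAGS = {
    0x001: "PRESENT",
    0x002: "RW",
    0x004: "USER",
    0x008: "PWT",
    0x010: "PCD",
    0x020: "ACCESSED",
    0x040: "DIRTY",
    0x080: "PSE",
    0x100: "GLOBAL",
}


def decode_entry(value):
    """Split an entry into (frame address, list of flag names that are set)."""
    flags = [name for bit, name in sorted(X86_ENTRY_FLAGS.items()) if value & bit]
    frame = value & ~0xFFF  # clear the 12 flag bits
    return frame, flags


frame, flags = decode_entry(0x2BC001E3)  # value from the console message
print(hex(frame), flags)
```

For 0x2bc001e3 this gives frame 0x2bc00000 with PRESENT, RW, ACCESSED, DIRTY, PSE and GLOBAL set. PSE at this level normally marks a large page rather than a pointer to a page table, which may be why the kernel's sanity check rejects the entry; that reading is an inference, not something the message itself states.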

This has shown up a few times over the last few days, and a few times the
server hung afterwards.
Does anyone have an idea what could cause it?

I'm using Debian Linux on a dual PIII 1.1GHz, 1GB RAM, LVM version 1.0.6,
QLogic FC 2200F driver version 6.01.
Any help would be appreciated...


--
Przemysław Maciuszko
Agora S.A.


2003-08-11 07:39:42

by Harald Welte

Subject: 2.4.18/2.4.20 filemap.c pmd bug (was Re: Problem with mm in 2.4.19 and 2.4.20)

Przemysław Maciuszko wrote:

>I have a problem with one news server (feeder) box running INN.
>Under heavy load I get the following error on the console:
>
>filemap.c:2084: bad pmd 2bc001e3
>
>This has shown up a few times over the last few days, and a few times the
>server hung afterwards.

I can confirm this problem. It happens on one of my news servers as well,
currently at least once per day. It is a dual PIII 650MHz, 1GB RAM,
200GB spool (SCSI hardware RAID array attached to an Adaptec aic7xxx), six
separate SCSI disks attached to a separate aic7xxx controller for
overview, running inn-2.3.2.

We've tried Red Hat kernels 2.4.18-3, 2.4.18-17.7, 2.4.20-19.7 and
2.4.20-19.7bigmem as well as a kernel.org 2.4.20, all with the same
problem.

After the filemap.c / pmd_ERROR() printk, the box either hangs with no
further output (not that often) or has a stack overflow (most of the
time):

filemap.c:2258: bad pmd c0003000(00000000000001e3).
do_IRQ: stack overflow: -864
c0252845 fffffca0 206d6564 c2426000 00000000 c0117b20 c0101018 c024bd2c
c2426000 00000018 00000018 00000000 c0117b20 c0101018 c2426470 6f6e0018
40320018 ffffff00 c0117b43 00000010 00000202 7369636e 3e65642e 613c200a
Call Trace: [<c0117b20>] do_page_fault [kernel] 0x0 (0xc242634c))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426368))
[<c0117b43>] do_page_fault [kernel] 0x23 (0xc2426380))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242645c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426464))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc2426498))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426574))
[<c0108cc4>] error_code [kernel] 0x34 (0xc242657c))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24265b0))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242668c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426694))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24266c8))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc24267a4))
[<c0108cc4>] error_code [kernel] 0x34 (0xc24267ac))
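
The trace above repeats the same two symbols, which is easier to see once the frames are counted. Below is a rough sketch that parses the trace lines exactly as printed; the regular expression and the helper name `parse_frames` are my own and only assume the `[<addr>] symbol [module] offset (slot)` layout visible in this report.

```python
import re
from collections import Counter

# Call-trace lines copied verbatim from the report.
TRACE = """\
Call Trace: [<c0117b20>] do_page_fault [kernel] 0x0 (0xc242634c))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426368))
[<c0117b43>] do_page_fault [kernel] 0x23 (0xc2426380))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242645c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426464))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc2426498))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426574))
[<c0108cc4>] error_code [kernel] 0x34 (0xc242657c))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24265b0))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242668c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426694))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24266c8))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc24267a4))
[<c0108cc4>] error_code [kernel] 0x34 (0xc24267ac))
"""

# One frame: [<address>] symbol [module] offset (stack slot)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{8}>\]\s+(\w+)\s+\[\w+\]\s+0x[0-9a-f]+\s+\(0x([0-9a-f]+)\)"
)


def parse_frames(text):
    """Return (symbol, stack_slot) pairs in the order they were printed."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(text)]


frames = parse_frames(TRACE)
counts = Counter(symbol for symbol, _ in frames)

# Distance in bytes between successive error_code frames on the stack.
slots = [slot for symbol, slot in frames if symbol == "error_code"]
deltas = [later - earlier for earlier, later in zip(slots, slots[1:])]

print(counts)
print(deltas)
```

On this trace the 14 frames split into 10 do_page_fault and 4 error_code entries, and the error_code frames sit a constant 280 bytes apart. That pattern is consistent with the page-fault handler faulting again on each pass until the stack runs out, though the trace alone does not prove it.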

The messages are always preceded by a '(scsi0:A:0:0): Locking max tag
count at 64' message. The SCSI device number changes, so it cannot
be a single device.

>Does anyone have an idea what could cause it?

Unfortunately I'm not very familiar with the Linux MM subsystem. But
since I now consider this a confirmed bug, maybe some of the other
lkml folks have an idea what might be going on.

>I'm using Debian Linux on a dual PIII 1.1GHz, 1GB RAM, LVM version 1.0.6,
>QLogic FC 2200F driver version 6.01.

We don't use LVM, so the similarities seem to be: dual PIII,
SCSI, INN.

--
- Harald Welte <[email protected]> http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime



2003-08-11 09:48:29

by Christoph Hellwig

Subject: Re: 2.4.18/2.4.20 filemap.c pmd bug (was Re: Problem with mm in 2.4.19 and 2.4.20)

On Mon, Aug 11, 2003 at 09:34:43AM +0200, Harald Welte wrote:
> >I'm using Debian Linux on a dual PIII 1.1GHz, 1GB RAM, LVM version 1.0.6,
> >QLogic FC 2200F driver version 6.01.
>
> We don't use LVM, so the similarities seem to be: dual PIII,
> SCSI, INN.

Well, qlogic + lvm is very prone to stack overflows. You're using aic7xxx,
I assume? Any other interesting drivers?

2003-08-11 10:25:59

by Harald Welte

Subject: Re: 2.4.18/2.4.20 filemap.c pmd bug (was Re: Problem with mm in 2.4.19 and 2.4.20)

Hi Christoph. First of all, thanks for your quick reply.

On Mon, Aug 11, 2003 at 10:48:23AM +0100, Christoph Hellwig wrote:

> Well, qlogic + lvm is very prone to stack overflows.

In my case, we use neither of them.

> You're using aic7xxx, I assume?

Yes. The device is reported as:

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
<Adaptec aic7890/91 Ultra2 SCSI adapter>
aic7890/91: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs


> Any other interesting drivers?

Well, there's a Tulip-based network board and one Symbios SCSI controller
(ncr53c8xx driver) in the system. But since the '(scsi0:A:9:0): Locking
max tag count at 64' message always indicates 'scsi0', I think it has to
do with aic7xxx.
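
To check that observation against a full log rather than by eye, the message prefixes can be grouped per host. This is only a sketch: it assumes the prefix fields are host, channel, target and LUN in that order (my reading of the message format), and the helper name `targets_by_host` is invented for this example.

```python
import re
from collections import defaultdict

# Matches e.g. "(scsi0:A:9:0): Locking max tag count at 64"
PATH_RE = re.compile(
    r"\(scsi(\d+):([A-Z]):(\d+):(\d+)\): Locking max tag count at (\d+)"
)


def targets_by_host(lines):
    """Map (host, channel) to the set of (target, lun) pairs that logged the message."""
    seen = defaultdict(set)
    for line in lines:
        match = PATH_RE.search(line)
        if match:
            host, channel, target, lun, _depth = match.groups()
            seen[(int(host), channel)].add((int(target), int(lun)))
    return dict(seen)


# The two variants of the message quoted in this thread.
sample = [
    "(scsi0:A:0:0): Locking max tag count at 64",
    "(scsi0:A:9:0): Locking max tag count at 64",
]
print(targets_by_host(sample))
```

With the two messages quoted in this thread, both entries land under host 0, channel A, with targets 0 and 9. Feeding it the complete kernel log would show whether any other host ever appears.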

--
- Harald Welte <[email protected]> http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime

