2003-11-19 07:29:08

by Sumit Pandya

Subject: Infinite do_IRQ

Hi All,
I'm running a 2.4.22 kernel on a Pentium III processor with the following
patches:
1. ebtables-brnf-3_vs_2.4.22.diff
2. routes-2.4.22-9.diff (Julian's DGD)
3. nfnetlink-ctnetlink-0.12-2.patch
4. htb_3.12_3.13.diff

After running for a while it suddenly hangs, and I get continuous
"do_IRQ: stack overflow:" messages on my serial console. I'm using ttywatch
to read the console output on another Linux system.

The following is a snapshot of those messages:

do_IRQ: stack overflow: -940
c024caa8 fffffc54 00000018 00000003 c7e551fa c7e551f7 c02c6270 c024a588
00000003 c7e551f7 00000009 c7e551fa c7e551f7 c02c6270 c7e55200 c0110018
c7e50018 ffffff00 c8867db0 00000010 00000286 c8867a2a c7e551fa c7e551fd
Call Trace: [<c0110018>] [<c8867db0>] [<c8867a2a>] [<c011a9e0>]
[<c886a4ef>]
[<c8867b72>] [<c011a9e0>] [<c886a4ef>] [<c8867bd7>] [<c886a4ef>]
[<c8867ba5>]
[<c886a4ee>] [<c8867bd7>] [<c886a4ee>] [<c8867b72>] [<c011a9e0>]
[<c886a4ee>]
[<c8867bd7>] [<c011a9e0>] [<c886a4ee>] [<c8867ba5>] [<c886a4ed>]
[<c8867bd7>]
[<c886a4ed>] [<c8867b72>] [<c886a4ed>] [<c8867bd7>] [<c011b170>]
[<c886a4ed>]
[<c8867ba5>] [<c886a4ec>] [<c8867bd7>] [<c886a4ec>] [<c8867b72>]
[<c886a4ec>]
[<c8867bd7>] [<c886a4ec>] [<c8867ba5>] [<c886a4eb>] [<c8867bd7>]
[<c886a4eb>]
[<c8867b72>] [<c886a4eb>] [<c8867bd7>] [<c886a4eb>] [<c8867ba5>]
[<c886a4ea>]
[<c8867bd7>] [<c886a4ea>] [<c8867b72>] [<c023a160>] [<c886a4ea>]
[<c8867bd7>]
[<c023b280>] [<c023b540>] [<c023ba30>] [<c023bd80>] [<c023be10>]
[<c886a4ea>]
[<c8867ba5>] [<c023c8f0>] [<c01f9760>] [<c01f9770>] [<c886a4e9>]
[<c8867bd7>]
[<c023b860>] [<c023bd80>] [<c01f9640>] [<c023bef0>] [<c01fc590>]
[<c886a4e9>]
[<c8867b72>] [<c01f9760>] [<c01f9770>] [<c023b200>] [<c886a4e9>]
[<c8867bd7>]
[<c886a4e9>] [<c8867ba5>] [<c886a4e8>] [<c8867bd7>] [<c886a4e8>]
[<c8867b72>]
[<c886a4e8>] [<c8867bd7>] [<c886a4e8>] [<c8867ba5>] [<c023df70>]
[<c023e220>]
[<c886a4e7>] [<c8867bd7>] [<c01f96a0>] [<c01f96b0>] [<c023d870>]
[<c023e4d0>]
[<c01f9760>] [<c886a4e7>] [<c8867b72>] [<c023df70>] [<c023e2b0>]
[<c01f9620>]
[<c01f9630>] [<c886a4e7>] [<c8867bd7>] [<c023ebd0>] [<c023dd10>]
[<c023e4d0>]
[<c01f9760>] [<c01f9770>] [<c886a4e7>] [<c8867ba5>] [<c886a4e6>]
[<c8867bd7>]
[<c886a4e6>] [<c8867b72>] [<c886a4e6>] [<c8867bd7>] [<c886a4e6>]
[<c8867ba5>]
[<c886a4e5>] [<c8867bd7>] [<c886a4e5>] [<c8867b72>] [<c886a4e5>]
[<c8867bd7>]
[<c886a4e5>] [<c8867ba5>] [<c886a4e4>] [<c8867bd7>] [<c886a4e4>]
[<c8867b72>]
[<c886a4e4>] [<c8867bd7>] [<c886a4e4>] [<c8867ba5>] [<c886a4e3>]
[<c8867bd7>]
[<c886a4e3>] [<c8867b72>] [<c886a4e3>] [<c8867bd7>] [<c886a4e3>]
[<c8867ba5>]
[<c886a4e2>] [<c8867bd7>] [<c886a4e2>] [<c8867b72>] [<c886a4e2>]
[<c8867bd7>]
[<c886a4e2>] [<c8867ba5>] [<c886a4e1>] [<c8867bd7>] [<c886a4e1>]
[<c8867b72>]
[<c886a4e1>] [<c8867bd7>] [<c886a4e1>] [<c8867ba5>] [<c886a4e0>]
[<c8867bd7>]
[<c886a4e0>] [<c8867b72>] [<c886a4e0>] [<c8867bd7>] [<c886a4e0>]
[<c8867ba5>]
[<c886a4df>] [<c8867bd7>] [<c886a4df>] [<c8867b72>] [<c886a4df>]
[<c8867bd7>]
[<c886a4df>] [<c8867ba5>] [<c886a4de>] [<c8867bd7>] [<c886a4de>]
[<c8867b72>]
[<c886a4de>] [<c8867bd7>] [<c886a4de>] [<c8867ba5>] [<c886a4dd>]
[<c8867bd7>]
[<c886a4dd>] [<c8867b72>] [<c886a4dd>] [<c8867bd7>] [<c886a4dd>]
[<c8867ba5>]
[<c886a4dc>] [<c8867bd7>] [<c886a4dc>] [<c8867b72>] [<c886a4dc>]
[<c8867bd7>]
[<c886a4dc>] [<c8867ba5>] [<c886a4db>] [<c8867bd7>] [<c886a4db>]
[<c8867b72>]
[<c886a4db>] [<c8867bd7>] [<c886a4db>] [<c8867ba5>] [<c886a4da>]
[<c8867bd7>]
[<c886a4da>] [<c8867b72>] [<c886a4da>] [<c8867bd7>] [<c886a4da>]
[<c8867ba5>]
[<c886a4d9>] [<c8867bd7>] [<c886a4d9>] [<c8867b72>] [<c886a4d9>]
[<c8867bd7>]
[<c886a4d9>] [<c8867ba5>] [<c886a4d8>] [<c8867bd7>] [<c886a4d8>]
[<c8867b72>]
[<c886a4d8>] [<c8867bd7>] [<c886a4d8>] [<c8867ba5>] [<c886a4d7>]
[<c8867bd7>]
[<c886a4d7>] [<c8867b72>] [<c886a4d7>] [<c8867bd7>] [<c886a4d7>]
[<c8867ba5>]
[<c886a4d6>] [<c8867bd7>] [<c886a4d6>] [<c8867b72>] [<c886a4d6>]
[<c8867bd7>]
[<c886a4d6>] [<c8867ba5>] [<c886a4d5>] [<c8867bd7>] [<c886a4d5>]
[<c8867b72>]
[<c886a4d5>] [<c8867bd7>] [<c886a4d5>] [<c8867ba5>] [<c886a4d4>]
[<c8867bd7>]
[<c886a4d4>] [<c8867b72>] [<c886a4d4>] [<c8867bd7>] [<c886a4d4>]
[<c8867ba5>]
[<c886a4d3>] [<c8867bd7>] [<c886a4d3>] [<c8867b72>] [<c886a4d3>]
[<c8867bd7>]
[<c886a4d3>] [<c8867ba5>] [<c886a4d2>] [<c8867bd7>] [<c886a4d2>]
[<c8867b72>]
[<c886a4d2>] [<c8867bd7>] [<c886a4d2>] [<c8867ba5>] [<c886a4d1>]
[<c8867bd7>]
[<c886a4d1>] [<c8867b72>] [<c886a4d1>] [<c8867bd7>] [<c886a4d1>]
[<c8867ba5>]
[<c886a4d0>] [<c8867bd7>] [<c886a4d0>] [<c8867b72>] [<c886a4d0>]
[<c8867bd7>]
[<c886a4d0>] [<c8867ba5>] [<c886a4cf>] [<c8867bd7>] [<c886a4cf>]
[<c8867b72>]
[<c886a4cf>] [<c8867bd7>] [<c886a4cf>] [<c8867ba5>] [<c886a4ce>]
[<c8867bd7>]
[<c886a4ce>] [<c8867b72>] [<c886a4ce>] [<c8867bd7>] [<c886a4ce>]
[<c8867ba5>]
[<c886a4cd>] [<c8867bd7>] [<c886a4cd>] [<c8867b72>] [<c886a4cd>]
[<c8867bd7>]
[<c886a4cd>] [<c8867ba5>] [<c886a4cc>] [<c8867bd7>] [<c886a4cc>]
[<c8867b72>]
[<c886a4cc>] [<c8867bd7>] [<c886a4cc>] [<c8867ba5>] [<c886a4cb>]
[<c8867bd7>]
[<c886a4cb>] [<c8867b72>] [<c886a4cb>] [<c8867bd7>]
do_IRQ: stack overflow: -940
... ... ...
After some time the reported value changes:
do_IRQ: stack overflow: -1048
some stack frames
do_IRQ: stack overflow: -1156
some stack frames

I built this kernel with gcc-2.96-98 on Red Hat Linux release 7.2.
What surprises me about the above dump is that when I googled for do_IRQ I
found a few postings on mailing lists, but none of them showed a negative
value for "esp - sizeof(struct task_struct)".

I'd also like some opinions on the patch posted at
http://www.ussg.iu.edu/hypermail/linux/kernel/0301.2/0232.html
Why is "struct task_struct" replaced with "struct thread_info" there? Is
that only for the 2.5.X/2.6.X series?

I'd also like to draw your attention to one more patch, by Joern Engel
(he is on the CC list):
http://wh.fh-wedel.de/~joern/software/kernel/je/24/.patches/stack_overflow.patch
Are these patches safe to apply? What would be the pros and cons of
applying them to a 2.4.22 kernel?

Thanks for all your replies.
-- Sumit


2003-11-19 08:48:20

by Zwane Mwaikambo

Subject: Re: Infinite do_IRQ

On Wed, 19 Nov 2003, Sumit Pandya wrote:

> Hi All,
> I'm running 2.4.22 kernel on Pentium-III processor with following few
> patches
> 1. ebtables-brnf-3_vs_2.4.22.diff
> 2. routes-2.4.22-9.diff (Julean's DGD)
> 3. nfnetlink-ctnetlink-0.12-2.patch
> 4. htb_3.12_3.13.diff

That's quite the patch cocktail, perhaps they need some auditing on stack
usage.

> After running it for few time suddenly it hangs and I get continuous
> "do_IRQ: stack overflow:" messages on my serial console. I'm using ttywatch
> for reading console output on other Linux System.
>
> Also I'd like some opinion about the patch posted on
> http://www.ussg.iu.edu/hypermail/linux/kernel/0301.2/0232.html
> Here, why "struct task_struct" is replaced with "struct thread_info"? Is
> that only for 2.5.X/2.6.X series only?

Yes that patch was 2.5/6 specific.

> I'd also like to draw your attention on one more patch by Joern Engel
> (He is in CC list)
> http://wh.fh-wedel.de/~joern/software/kernel/je/24/.patches/stack_overflow.patch
> Are these patches safe to apply? What could be pros and cons if these
> patches are applied into 2.4.22 kernel.

+#if 0
+	if (unlikely(esp < (sizeof(struct task_struct) + 1024))) {
+#else
+	/* We check for 5k for now. The kernel stack still is 8k,
+	 * but should shrink to 4k, so this test makes sense.
+	 * Once the stack is 4k, we go back to the old test.
+	 */
+	if (unlikely(esp < (sizeof(struct thread_info) + 5120))) {
+#endif

The i386 stack grows downwards, so if anything that check will report even
earlier than what you're hitting now. I'd recommend backing out those patches
one by one until you find the offending patch, and then perhaps doing the
stack usage audit from there.

2003-11-19 14:44:42

by Sumit Pandya

Subject: Re: Infinite do_IRQ

Hi Zwane,
I'm sorry to bother you again. The following is the output from
http://kernelnewbies.org/scripts/check-stack.sh

100 getxattr
100 pirq_peer_trick
100 removexattr
100 setxattr
100 sys_reboot
100 sys_recvmsg
108 rt_cache_get_info
108 sg_proc_hoststrs_info
10c rs_read_proc
110 autofs4_notify_daemon
110 set_serial_info
110 sys_sendmsg
114 scsi_request_sense
11c autofs4_expire_run
11c radeon_cp_clear
11c scan_scsis
128 aout_core_dump
13c load_elf_binary
140 do_execve
140 mmc_ioctl
140 vc_resize
144 elf_kcore_store_hdr
170 sg_ioctl
178 extract_entropy
190 sys_msgctl
1a0 scsi_reset_provider
1a4 sys_shmctl
1ac check_tcp_syn_cookie
1b0 tcp_timewait_state_process
1b4 ip_getsockopt
1cc secure_tcp_syn_cookie
1d0 tcp_v4_conn_request
1d4 sys_semtimedop
1d8 tcp_check_req
204 cdrom_buffer_sectors
224 cdrom_read_intr
228 ip_setsockopt
248 semctl_main
258 elf_core_dump
2a8 pci_do_scan_bus
2f4 vt_ioctl
31c sym53c8xx_detect
324 pcibios_fixup_peer_bridges
324 pci_sanity_check
410 cdrom_number_of_slots
410 cdrom_select_disc
410 cdrom_slot_status
444 cdrom_ioctl
47c ide_unregister
490 inflate_fixed
524 inflate_dynamic
5a8 huft_build
73c sanitize_e820_map

Any comments on this? Thanks in advance
-- Sumit
----- Original Message -----

From: "Zwane Mwaikambo" <[email protected]>
Sent: Wednesday, November 19, 2003 2:17 PM


> On Wed, 19 Nov 2003, Sumit Pandya wrote:
>
> > Hi All,
> > I'm running 2.4.22 kernel on Pentium-III processor with following few
> > patches
> > 1. ebtables-brnf-3_vs_2.4.22.diff
> > 2. routes-2.4.22-9.diff (Julean's DGD)
> > 3. nfnetlink-ctnetlink-0.12-2.patch
> > 4. htb_3.12_3.13.diff
>
> That's quite the patch cocktail, perhaps they need some auditing on stack
> usage.
>
> > After running it for few time suddenly it hangs and I get continuous
> > "do_IRQ: stack overflow:" messages on my serial console. I'm using ttywatch
> > for reading console output on other Linux System.
[snip]
>
> The i386 stack grows downwards, so if anything it'll report even earlier
> than what you're hitting now. I'd recommend backing out those patches one by
> one until you find out the offending patch and then perhaps do the stack
> usage audit from there.

2003-11-20 04:14:44

by Zwane Mwaikambo

Subject: Re: Infinite do_IRQ

On Wed, 19 Nov 2003, Sumit Pandya wrote:

> Hi Zwane,
> I'm sorry to bother you again. Following is output from
> http://kernelnewbies.org/scripts/check-stack.sh

I think it'd really be easier to back out those patches one by one until
your messages stop happening. Otherwise I'm not quite sure which one is
really affecting you.

> 100 getxattr
> 100 pirq_peer_trick
> 100 removexattr
> 100 setxattr
> 100 sys_reboot
> 100 sys_recvmsg
> 108 rt_cache_get_info
> 108 sg_proc_hoststrs_info
> 10c rs_read_proc
> 110 autofs4_notify_daemon
> 110 set_serial_info
> 110 sys_sendmsg
> 114 scsi_request_sense
> 11c autofs4_expire_run
> 11c radeon_cp_clear
> 11c scan_scsis
> 128 aout_core_dump
> 13c load_elf_binary
> 140 do_execve
> 140 mmc_ioctl
> 140 vc_resize
> 144 elf_kcore_store_hdr
> 170 sg_ioctl
> 178 extract_entropy
> 190 sys_msgctl
> 1a0 scsi_reset_provider
> 1a4 sys_shmctl
> 1ac check_tcp_syn_cookie
> 1b0 tcp_timewait_state_process
> 1b4 ip_getsockopt
> 1cc secure_tcp_syn_cookie
> 1d0 tcp_v4_conn_request
> 1d4 sys_semtimedop
> 1d8 tcp_check_req
> 204 cdrom_buffer_sectors
> 224 cdrom_read_intr
> 228 ip_setsockopt
> 248 semctl_main
> 258 elf_core_dump
> 2a8 pci_do_scan_bus
> 2f4 vt_ioctl
> 31c sym53c8xx_detect
> 324 pcibios_fixup_peer_bridges
> 324 pci_sanity_check
> 410 cdrom_number_of_slots
> 410 cdrom_select_disc
> 410 cdrom_slot_status
> 444 cdrom_ioctl
> 47c ide_unregister
> 490 inflate_fixed
> 524 inflate_dynamic
> 5a8 huft_build
> 73c sanitize_e820_map

2003-11-20 10:06:11

by Jörn Engel

Subject: Re: Infinite do_IRQ

On Wed, 19 November 2003 23:12:58 -0500, Zwane Mwaikambo wrote:
> On Wed, 19 Nov 2003, Sumit Pandya wrote:
>
> > Hi Zwane,
> > I'm sorry to bother you again. Following is output from
> > http://kernelnewbies.org/scripts/check-stack.sh
>
> I think it'd really be easier to back out those patches one by one until
> your messages stop happening. Otherwise i'm not quite sure which one is
> really affecting you.

Agreed. Still, tiny comments on the below.

> > 410 cdrom_number_of_slots
> > 410 cdrom_select_disc
> > 410 cdrom_slot_status
> > 444 cdrom_ioctl

Why do you use the mess from drivers/cdrom? Unless you have one of
those old cdroms attached to your soundcard, better get rid of them.

> > 47c ide_unregister
> > 490 inflate_fixed
> > 524 inflate_dynamic
> > 5a8 huft_build

Old known problems, no bug reports about them ever. Should be safe.

> > 73c sanitize_e820_map

This one is new to me. Does it exist in plain 2.4.22 as well?


Anyway, backing out patches is the way to go.


Jörn

--
Happiness isn't having what you want, it's wanting what you have.
-- unknown

2003-11-20 10:08:37

by Jörn Engel

Subject: Re: Infinite do_IRQ

On Wed, 19 November 2003 12:58:48 +0530, Sumit Pandya wrote:
>
> I'd also like to draw your attention on one more patch by Joern Engel
> (He is in CC list)
> http://wh.fh-wedel.de/~joern/software/kernel/je/24/.patches/stack_overflow.patch
> Are these patches safe to apply? What could be pros and cons if these
> patches are applied into 2.4.22 kernel.

That patch intends to make things worse, not better. Just a test tool
to find problems early. Should be pointless for you.

Jörn

--
Fools ignore complexity. Pragmatists suffer it.
Some can avoid it. Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept. 1982