2005-04-27 09:52:23

by Rafael J. Wysocki

Subject: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

Hi,

I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
on AMD64. Namely, after trying to open a web page containing a Java
applet, my browser starts a java process that takes almost 100% of the CPU
(system load, according to gkrellm) and cannot be killed (even by root,
although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
(according to ps).

The problem is 100% reproducible (it is enough to visit
http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
and it does not depend on the web browser used.

The Java JRE version is:

java version "1.4.2_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)

(I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
SuSE 9.2 RPM.
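
An aside on the 32-bit question: it can be checked rather than guessed by reading the ELF header of the java binary, where the fifth byte (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one. A minimal Python sketch follows; the function names are made up for illustration, and the path to the binary is installation-specific and not given in this report.

```python
# Sketch: classify an ELF binary as 32-bit or 64-bit from its header bytes.
ELF_MAGIC = b"\x7fELF"
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}  # ELFCLASS32 / ELFCLASS64


def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file: '32-bit', '64-bit', 'unknown' or 'not ELF'."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    return EI_CLASS_NAMES.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read only the first five bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

elf_class_of_file() would be pointed at the real java executable inside the JRE directory; the `java` found on PATH may be a wrapper script, in which case the result is "not ELF".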

It really is a show stopper to me, so please advise.

Greets,
Rafael


--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"


2005-04-27 10:20:43

by Andrew Morton

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

"Rafael J. Wysocki" <[email protected]> wrote:
>
> Hi,
>
> I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> on AMD64. Namely, after trying to open a web page containing a Java
> applet, my browser starts a java process that takes almost 100% of the CPU
> (system load, according to gkrellm) and cannot be killed (even by root,
> although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
> (according to ps).
>
> The problem is 100% reproducible (it is enough to visit
> http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
> and it does not depend on the web browser used.
>
> The Java JRE version is:
>
> java version "1.4.2_06"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
> Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)
>
> (I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
> SuSE 9.2 RPM.
>
> It really is a show stopper to me, so please advise.

Where is it running?

You can tell this from a kernel profile, or by using sysrq-P five or ten
times then looking at the output.

2005-04-27 11:01:53

by Bernd Eckenfels

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

In article <[email protected]> you wrote:
> I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> on AMD64.

Java sucks pretty badly sometimes. Why can't it be killed? Is the system too
slow for X to respond, or have you been able to use kill -9? Maybe it
spawns threads too fast; try "killall -9 java".

Greetings
Bernd

2005-04-27 11:05:18

by Rafael J. Wysocki

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wednesday, 27 of April 2005 12:19, Andrew Morton wrote:
> "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > Hi,
> >
> > I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> > on AMD64. Namely, after trying to open a web page containing a Java
> > applet, my browser starts a java process that takes almost 100% of the CPU
> > (system load, according to gkrellm) and cannot be killed (even by root,
> > although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
> > (according to ps).
> >
> > The problem is 100% reproducible (it is enough to visit
> > http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
> > and it does not depend on the web browser used.
> >
> > The Java JRE version is:
> >
> > java version "1.4.2_06"
> > Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
> > Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)
> >
> > (I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
> > SuSE 9.2 RPM.
> >
> > It really is a show stopper to me, so please advise.
>
> Where is it running?
>
> You can tell this from a kernel profile, or by using sysrq-P five or ten
> times then looking at the output.

From sysrq-P, I get this:

Pid: 11073, comm: java Not tainted 2.6.12-rc3
RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0

Call Trace:<ffffffff8010f697>{retint_signal+54}

all the time.

I've also found out that in fact the problem is not 100% reproducible, but it is much
more likely to be reproduced if the CPU is heavily loaded.

Greets,
Rafael



2005-04-27 11:13:31

by Rafael J. Wysocki

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wednesday, 27 of April 2005 13:01, Bernd Eckenfels wrote:
> In article <[email protected]> you wrote:
> > I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> > on AMD64.
>
> Java sucks pretty badly sometimes. Why can't it be killed? Is the system too
> slow for X to respond, or have you been able to use kill -9? Maybe it
> spawns threads too fast; try "killall -9 java".

No. It is exactly _one_ Java process that _does_ _not_ _react_ to kill -9. Apart
from this, the system is responsive and the other processes get their CPU
share as usual (e.g. if I run another process that would normally get ~100%
of the CPU, it now gets 50% and the rest is "used" by the Java process).

It looks like a kernel bug to me this time.

Rafael



2005-04-27 11:57:10

by Andrew Morton

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

"Rafael J. Wysocki" <[email protected]> wrote:
>
> On Wednesday, 27 of April 2005 12:19, Andrew Morton wrote:
> > "Rafael J. Wysocki" <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> > > on AMD64. Namely, after trying to open a web page containing a Java
> > > applet, my browser starts a java process that takes almost 100% of the CPU
> > > (system load, according to gkrellm) and cannot be killed (even by root,
> > > although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
> > > (according to ps).
> > >
> > > The problem is 100% reproducible (it is enough to visit
> > > http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
> > > and it does not depend on the web browser used.
> > >
> > > The Java JRE version is:
> > >
> > > java version "1.4.2_06"
> > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
> > > Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)
> > >
> > > (I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
> > > SuSE 9.2 RPM.
> > >
> > > It really is a show stopper to me, so please advise.
> >
> > Where is it running?
> >
> > You can tell this from a kernel profile, or by using sysrq-P five or ten
> > times then looking at the output.
>
> From sysrq-P, I get this:
>
> Pid: 11073, comm: java Not tainted 2.6.12-rc3
> RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
> RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
> RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
> RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
> RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
> R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
> R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
> FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
> CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0
>
> Call Trace:<ffffffff8010f697>{retint_signal+54}
>
> all the time.

All the time? Exactly the same? If you could do something crude like hit
sysrq-P 100 times then do `dmesg|grep RIP' we could possibly determine
whether things are indeed stuck in that potential infinite loop in there.
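
A rough way to summarise such a capture, sketched in Python: count how often each symbol+offset shows up in the RIP lines. The regular expression assumes the line format shown in the report above; the function name is made up, and the sample input is illustrative only, not the real log.

```python
import re
from collections import Counter

# Matches the symbol and decimal offset in lines such as
#   RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
RIP_RE = re.compile(r"RIP:.*\{(\w+)\+(\d+)\}")


def tally_rips(dmesg_text: str) -> Counter:
    """Count how often each symbol+offset appears in RIP lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = RIP_RE.search(line)
        if m:
            counts[f"{m.group(1)}+{m.group(2)}"] += 1
    return counts


# Illustrative input only; the second symbol and its address are invented.
sample = """\
RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
RIP: 0010:[<ffffffff80100008>] <ffffffff80100008>{example_symbol+8}
RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
"""

# Print the most frequent locations first.
for location, n in tally_rips(sample).most_common():
    print(f"{n:4d}  {location}")
```

If nearly every sample lands in the same few offsets of one routine, that supports the stuck-in-a-loop reading; a wide spread of symbols would point elsewhere.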

I assume you're using CONFIG_PREEMPT?

It'd be interesting to know the interrupt rate and context switch rate
while this is going on.


> I've also found out that in fact the problem is not 100% reproducible, but it is much
> more likely to be reproduced if the CPU is heavily loaded.


2005-04-27 12:13:24

by Rafael J. Wysocki

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wednesday, 27 of April 2005 13:55, Andrew Morton wrote:
> "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > On Wednesday, 27 of April 2005 12:19, Andrew Morton wrote:
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> > > > on AMD64. Namely, after trying to open a web page containing a Java
> > > > applet, my browser starts a java process that takes almost 100% of the CPU
> > > > (system load, according to gkrellm) and cannot be killed (even by root,
> > > > although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
> > > > (according to ps).
> > > >
> > > > The problem is 100% reproducible (it is enough to visit
> > > > http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
> > > > and it does not depend on the web browser used.
> > > >
> > > > The Java JRE version is:
> > > >
> > > > java version "1.4.2_06"
> > > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
> > > > Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)
> > > >
> > > > (I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
> > > > SuSE 9.2 RPM.
> > > >
> > > > It really is a show stopper to me, so please advise.
> > >
> > > Where is it running?
> > >
> > > You can tell this from a kernel profile, or by using sysrq-P five or ten
> > > times then looking at the output.
> >
> > From sysrq-P, I get this:
> >
> > Pid: 11073, comm: java Not tainted 2.6.12-rc3
> > RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
> > RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
> > RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
> > RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
> > RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
> > R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
> > R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
> > FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
> > CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> > CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0
> >
> > Call Trace:<ffffffff8010f697>{retint_signal+54}
> >
> > all the time.
>
> All the time? Exactly the same?

Well, to be precise (in the order of appearance):

RIP: 0010:[<ffffffff8010f666>] <ffffffff8010f666>{retint_signal+5}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010f666>] <ffffffff8010f666>{retint_signal+5}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010f666>] <ffffffff8010f666>{retint_signal+5}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8013f928>] <ffffffff8013f928>{__do_softirq+72}
Call Trace: <IRQ> <ffffffff80139a7b>{profile_tick+75} <ffffffff8013f9c5>{do_softirq+53}
<ffffffff80111ab7>{do_IRQ+71} <ffffffff8010f5d5>{ret_from_intr+0}
<EOI> <ffffffff8010d3a7>{__switch_to+263} <ffffffff8010f666>{retint_signal+5}
<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010f666>] <ffffffff8010f666>{retint_signal+5}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010f666>] <ffffffff8010f666>{retint_signal+5}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010e6a0>] <ffffffff8010e6a0>{do_notify_resume+48}
Call Trace:<ffffffff8010f697>{retint_signal+54}

RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
Call Trace:<ffffffff8010f697>{retint_signal+54}
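
As a consistency check on the samples above (a small Python sketch; the helper name is made up): subtracting each decimal offset from its address should give the same start address for retint_signal if the samples really are in one routine.

```python
# (address, symbol, decimal offset) taken from the RIP and Call Trace lines above
samples = [
    (0xFFFFFFFF8010F666, "retint_signal", 5),
    (0xFFFFFFFF8010F675, "retint_signal", 20),
    (0xFFFFFFFF8010F697, "retint_signal", 54),
]


def symbol_bases(samples):
    """Map each symbol to the set of start addresses implied by its samples."""
    bases = {}
    for addr, sym, off in samples:
        bases.setdefault(sym, set()).add(addr - off)
    return bases


bases = symbol_bases(samples)
for sym, addrs in bases.items():
    print(sym, sorted(hex(a) for a in addrs))
```

All three samples imply the same base, ffffffff8010f661, so the sampled addresses sit within a 49-byte window (offsets 5 to 54) of retint_signal.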


> If you could do something crude like hit sysrq-P 100 times then do
> `dmesg|grep RIP' we could possibly determine whether things are indeed stuck
> in that potential infinite loop in there.

OK, the result is attached (I think things really are there).

> I assume you're using CONFIG_PREEMPT?

No ...

> It'd be interesting to know the interrupt rate and context switch rate
> while this is going on.

OK, but could you please tell me how to get these numbers?

Greets,
Rafael




Attachments:
(No filename) (4.12 kB)
sysrq-P-RIP.log (9.23 kB)

2005-04-27 12:16:59

by Sipos Ferenc

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

Hi!

I'm also seeing the above problem on Debian sarge with 2.6.12-rc3 and
Blackdown Java 1.4.2 (also Sun's 1.5.0); preempt is not enabled.
My kernel config is attached. The problem does not occur with every Java
applet: e.g. the online gaming applets at http://www.playsite.net work fine,
but the gportal.hu site kills the system. Only XFree's mouse cursor still
responds; Alt-F1 and the ACPI button do nothing.

Feri


On Wed, 2005-04-27 at 04:55, Andrew Morton wrote:
> "Rafael J. Wysocki" <[email protected]> wrote:
> >
> > On Wednesday, 27 of April 2005 12:19, Andrew Morton wrote:
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm having a problem with 2.6.12-rc3 and the Java VM (from SuSE 9.2)
> > > > on AMD64. Namely, after trying to open a web page containing a Java
> > > > applet, my browser starts a java process that takes almost 100% of the CPU
> > > > (system load, according to gkrellm) and cannot be killed (even by root,
> > > > although it executes with a non-root UID). Apparently, it is in TASK_RUNNING
> > > > (according to ps).
> > > >
> > > > The problem is 100% reproducible (it is enough to visit
> > > > http://java.sun.com/docs/books/tutorial/getStarted/index.html to trigger it)
> > > > and it does not depend on the web browser used.
> > > >
> > > > The Java JRE version is:
> > > >
> > > > java version "1.4.2_06"
> > > > Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_06-b03)
> > > > Java HotSpot(TM) Client VM (build 1.4.2_06-b03, mixed mode)
> > > >
> > > > (I guess it's 32-bit, but I'm not quite sure) and I've installed it from the
> > > > SuSE 9.2 RPM.
> > > >
> > > > It really is a show stopper to me, so please advise.
> > >
> > > Where is it running?
> > >
> > > You can tell this from a kernel profile, or by using sysrq-P five or ten
> > > times then looking at the output.
> >
> > From sysrq-P, I get this:
> >
> > Pid: 11073, comm: java Not tainted 2.6.12-rc3
> > RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
> > RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
> > RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
> > RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
> > RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
> > R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
> > R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
> > FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
> > CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> > CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0
> >
> > Call Trace:<ffffffff8010f697>{retint_signal+54}
> >
> > all the time.
>
> All the time? Exactly the same? If you could do something crude like hit
> sysrq-P 100 times then do `dmesg|grep RIP' we could possibly determine
> whether things are indeed stuck in that potential infinite loop in there.
>
> I assume you're using CONFIG_PREEMPT?
>
> It'd be interesting to know the interrupt rate and context switch rate
> while this is going on.
>
>
> > I've also found out that in fact the problem is not 100% reproducible, but it is much
> > more likely to be reproduced if the CPU is heavily loaded.
>
>


Attachments:
kernelmono (28.81 kB)

2005-04-27 12:39:38

by Bernd Eckenfels

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wed, Apr 27, 2005 at 01:13:31PM +0200, Rafael J. Wysocki wrote:
> It looks like a kernel bug to me this time.

Which kernel works? Have you tried a stable one? (2.6.11.7 is recent)

Gruss
Bernd
--
(OO) -- Bernd_Eckenfels@Mörscher_Strasse_8.76185Karlsruhe.de --
( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/
o--o 1024D/E383CD7E eckes@IRCNet v:+497211603874 f:+497211606754
(O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

2005-04-27 12:54:58

by Alexander Nyberg

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

> > > From sysrq-P, I get this:
> > >
> > > Pid: 11073, comm: java Not tainted 2.6.12-rc3
> > > RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
> > > RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
> > > RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
> > > RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
> > > RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
> > > R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
> > > R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
> > > FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
> > > CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> > > CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0
> > >
> > > Call Trace:<ffffffff8010f697>{retint_signal+54}
> > >
> > > all the time.

My mind tells me this might be the problem but statistics also tell me
processes should get stuck all the time here...

In retint_signal, %rdi is destroyed, so we need to jump to the label above
retint_check, which sets %edi back to $_TIF_WORK_MASK.

Signed-off-by: Alexander Nyberg <[email protected]>

Index: linux-2.6/arch/x86_64/kernel/entry.S
===================================================================
--- linux-2.6.orig/arch/x86_64/kernel/entry.S 2005-04-27 13:08:50.000000000 +0200
+++ linux-2.6/arch/x86_64/kernel/entry.S 2005-04-27 14:43:20.000000000 +0200
@@ -491,7 +491,7 @@
RESTORE_REST
cli
GET_THREAD_INFO(%rcx)
- jmp retint_check
+ jmp retint_with_reschedule

#ifdef CONFIG_PREEMPT
/* Returning to kernel space. Check if we need preemption */
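
Why a clobbered mask could produce a loop that only sometimes hangs can be illustrated with a toy model in Python. This is a sketch under assumptions, not the kernel's code: the bit values, the names, and the simplified control flow are all invented for illustration. The idea is that the exit test is `flags & mask`; if the mask register holds a leftover value after the handler call, any unrelated flag bit that happens to overlap it keeps the test non-zero forever, whereas a different leftover value lets the loop exit normally.

```python
# Toy model only: bit values and control flow are simplified assumptions,
# not the definitions or code from arch/x86_64/kernel/entry.S.
NEED_RESCHED = 1 << 0
SIGPENDING = 1 << 1
UNRELATED = 1 << 5          # a thread flag that is set but is not "work"
WORK_MASK = NEED_RESCHED | SIGPENDING


def passes_until_exit(flags, clobber_value, restore_mask, limit=1000):
    """Count passes through the check loop; None means no exit within `limit`."""
    mask = WORK_MASK
    for n in range(1, limit + 1):
        if (flags & mask) == 0:
            return n                  # nothing left to do: leave the loop
        flags &= ~WORK_MASK           # the handler deals with the real work bits
        # The handler call overwrites the register that held the mask;
        # either it is reloaded afterwards, or the leftover value is used.
        mask = WORK_MASK if restore_mask else clobber_value
    return None


flags = SIGPENDING | UNRELATED
print(passes_until_exit(flags, 0xFFFFFFF0, restore_mask=False))
print(passes_until_exit(flags, 0x00000100, restore_mask=False))
print(passes_until_exit(flags, 0xFFFFFFF0, restore_mask=True))
```

In this model the first call never exits because the leftover value overlaps the unrelated flag, the second exits because it happens not to, and the third always exits because the mask is reloaded, which is the effect both patches in this thread aim for.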


2005-04-27 13:56:12

by Andi Kleen

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64


Does this patch fix the problem?

Initialize the work mask correctly on interrupt signal handling.

Re-add the missing cli instructions in the interrupt return path.

Signed-off-by: Andi Kleen <[email protected]>



diff -u linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o linux-2.6.12rc3/arch/x86_64/kernel/entry.S
--- linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o 2005-04-22 12:48:11.000000000 +0200
+++ linux-2.6.12rc3/arch/x86_64/kernel/entry.S 2005-04-27 15:52:49.305183345 +0200
@@ -296,6 +296,7 @@
call syscall_trace_leave
popq %rdi
andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi
+ cli
jmp int_restore_rest

int_signal:
@@ -307,6 +308,7 @@
1: movl $_TIF_NEED_RESCHED,%edi
int_restore_rest:
RESTORE_REST
+ cli
jmp int_with_check
CFI_ENDPROC

@@ -490,7 +492,8 @@
call do_notify_resume
RESTORE_REST
cli
- GET_THREAD_INFO(%rcx)
+ GET_THREAD_INFO(%rcx)
+ movl $_TIF_WORK_MASK,%edi
jmp retint_check

#ifdef CONFIG_PREEMPT

2005-04-27 14:04:22

by Rafael J. Wysocki

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wednesday, 27 of April 2005 14:54, Alexander Nyberg wrote:
> > > > From sysrq-P, I get this:
> > > >
> > > > Pid: 11073, comm: java Not tainted 2.6.12-rc3
> > > > RIP: 0010:[<ffffffff8010f675>] <ffffffff8010f675>{retint_signal+20}
> > > > RSP: 0018:ffff810012d6ff58 EFLAGS: 00000282
> > > > RAX: 0000000000020000 RBX: ffff810010868820 RCX: ffff810012d6e000
> > > > RDX: 0000000000020000 RSI: 0000000000000000 RDI: ffff810012d6ff58
> > > > RBP: 000000a30c153a4a R08: ffff810012d6e000 R09: ffffffff804c6068
> > > > R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff804ccd40
> > > > R13: ffff810010868820 R14: ffff81002cff2cf0 R15: ffffffff8010d3a7
> > > > FS: 00002aaaae6389c0(0000) GS:ffffffff8054a600(0063) knlGS:00000000556c9080
> > > > CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
> > > > CR2: 00002aaaaabab000 CR3: 0000000012930000 CR4: 00000000000006e0
> > > >
> > > > Call Trace:<ffffffff8010f697>{retint_signal+54}
> > > >
> > > > all the time.
>
> My mind tells me this might be the problem but statistics also tell me
> processes should get stuck all the time here...
>
> In retint_signal, %rdi is destroyed, so we need to jump to the label above
> retint_check, which sets %edi back to $_TIF_WORK_MASK.
>
> Signed-off-by: Alexander Nyberg <[email protected]>
>
> Index: linux-2.6/arch/x86_64/kernel/entry.S
> ===================================================================
> --- linux-2.6.orig/arch/x86_64/kernel/entry.S 2005-04-27 13:08:50.000000000 +0200
> +++ linux-2.6/arch/x86_64/kernel/entry.S 2005-04-27 14:43:20.000000000 +0200
> @@ -491,7 +491,7 @@
> RESTORE_REST
> cli
> GET_THREAD_INFO(%rcx)
> - jmp retint_check
> + jmp retint_with_reschedule
>
> #ifdef CONFIG_PREEMPT
> /* Returning to kernel space. Check if we need preemption */

With this patch I'm unable to reproduce the problem, though I've tried really hard. Thanks!

Greets,
Rafael



2005-04-27 14:52:24

by Rafael J. Wysocki

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

Hi,

On Wednesday, 27 of April 2005 15:54, Andi Kleen wrote:
>
> Does this patch fix the problem?

Yes, it does, apparently (i.e. I'm unable to reproduce it). :-)

> Initialize the work mask correctly on interrupt signal handling.
>
> Re-add the missing cli instructions in the interrupt return path.
>
> Signed-off-by: Andi Kleen <[email protected]>
>
>
>
> diff -u linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o linux-2.6.12rc3/arch/x86_64/kernel/entry.S
> --- linux-2.6.12rc3/arch/x86_64/kernel/entry.S-o 2005-04-22 12:48:11.000000000 +0200
> +++ linux-2.6.12rc3/arch/x86_64/kernel/entry.S 2005-04-27 15:52:49.305183345 +0200
> @@ -296,6 +296,7 @@
> call syscall_trace_leave
> popq %rdi
> andl $~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edi
> + cli
> jmp int_restore_rest
>
> int_signal:
> @@ -307,6 +308,7 @@
> 1: movl $_TIF_NEED_RESCHED,%edi
> int_restore_rest:
> RESTORE_REST
> + cli
> jmp int_with_check
> CFI_ENDPROC
>
> @@ -490,7 +492,8 @@
> call do_notify_resume
> RESTORE_REST
> cli
> - GET_THREAD_INFO(%rcx)
> + GET_THREAD_INFO(%rcx)
> + movl $_TIF_WORK_MASK,%edi
> jmp retint_check
>
> #ifdef CONFIG_PREEMPT

I assume that Alexander's patch is not needed with this one?

Greets,
Rafael



2005-04-28 17:33:14

by Randy.Dunlap

Subject: Re: [BUG] 2.6.12-rc3: unkillable java process in TASK_RUNNING on AMD64

On Wed, 27 Apr 2005 14:12:51 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

>
> > It'd be interesting to know the interrupt rate and context switch rate
> > while this is going on.
>
> OK, but could you please tell me how to get these numbers?


Use something like the attached perl script... (a little overkill
for this).
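
The attached script is not reproduced in this archive. Independently of it, both numbers can be derived from the cumulative `intr` and `ctxt` counters in /proc/stat by sampling twice and dividing the difference by the interval; `vmstat 1` reports the same two rates in its `in` and `cs` columns. A minimal Python sketch, with function names chosen here for illustration:

```python
import time


def parse_counters(stat_text: str) -> dict:
    """Extract cumulative interrupt and context-switch totals from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "intr":
            counters["interrupts"] = int(fields[1])   # first number is the total
        elif fields[0] == "ctxt":
            counters["context_switches"] = int(fields[1])
    return counters


def rates(before: dict, after: dict, seconds: float) -> dict:
    """Convert two cumulative samples into per-second rates."""
    return {name: (after[name] - before[name]) / seconds for name in before}


def sample_rates(interval: float = 1.0, path: str = "/proc/stat") -> dict:
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_counters(f.read())
    return rates(before, after, interval)
```

Calling sample_rates(1.0) while the java process is spinning gives interrupts and context switches per second for that one-second window.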

---
~Randy


Attachments:
sysalive.pl (5.68 kB)