2003-01-03 10:31:18

by Dave Jones

Subject: odd phenomenon.

Something strange I've noticed on all recent 2.4 and 2.5 kernels.

If I start galeon whilst I've got a bk pull in operation, the
galeon process starts, opens its window, and then dies instantly.
Starting it a second time works.

It's not OOM, as there's plenty of free RAM, and half a gig of free (unused) swap.

It's almost 100% reproducible here. Only seen it do it on this box
though which is a P4 with HT, so it could be SMP related..

Ideas ?

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs


2003-01-03 10:39:49

by William Lee Irwin III

Subject: Re: odd phenomenon.

On Fri, Jan 03, 2003 at 10:38:16AM +0000, Dave Jones wrote:
> Something strange I've noticed on all recent 2.4 and 2.5 kernels.
> If I start galeon whilst I've got a bk pull in operation, the
> galeon process starts, opens its window, and then dies instantly.
> Starting it a second time works.
> It's not OOM, as there's plenty of free RAM, and half a gig of free (unused) swap.
> It's almost 100% reproducible here. Only seen it do it on this box
> though which is a P4 with HT, so it could be SMP related..
> Ideas ?
> Dave

(1) strace?
(2) kgdb breakpoint on exit(), conditional on current->comm?
(3) exit code?
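
For (3), a minimal sketch of one way to capture how a process terminated, assuming it is launched directly rather than through a wrapper script (a wrapper would report its own status instead). The helper names `describe_returncode` and `run_and_describe` are made up for illustration; `subprocess.run` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a human-readable string."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_describe(argv):
    """Run argv to completion and describe how it terminated."""
    completed = subprocess.run(argv)
    return describe_returncode(completed.returncode)


# Hypothetical usage while the disk-heavy job is running:
#   print(run_and_describe(["galeon"]))
```

A signal would point towards a crash (or something killing the process), while a plain non-zero exit status would suggest the application decided to quit by itself.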


Thanks,
Bill

2003-01-03 10:51:28

by ZHAO Wei

Subject: Re: odd phenomenon.

Dave Jones wrote:
> Something strange I've noticed on all recent 2.4 and 2.5 kernels.
>
> If I start galeon whilst I've got a bk pull in operation, the
> galeon process starts, opens its window, and then dies instantly.
> Starting it a second time works.
>
> It's not OOM, as there's plenty of free RAM, and half a gig of free (unused) swap.
>
> It's almost 100% reproducible here. Only seen it do it on this box
> though which is a P4 with HT, so it could be SMP related..

I used to have a small system with 96M RAM and no swap, running only
OpenSSH, bash and some kernel threads. Whenever I did a big BK pull,
it would catch sig 11 and die. At first I had only 64M RAM installed,
and I added more only after seeing some of those sig 11s. This
probably has nothing to do with your situation, though.

2003-01-03 11:04:33

by William Lee Irwin III

Subject: Re: odd phenomenon.

At some point in the past, Dave Jones wrote:
>>> It's almost 100% reproducible here. Only seen it do it on this box
>>> though which is a P4 with HT, so it could be SMP related..
>>> Ideas ?

On Fri, Jan 03, 2003 at 02:48:09AM -0800, William Lee Irwin III wrote:
>> (1) strace?

On Fri, Jan 03, 2003 at 11:09:01AM +0000, Dave Jones wrote:
> That was my first thought. Everything works as expected though
> when you try to strace it.

Highly unusual. In-kernel tracing seems to be in order. Can you
describe a more complete "reproduction suite" (esp. app/lib versions)?


Thanks,
Bill

2003-01-03 11:02:40

by Dave Jones

Subject: Re: odd phenomenon.

On Fri, Jan 03, 2003 at 02:48:09AM -0800, William Lee Irwin III wrote:
> > It's almost 100% reproducible here. Only seen it do it on this box
> > though which is a P4 with HT, so it could be SMP related..
> > Ideas ?
> (1) strace?

That was my first thought. Everything works as expected though
when you try to strace it.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2003-01-03 11:25:23

by Dave Jones

Subject: Re: odd phenomenon.

On Fri, Jan 03, 2003 at 03:12:53AM -0800, William Lee Irwin III wrote:
> > That was my first thought. Everything works as expected though
> > when you try to strace it.
> Highly unusual.

Indeed.

> In-kernel tracing seems to be in order. Can you
> describe a more complete "reproduction suite" (esp. app/lib versions)?

Galeon 1.2.7, Bitkeeper 3.0 20021011025136

I had until today thought that this was a 2.5 only bug, but this
box rebooted back into a 2.4 kernel yesterday for the first time
in ages. (Running 2.4.20-rc4 currently)

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2003-01-03 18:46:06

by Robert Love

Subject: Re: odd phenomenon.

On Fri, 2003-01-03 at 06:31, Dave Jones wrote:

> Galeon 1.2.7, Bitkeeper 3.0 20021011025136
>
> I had until today thought that this was a 2.5 only bug, but this
> box rebooted back into a 2.4 kernel yesterday for the first time
> in ages. (Running 2.4.20-rc4 currently)

Galeon 1.3.1 CVS here, so maybe it's a bit different, but zero problems
on 2.5, and I use Galeon constantly :)

Anything I can do to try to better reproduce it?

Robert Love

2003-01-04 10:37:10

by Benjamin Herrenschmidt

Subject: Re: odd phenomenon.

On Fri, 2003-01-03 at 11:38, Dave Jones wrote:
> Something strange I've noticed on all recent 2.4 and 2.5 kernels.
>
> If I start galeon whilst I've got a bk pull in operation, the
> galeon process starts, opens its window, and then dies instantly.
> Starting it a second time works.
>
> It's not OOM, as there's plenty of free RAM, and half a gig of free (unused) swap.
>
> It's almost 100% reproducible here. Only seen it do it on this box
> though which is a P4 with HT, so it could be SMP related..

Happens all the time here too (ppc32), and did so for ages, with 2.4
(didn't specifically notice it with 2.5 yet, but I rarely use galeon
when testing 2.5 ;)

Typically happens with any kind of intense disk activity slowing down
galeon's launch process. (Not only bk, but also for example updatedb
running in the background).

I'm currently running galeon 1.2.6 (happened with all earlier versions
at least).

Ben.




2003-01-04 11:51:07

by Dave Jones

Subject: Re: odd phenomenon.

On Sat, Jan 04, 2003 at 11:48:33AM +0100, Benjamin Herrenschmidt wrote:

> > It's almost 100% reproducible here. Only seen it do it on this box
> > though which is a P4 with HT, so it could be SMP related..
>
> Happens all the time here too (ppc32), and did so for ages, with 2.4
> (didn't specifically notice it with 2.5 yet, but I rarely use galeon
> when testing 2.5 ;)

Ha! Conclusive proof I'm not losing my marbles.

> Typically happens with any kind of intense disk activity slowing down
> galeon's launch process. (Not only bk, but also for example updatedb
> running in the background).

Maybe, but bk was the only disk-thrashing type app I regularly
have running when I've tried to reproduce this.

Is your PPC32 box SMP ? I'm wondering why I don't see it on my
athlon/P3 boxes, just on my dual P4.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk

2003-01-04 12:02:14

by Benjamin Herrenschmidt

Subject: Re: odd phenomenon.

On Sat, 2003-01-04 at 12:57, Dave Jones wrote:
> Maybe, but bk was the only disk-thrashing type app I regularly
> have running when I've tried to reproduce this.
>
> Is your PPC32 box SMP ? I'm wondering why I don't see it on my
> athlon/P3 boxes, just on my dual P4.

No, it happens on my UP laptop as well. Hadess suggested it could be
yet another gconf race in GNOME.

Ben.



2003-01-04 13:16:57

by Stephen Rothwell

Subject: Re: odd phenomenon.

From: Dave Jones <[email protected]>
>
> On Sat, Jan 04, 2003 at 11:48:33AM +0100, Benjamin Herrenschmidt wrote:
>
> > Typically happens with any kind of intense disk activity slowing down
> > galeon's launch process. (Not only bk, but also for example updatedb
> > running in the background).
>
> Maybe, but bk was the only disk-thrashing type app I regularly
> have running when I've tried to reproduce this.
>
> Is your PPC32 box SMP ? I'm wondering why I don't see it on my
> athlon/P3 boxes, just on my dual P4.

I see this every morning on my laptop, when anacron starts my overnight
cron jobs (mostly find across the whole disk). So, it is not SMP
specific. I assumed there was some sort of timeout in galeon that makes
it abort unless it starts within a particular amount of time.
It always works the second time.
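
To illustrate the kind of startup timeout I mean, here is a minimal sketch. Nothing in this thread confirms that galeon actually does this; `wait_until_ready`, `start_app` and the `is_ready` callback are hypothetical names, and the deadline values are arbitrary.

```python
import time


def wait_until_ready(is_ready, deadline_s, poll_s=0.01):
    """Poll is_ready() until it returns True or deadline_s seconds pass."""
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        if is_ready():
            return True
        time.sleep(poll_s)
    # One last check so a zero-length deadline still sees a ready component.
    return is_ready()


def start_app(is_ready, deadline_s):
    """Hypothetical launcher: abort if startup misses the deadline."""
    if not wait_until_ready(is_ready, deadline_s):
        return "aborted: startup timed out"
    return "running"
```

With a pattern like this, anything that delays startup past the deadline (such as heavy disk I/O from a find or a bk pull) would end in an abort with no kernel bug involved.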

This is on 2.4.19-pre8 (usually) (I must build a newer kernel :-)).

Cheers,
Stephen Rothwell