2001-11-28 20:05:09

by Jeremy Puhlman

[permalink] [raw]
Subject: Google Test and 2.4.16

Ok yesterday got the google tests running...The machine I ran it on
was a standard (Old) white box...Athlon k6-450 with 128 MB of
Ram....Using 256 MB of swap....The google test tries to use an
adjustable 1/2 terra byte Block size...This is unrealistic for an
embedded system or my system for that matter..So in trying to tune the
test to the system it seems they would not run unless the block size
was less then 60 MB...Not sure what the deal was...I tried turning on
memory-overcommit but no dice...

So basically I ran the test once through and every thing went fine...The

test didn't seem to really stress the system very much...

So I ran the same program 4 times, concurrently...The system did not
seem to lose any responsiveness....This did stress the vm system since
each of the processes were grabbing 60 megs...

I did find that once the 4 processes finished their runs. I ran one
more just for fun...Then the system locked up...

Jeremy


2001-11-28 20:17:30

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: Google Test and 2.4.16



On Wed, 28 Nov 2001, Jeremy Puhlman wrote:

> Ok yesterday got the google tests running...The machine I ran it on
> was a standard (Old) white box...Athlon k6-450 with 128 MB of
> Ram....Using 256 MB of swap....The google test tries to use an
> adjustable 1/2 terra byte Block size...This is unrealistic for an
> embedded system or my system for that matter..So in trying to tune the
> test to the system it seems they would not run unless the block size
> was less then 60 MB...Not sure what the deal was...I tried turning on
> memory-overcommit but no dice...
>
> So basically I ran the test once through and every thing went fine...The
>
> test didn't seem to really stress the system very much...
>
> So I ran the same program 4 times, concurrently...The system did not
> seem to lose any responsiveness....This did stress the vm system since
> each of the processes were grabbing 60 megs...
>
> I did find that once the 4 processes finished their runs. I ran one
> more just for fun...Then the system locked up...

Can you get the backtraces (with magic sysrq) of such a lockup ?

2001-11-29 02:42:57

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Google Test and 2.4.16

On Wed, Nov 28, 2001 at 12:02:59PM -0800, Jeremy Puhlman wrote:
> Ok yesterday got the google tests running...The machine I ran it on
> was a standard (Old) white box...Athlon k6-450 with 128 MB of
> Ram....Using 256 MB of swap....The google test tries to use an
> adjustable 1/2 terra byte Block size...This is unrealistic for an
> embedded system or my system for that matter..So in trying to tune the
> test to the system it seems they would not run unless the block size
> was less then 60 MB...Not sure what the deal was...I tried turning on
> memory-overcommit but no dice...
>
> So basically I ran the test once through and every thing went fine...The
>
> test didn't seem to really stress the system very much...
>
> So I ran the same program 4 times, concurrently...The system did not
> seem to lose any responsiveness....This did stress the vm system since
> each of the processes were grabbing 60 megs...
>
> I did find that once the 4 processes finished their runs. I ran one
> more just for fun...Then the system locked up...

this won't happen in 2.4.15aa1.

Andrea

2001-11-30 18:33:21

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: Google Test and 2.4.16

On Fri, Nov 30, 2001 at 10:18:23AM -0800, Jeremy Puhlman wrote:
> Andrea Arcangeli wrote:
>
> > On Wed, Nov 28, 2001 at 12:02:59PM -0800, Jeremy Puhlman wrote:
> > > Ok yesterday got the google tests running...The machine I ran it on
> > > was a standard (Old) white box...Athlon k6-450 with 128 MB of
> > > Ram....Using 256 MB of swap....The google test tries to use an
> > > adjustable 1/2 terra byte Block size...This is unrealistic for an
> > > embedded system or my system for that matter..So in trying to tune the
> > > test to the system it seems they would not run unless the block size
> > > was less then 60 MB...Not sure what the deal was...I tried turning on
> > > memory-overcommit but no dice...
> > >
> > > So basically I ran the test once through and every thing went fine...The
> > >
> > > test didn't seem to really stress the system very much...
> > >
> > > So I ran the same program 4 times, concurrently...The system did not
> > > seem to lose any responsiveness....This did stress the vm system since
> > > each of the processes were grabbing 60 megs...
> > >
> > > I did find that once the 4 processes finished their runs. I ran one
> > > more just for fun...Then the system locked up...
^^^^^^^^^^^^^^^^^^^^
> >
> > this won't happen in 2.4.15aa1.
> >
> > Andrea
>
> Yes it does happen on 2.4.15aa1....Actually 2.4.15aa1 does even get through
> the first set of runs....see attached opps/sysreqs/system.map...

ah but you get an oops! That has nothing to do with the bug triggered by
the google workload. I thought "the system locked" above meant it was a
live lock like it was happening before fixing the kernel bug that was
generating the live lock with the google workload. I couldn't imagine
"the system locked" meant "I got an oops", sorry.

So let's check the Oops:

Unable to handle kernel NULL pointer dereference at virtual address
00000200
printing eip:
00000200
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<00000200>] Not tainted
EFLAGS: 00010202
eax: 00000001 ebx: c02b77a0 ecx: c8800000 edx: 00000002
esi: 00000002 edi: c10456c0 ebp: c025b4c0 esp: c7fe7f20
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 4, stackpage=c7fe7000)
Stack: c10456a4 c0126911 c02b77a0 c7fe6000 0000060e c025b610 0000004e
0000001d
c11e7d50 c0820000 c11e7b90 00000001 0000001f c025b610 000001d0
c025b610
c0126b94 000001d0 c7fe7f88 00000020 00000020 000001d0 c0126bff
c7fe7f88
Call Trace: [<shrink_cache+0x321>] [<shrink_caches+0x64>]
[<try_to_free_pages+0x5f>] [<kswapd_balance_pgdat+0x51>]
[<kswapd_balance+0x16>]
[<kswapd+0xa1>] [<kswapd+0x0>] [<kernel_thread+0x2b>]

this sounds like faulty hardware. Are you sure your ram is not buggy?
I will start now an infinite loop of the google workload. How many runs
it takes exactly before you can reproduce btw?

Before fixing the bug I could reproduce the live lock just during the
first run. So to verify the problem was fixed I run 3/4 runs of the
workload but never more than this (mainly because of the pause() that
forced me to C^c without automating the loop, I will now drop the
pause() and start an infinite loop).

Andrea