2002-07-17 20:07:40

by Steven Cole

Subject: 2.5.25-dj2, kernel BUG at dcache.c:361

While running 2.5.25-dj2 and dbench with increasing numbers of clients,
my test machine locked up with the following message:

kernel BUG at dcache.c:361!

I tried to copy down the register dump that followed but was unable to.
Nothing interesting was saved in /var/log/messages.

This is fairly repeatable: it happens when running dbench with
more than 32 clients. I saw it once with as few as 6 clients. After
getting weary of running fsck on an ext2 /home partition, I added a
journal to /home and mounted it as ext3. With /home (where dbench is
running) mounted as ext3, I got the following message just before the
BUG:

EXT3-fs error (device sd(8,8): ext3_free_blocks: freeing blocks not in
datazone - block = 7939096, count = 13.

The test machine is a dual P3 with 1mb memory and SCSI, running a 2.5.25-dj2 SMP kernel.

Steven

2002-07-17 20:13:55

by Dave Jones

Subject: Re: 2.5.25-dj2, kernel BUG at dcache.c:361

On Wed, Jul 17, 2002 at 02:06:50PM -0600, Steven Cole wrote:
> While running 2.5.25-dj2 and dbench with increasing numbers of clients,
> my test machine locked up with the following message:
>
> kernel BUG at dcache.c:361!

There are some -dj specific hacks to dcache.c to convert to use
list_t types. Which from memory, I think William Lee Irwin did.
(wli, can you double check those just in case there's either an
obvious thinko, or a mismerge if you get time ?)

Failing that, this could be something that also affects mainline
I think.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-07-17 20:27:09

by Steven Cole

Subject: Re: 2.5.25-dj2, kernel BUG at dcache.c:361

On Wed, 2002-07-17 at 14:16, Dave Jones wrote:
> On Wed, Jul 17, 2002 at 02:06:50PM -0600, Steven Cole wrote:
> > While running 2.5.25-dj2 and dbench with increasing numbers of clients,
> > my test machine locked up with the following message:
> >
> > kernel BUG at dcache.c:361!
>
> There are some -dj specific hacks to dcache.c to convert to use
> list_t types. Which from memory, I think William Lee Irwin did.
> (wli, can you double check those just in case there's either an
> obvious thinko, or a mismerge if you get time ?)
>
> Failing that, this could be something that also affects mainline
> I think.

I didn't explicitly mention it, but I have successfully run recent
kernels (2.5.24, 2.5.25 and 2.5.26), with and without the rmap patches,
with up to 64 dbench clients and observed no problems. 2.4.19-rc1 and
-rc2 also work well. 2.5.25-dj2 is the only kernel that has hit this
dcache.c BUG. I didn't test 2.5.25-dj1.

Steven

2002-07-17 23:05:09

by William Lee Irwin III

Subject: Re: 2.5.25-dj2, kernel BUG at dcache.c:361

On Wed, Jul 17, 2002 at 02:06:50PM -0600, Steven Cole wrote:
>> While running 2.5.25-dj2 and dbench with increasing numbers of clients,
>> my test machine locked up with the following message:
>> kernel BUG at dcache.c:361!

On Wed, Jul 17, 2002 at 10:16:40PM +0200, Dave Jones wrote:
> There are some -dj specific hacks to dcache.c to convert to use
> list_t types. Which from memory, I think William Lee Irwin did.
> (wli, can you double check those just in case there's either an
> obvious thinko, or a mismerge if you get time ?)
> Failing that, this could be something that also affects mainline
> I think.

I'm bringing it up on one of my testboxen and debugging it now.


Cheers,
Bill

2002-07-18 01:28:10

by William Lee Irwin III

Subject: Re: 2.5.25-dj2, kernel BUG at dcache.c:361

On Wed, Jul 17, 2002 at 02:06:50PM -0600, Steven Cole wrote:
>> While running 2.5.25-dj2 and dbench with increasing numbers of clients,
>> my test machine locked up with the following message:
>>
>> kernel BUG at dcache.c:361!

On Wed, Jul 17, 2002 at 10:16:40PM +0200, Dave Jones wrote:
> There are some -dj specific hacks to dcache.c to convert to use
> list_t types. Which from memory, I think William Lee Irwin did.
> (wli, can you double check those just in case there's either an
> obvious thinko, or a mismerge if you get time ?)
> Failing that, this could be something that also affects mainline
> I think.

This is getting real ugly real fast. It looks like there's a lot more
to debug than the dcache. I'll stick around for a bit and send a few
other patches your way as well.


Cheers,
Bill