2005-11-03 18:10:24

by Mark Fortescue

Subject: Kernel BUG

Hi Trond,

I am running a sparc-linux kernel using an NFS Root and it is falling over
with the trace below.

My kernel is not a standard kernel (I have had to tweak it to get the
SBUS GC3 and the 82077 floppy to work on my OPUS Sparc 1 clone).

Can you advise me on any known issues in the NFS client code that might
put NULL pointers into 'slot->slots[i]' in __lookup_tag?

If there are none that you are aware of, are there any specific areas that
I should investigate with printk statements?

The kernel is cross-compiled on an Athlon 64 3400+ (32-bit Linux at the
moment) using GCC-4.0.2 and Binutils-2.16.1. Compilation takes about 10
minutes, so there is no real issue in making changes to the kernel to find
the source of the problem.

A compiler/binutils bug should not be ruled out. I might try
gcc-3.4.3/binutils-2.15.

Please let me know if you would like further information.

Regards
Mark Fortescue.
--------------------------------------------------------------------------
kernel BUG at /L64/src/linux-2.6/linux-2.6.13.4-p01/lib/radix-tree.c:575!
              \|/ ____ \|/
              "@'/ ,. \`@"
              /_| \__/ |_\
                 \__U_/
ld(45): Kernel bad trap [#1]
PSR: 004000c4 PC: f00e0ff4 NPC: f00e0ff8 Y: 00000000 Not tainted
PC: <radix_tree_gang_lookup_tag+0x144/0x1ac>
%G: 00000001 f022ec00 f022eccc 00400fe2 f002fd18 f022ec00 ff020000 00000000
%O: 0000004d f01fbd78 0000023f 00000000 00000001 00000000 ff021a48 f00e0fec
RPC: <radix_tree_gang_lookup_tag+0x13c/0x1ac>
%L: 00000001 ff021b14 0000003f 00000001 00000002 00000000 ff020000 e0162000
%I: 00000000 ff021b14 00000000 00000001 00000008 ff021b10 ff021ab0 f00b10d4
Caller[f00b10d4]: nfs_wait_on_requests+0x98/0xb8
Caller[f00b2a70]: nfs_sync_inode+0x20/0x74
Caller[f00b063c]: nfs_readpage+0x44/0x44c
Caller[f004fc8c]: do_generic_mapping_read+0x290/0x564
Caller[f005084c]: __generic_file_aio_read+0x168/0x1cc
Caller[f0050a2c]: generic_file_aio_read+0x44/0x54
Caller[f006e298]: do_sync_read+0x94/0xc8
Caller[f006e62c]: vfs_read+0xa0/0x15c
Caller[f006f200]: sys_read+0x30/0x64
Caller[f001144c]: syscall_is_too_hard+0x34/0x40
Caller[e0096e58]: 0xe0096e58
Instruction DUMP: 90122178 7ffcc514 01000000 <91d02005> 9402a001 80a28010 0280000f c4244001 8600e001

--------------------------------------------------------------------------



2005-11-03 18:59:20

by Trond Myklebust

Subject: Re: Kernel BUG

On Thu, 2005-11-03 at 18:10 +0000, Mark Fortescue wrote:
> Hi Trond,
>
> I am running a sparc-linux kernel using an NFS Root and it is falling over
> with the trace below.
>
> My kernel is not a standard kernel (I have had to tweak it to get the
> SBUS GC3 and the 82077 floppy to work on my OPUS Sparc 1 clone).
>
> Can you advise me on any known issues in the NFS client code that might
> put NULL pointers into 'slot->slots[i]' in __lookup_tag?
>
> If there are none that you are aware of, are there any specific areas that
> I should investigate with printk statements?

NFS does not ever directly access the radix tree internals: it always
uses the API, and it always protects those operations using the
NFS_I(inode)->req_lock.

Are you sure that radix_tree_init() is being called before the NFSroot
stuff is started? To me, this whole thing smells of memory scribble.

Cheers,
Trond
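The BUG in the trace concerns a slot that is marked with a tag while its pointer is NULL. A minimal user-space sketch of that consistency check follows. It is a simplified model, not the kernel's lib/radix-tree.c: the struct layout, the single tag bitmap, the 64-slot node size and all model_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SHIFT     6                       /* assumed: 64 slots per node */
#define MAP_SIZE      (1UL << MAP_SHIFT)
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define TAG_WORDS     ((MAP_SIZE + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for one radix tree node carrying a single tag. */
struct model_node {
    void *slots[MAP_SIZE];
    unsigned long tag[TAG_WORDS];
};

/* Return non-zero if slot i is tagged. */
static int model_tag_get(const struct model_node *n, unsigned long i)
{
    return (int)((n->tag[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}

/* Mark slot i as tagged. */
static void model_tag_set(struct model_node *n, unsigned long i)
{
    n->tag[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/*
 * Return the index of the first slot that is tagged but holds a NULL
 * pointer, or -1 if every tagged slot is populated.
 */
static long model_first_bad_slot(const struct model_node *n)
{
    unsigned long i;

    assert(n != NULL);
    for (i = 0; i < MAP_SIZE; i++)
        if (model_tag_get(n, i) && n->slots[i] == NULL)
            return (long)i;
    return -1;
}
```

A check of this shape, called just before the lookup and reporting the offending index, would show whether the tag bitmap or the slot array is the part that has been overwritten.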

2005-11-03 23:15:35

by Mark Fortescue

Subject: Re: Kernel BUG

Hi Trond,

The error occurs well after the NFS root is up and running (during the
link phase of a gcc compilation of hello.c). I thought it might be part of
the NFS system because of the backtrace.

I am currently working on a GCC-3.4.3, Binutils-2.15 version to see if it
is a compiler/binary utilities issue. The problem I have is that
GCC-3.4.3, Binutils-2.15 does not cope with printk("%llu") so I know that
there is a high potential for failure with this combination.

If it works OK, I will try with GCC-3.4.3, Binutils-2.16.1 and GCC-4.0.2,
Binutils-2.15 to try and eliminate compiler/binutils issues.

Once I have eliminated compiler/binutils bugs/features then I will start
to concentrate on the Kernel to try to identify the initial point of
failure.

Regards
Mark Fortescue.
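Where a toolchain mishandles "%llu", one possible way to keep 64-bit values visible in debug output is to print them as two 32-bit halves. The sketch below shows the idea in user space with snprintf; format_u64_hex is a hypothetical helper name, and the assumption is that the same "%08lx%08lx" format would be passed to printk in the kernel.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit value as 16 hex digits using only
 * unsigned long format arguments, so no %llu or %llx conversion is needed.
 * Returns the number of characters snprintf would have written.
 */
static int format_u64_hex(char *buf, size_t len, unsigned long long v)
{
    unsigned long hi = (unsigned long)(v >> 32) & 0xffffffffUL;
    unsigned long lo = (unsigned long)(v & 0xffffffffULL);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%08lx%08lx", hi, lo);
}
```

The output is hexadecimal rather than decimal, which is usually acceptable for sector numbers, offsets and similar values in debugging messages.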

2005-11-04 10:08:22

by Mark Fortescue

Subject: Re: Kernel BUG

Hi Trond,

I have found a working combination of GCC/Binutils: gcc-3.4.3 with
binutils-2.16.1 (GCC needs more work, as it got its specs wrong and has a
bug in it regarding %llu on sparc).

This suggests that there is a kernel build error associated with GCC-4.0.2
(for sparc-linux). I will need to investigate this, as GCC-4.0.2 has a
variety of bug fixes in it that affect the sparc-linux target. It also has
improved configuration/build scripts that are relevant to what I am trying
to do.

I will let you know what I find. It may take me some time, as my sparc
assembly is not too good and the generated assembly is the best place to
find compiler hiccups.

Regards
Mark Fortescue.
