2004-02-24 14:18:39

by Burton Windle

Subject: 2.6.3: oops reading /proc/net/rpc/auth.unix.ip/content

Hello. I just upgraded a workstation from 2.6.2 to 2.6.3, and am now
seeing an oops on boot when my init scripts run the nfs-kernel-server
script. The oops actually happens whenever anything tries to read
/proc/net/rpc/auth.unix.ip/content.

Is this a known issue?

Linux version 2.6.3 (root@jekyll) (gcc version 3.3.2 (Debian)) #11 Mon Feb 23 22:23:26 EST 2004

Unable to handle kernel NULL pointer dereference at virtual address 00000044
printing eip:
c0396beb
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c0396beb>] Not tainted
EFLAGS: 00010246
EIP is at content_open+0x5b/0x80
eax: 00000000 ebx: cfb49628 ecx: 00000000 edx: cf92d738
esi: 00000000 edi: cfb29df4 ebp: cfc45f3c esp: cfc45f28
ds: 007b es: 007b ss: 0068
Process grep (pid: 225, threadinfo=cfc44000 task=cfe01440)
Stack: cfb29df4 c0435e98 cfb29df4 cf9b2c24 ffffffe9 cfc45f58 c015b824 cf9b2c24
cfb29df4 00000000 cffee000 00008000 cfc45f9c c015b76b cf9dfae0 cfff56ec
00008000 00000003 00008000 cffee000 cf9dfae0 cfff56ec cffee000 00008000
Call Trace:
[<c015b824>] dentry_open+0xac/0x14c
[<c015b76b>] filp_open+0x4f/0x5c
[<c015bdb3>] sys_open+0x37/0x80
[<c0109c1b>] syscall_call+0x7/0xb

Code: 89 58 44 89 f0 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d 76



CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_XFRM=y
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_NET_TULIP=y
CONFIG_TULIP=y
CONFIG_DE4X5=y
CONFIG_WINBOND_840=y
CONFIG_DM9102=y
CONFIG_NET_PCI=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_SUNRPC=y
CONFIG_SMB_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_FRAME_POINTER=y


--
Burton Windle [email protected]



-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-02-25 00:38:42

by Burton Windle

Subject: Re: 2.6.3: oops reading /proc/net/rpc/auth.unix.ip/content

After doing a 'make clean' and setting my gcc symlink back to trusty old
GCC 2.95, kernel 2.6.3 works just fine.

Let it be known that GCC 3.3 (or at least the GCC 3.3.3-0pre3 that is in
Debian Testing) is broken.

I'm sorry to have wasted your time.

--
Burton Windle [email protected]


On Wed, 25 Feb 2004, Neil Brown wrote:

> On Tuesday February 24, [email protected] wrote:
> > Hello. I just upgraded a workstation from 2.6.2 to 2.6.3, and am now
> > seeing an oops on boot when my init scripts run the nfs-kernel-server
> > script. The oops actually happens whenever trying to read
> > /proc/net/rpc/auth.unix.ip/content
> >
> > Is this a known-issue?
>
> I hate to say this, but this cannot possibly happen :-)
>
> It is fairly clear from:
>
> > Unable to handle kernel NULL pointer dereference at virtual address 00000044
> > EIP is at content_open+0x5b/0x80
> > eax: 00000000 ebx: cfb49628 ecx: 00000000 edx: cf92d738
> > esi: 00000000 edi: cfb29df4 ebp: cfc45f3c esp: cfc45f28
> > ds: 007b es: 007b ss: 0068
> >
> > Code: 89 58 44 89 f0 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d 76
> >
>
> that the oops is happening :
>
> static int content_open(struct inode *inode, struct file *file)
> {
>         int res;
>         struct handle *han;
>         struct cache_detail *cd = PDE(inode)->data;
>
>         han = kmalloc(sizeof(*han), GFP_KERNEL);
>         if (han == NULL)
>                 return -ENOMEM;
>
>         han->cd = cd;
>
>         res = seq_open(file, &cache_content_op);
>         if (res)
>                 kfree(han);
>         else
> /*HERE*/        ((struct seq_file *)file->private_data)->private = han;
>
>         return res;
> }
>
> The instruction that is oopsing is:
> 0: 89 58 44 mov %ebx,0x44(%eax)
> which is storing the value "han" (in %ebx, 0xcfb49628) into element
> "private" (offset 0x44) of ((struct seq_file *)file->private_data)
> (in %eax, 0x00).
>
> So file->private_data must be NULL.
>
> However seq_open has just returned zero (or we wouldn't have got to
> this code) and you can see from
>
> int seq_open(struct file *file, struct seq_operations *op)
> {
>         struct seq_file *p = kmalloc(sizeof(*p), GFP_KERNEL);
>         if (!p)
>                 return -ENOMEM;
>         memset(p, 0, sizeof(*p));
>         sema_init(&p->sem, 1);
>         p->op = op;
>         file->private_data = p;
>         return 0;
> }
>
> that this means that file->private_data is most definitely not NULL.
>
> As I said, it cannot happen....
>
> Maybe a compiler bug ???? (wouldn't be the first time).
>
> Would you be able to use gdb to disassemble all of content_open and
> seq_open so I can see what is happening?
>
> NeilBrown
>



2004-02-24 23:02:14

by NeilBrown

Subject: Re: 2.6.3: oops reading /proc/net/rpc/auth.unix.ip/content

On Tuesday February 24, [email protected] wrote:
> Hello. I just upgraded a workstation from 2.6.2 to 2.6.3, and am now
> seeing an oops on boot when my init scripts run the nfs-kernel-server
> script. The oops actually happens whenever trying to read
> /proc/net/rpc/auth.unix.ip/content
>
> Is this a known-issue?

I hate to say this, but this cannot possibly happen :-)

It is fairly clear from:

> Unable to handle kernel NULL pointer dereference at virtual address 00000044
> EIP is at content_open+0x5b/0x80
> eax: 00000000 ebx: cfb49628 ecx: 00000000 edx: cf92d738
> esi: 00000000 edi: cfb29df4 ebp: cfc45f3c esp: cfc45f28
> ds: 007b es: 007b ss: 0068
>
> Code: 89 58 44 89 f0 8b 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d 76
>

that the oops is happening at the line marked /*HERE*/ in:

static int content_open(struct inode *inode, struct file *file)
{
        int res;
        struct handle *han;
        struct cache_detail *cd = PDE(inode)->data;

        han = kmalloc(sizeof(*han), GFP_KERNEL);
        if (han == NULL)
                return -ENOMEM;

        han->cd = cd;

        res = seq_open(file, &cache_content_op);
        if (res)
                kfree(han);
        else
/*HERE*/        ((struct seq_file *)file->private_data)->private = han;

        return res;
}

The instruction that is oopsing is:
0: 89 58 44 mov %ebx,0x44(%eax)
which is storing the value "han" (in %ebx, 0xcfb49628) into element
"private" (offset 0x44) of ((struct seq_file *)file->private_data)
(in %eax, 0x00).

So file->private_data must be NULL.

However seq_open has just returned zero (or we wouldn't have got to
this code) and you can see from

int seq_open(struct file *file, struct seq_operations *op)
{
        struct seq_file *p = kmalloc(sizeof(*p), GFP_KERNEL);
        if (!p)
                return -ENOMEM;
        memset(p, 0, sizeof(*p));
        sema_init(&p->sem, 1);
        p->op = op;
        file->private_data = p;
        return 0;
}

that this means that file->private_data is most definitely not NULL.

As I said, it cannot happen....

Maybe a compiler bug ???? (wouldn't be the first time).

Would you be able to use gdb to disassemble all of content_open and
seq_open so I can see what is happening?

NeilBrown

