2005-02-10 07:40:07

by Kim Holviala

Subject: Spontaneous reboot with 2.6.10 and NFSD

I hit an obscure bug last night when trying to copy files from an nfs
client to my nfs server. The server is a P3/800 with three IDE disks in
software RAID5 running vanilla 2.6.10 and Debian Sarge. The network is
local 100Mbit/s switched ethernet. The server exports a 220 gig
partition which contains a lot of data.

Oh, kernel configs and stuff from the server can be found at:
http://www.holviala.com/~kimmy/crash/

Anyway, I mount the export on a Linux client (tried a few, with
different 2.6 kernels and distros) and then start copying files from
the client's CDROM to the server through NFS. After copying a few small
files, the first big one reboots the server. There are no log entries,
and the server has no local console so I don't know what happens. This
is reproducible 100% of the time.

To narrow down the problem, I've tried the following:

- copied files from a different client running Gentoo: reboot
- exported a non-raided partition (hdc9) and tried that: reboot
- switched 2.6.10 to 2.6.11-rc3: reboot, but it took longer

I hope it's just something that I've done, but this server has been in
use for a long time now without any problems, and I haven't touched it
for a while.

So, if anyone knows what's wrong, or can tell me a way to debug the
situation further, I'd be grateful. The server is in a place where it's
nearly impossible to have a local console - I could probably use a
serial one if necessary for debugging.



Kim


2005-02-10 08:27:36

by Kim Holviala

Subject: Re: Spontaneous reboot with 2.6.10 and NFSD

Kim Holviala wrote:

> To narrow down the problem, I've tried the following:
>
> - copied files from a different client running Gentoo: reboot
> - exported a non-raided partition (hdc9) and tried that: reboot
> - switched 2.6.10 to 2.6.11-rc3: reboot, but it took longer

- tried with both udp and tcp mounts (nfsv3 both): reboot
- tried with 2.6.8.1: reboot

I also tried copying the same files with a tar-ssh-tar pipe thingy and
it works ok. Since I'm not near the server now, and it just crashed, I
can't do more testing until I get back home...



Kim


2005-02-10 08:56:12

by Kim Holviala

Subject: Re: Spontaneous reboot with 2.6.10 and NFSD

Kim Holviala wrote:
> Kim Holviala wrote:
>
>> To narrow down the problem, I've tried the following:
>>
>> - copied files from a different client running Gentoo: reboot
>> - exported a non-raided partition (hdc9) and tried that: reboot
>> - switched 2.6.10 to 2.6.11-rc3: reboot, but it took longer
>
>
> - tried with both udp and tcp mounts (nfsv3 both): reboot
> - tried with 2.6.8.1: reboot

- tried with local NFS mount (mount localhost:/export/tmp ...): reboot
- tried with user-mode nfs server (nfs-user-server): OK

Hmmm. So it looks like when I transfer a lot of data through the
kernel NFS server, it reboots the server. But if I transfer the same
data using ssh or the user-mode nfs server, it all works like it should.

I'm out of ideas.
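
For anyone trying to reproduce this, here is a minimal sketch of a write test that could be pointed at a file under the NFS mount. It only approximates the copy described above: it writes zeros instead of real CDROM data, and the function name, chunk size and default size are all hypothetical choices, not anything taken from the server setup.

```python
import os
import sys


def write_test_file(path, size_bytes, chunk_size=64 * 1024):
    """Write size_bytes of zeros to path in chunks, then fsync.

    Returns the number of bytes written. The path is assumed to live
    on the NFS mount under test (an assumption, not checked here).
    """
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk_size, size_bytes - written)
            f.write(chunk[:n])
            written += n
        # Push the data out to the server instead of leaving it in
        # the client's page cache.
        f.flush()
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    # Usage: script.py <target file> [size in MiB]; 32 MiB is an
    # arbitrary default.
    target = sys.argv[1]
    size_mib = int(sys.argv[2]) if len(sys.argv) > 2 else 32
    print(write_test_file(target, size_mib * 1024 * 1024))
```

Running it with increasing sizes might show whether the amount of data written matters at all.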



Kim

2005-02-10 09:21:29

by NeilBrown

Subject: Re: Spontaneous reboot with 2.6.10 and NFSD

On Thursday February 10, Kim Holviala wrote:
> Anyway, I mount the export on a Linux client (tried a few, with
> different 2.6 kernels and distros) and then start copying files from
> the client's CDROM to the server through NFS. After copying a few small
> files, the first big one reboots the server.

Can you be specific about the size of the "big" file?
Also, what filesystem is being used on the server, what mount flags
(if any) and what export options.

Having some sort of console, whether VGA, serial, or network, to view
the Oops would be invaluable.

Thanks,
NeilBrown
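
As an illustration of the network-console option mentioned above, here is a rough sketch of a UDP listener that could run on a second machine and record whatever the crashing server sends before it goes down. It assumes the server is set up to send its kernel messages as UDP datagrams (for example via netconsole) to this host. Port 6666 is used as the default here on the assumption that it matches the sender's configuration. The function names and log path are hypothetical.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming console datagrams.

    The port is an assumption; it must match whatever the sending
    machine is configured to use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture_datagrams(sock, max_packets, log_path=None):
    """Receive up to max_packets datagrams and return them as text.

    If log_path is given, each datagram is also appended to that file
    and flushed immediately, so nothing is lost if the listener is
    interrupted.
    """
    received = []
    log = open(log_path, "a") if log_path else None
    try:
        for _ in range(max_packets):
            data, _addr = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace")
            received.append(text)
            if log:
                log.write(text)
                log.flush()
    finally:
        if log:
            log.close()
    return received


if __name__ == "__main__":
    listener = open_listener()
    # Print messages as they arrive; stop with Ctrl-C.
    while True:
        for line in capture_datagrams(listener, 1, "console.log"):
            print(line, end="")
```

Since the log lives on the receiving machine, the last messages before the reboot should survive even if the server never gets to write them to disk.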

2005-02-10 10:02:39

by Kim Holviala

Subject: Re: Spontaneous reboot with 2.6.10 and NFSD

Neil Brown wrote:
> On Thursday February 10, Kim Holviala wrote:
>
>>Anyway, I mount the export on a Linux client (tried a few, with
>>different 2.6 kernels and distros) and then start copying files from
>>the client's CDROM to the server through NFS. After copying a few small
>>files, the first big one reboots the server.
>
> Can you be specific about the size of the "big" file?

Well, there were two bigger files, the first 18 megs and the second 35
megs and the copying never got past those two. But in the end it wasn't
the size - I was able to make it reboot with a small C source file...

> Also, what filesystem is being used on the server, what mount flags
> (if any) and what export options.

All the files are here:
http://www.holviala.com/~kimmy/crash/mount

Mount options:
/dev/md8 on /boot type ext3 (rw,nosuid,noatime)

I forgot to transfer the exports file, and now the server is dead...
Will do that later.

> Having some sort of console, whether VGA, serial, or network, to view
> the Oops would be invaluable.

I'll carry the server next to a monitor once I get back home.



Kim

2005-02-10 11:08:31

by Kim Holviala

Subject: Re: Spontaneous reboot with 2.6.10 and NFSD

Kim Holviala wrote:

>> Also, what filesystem is being used on the server, what mount flags
>> (if any) and what export options.
>
> All the files are here:
> http://www.holviala.com/~kimmy/crash/mount

Umph... Actually, the files are here:
http://www.holviala.com/~kimmy/crash/

> Mount options:
> /dev/md8 on /boot type ext3 (rw,nosuid,noatime)

And that was for the wrong fs, but the options were the same.

> I forgot to transfer the exports file, and now the server is dead...
> Will do that later.

Exports is now in the same place as other files. Here's the relevant line:
/export/home *(rw,sync)




Kim