2006-10-03 16:49:12

by Dave Jones

Subject: FSX on NFS blew up.

Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)

http://www.codemonkey.org.uk/junk/fsx-nfs.txt

Dave

--
http://www.codemonkey.org.uk


2006-10-04 00:35:18

by Badari Pulavarty

Subject: Re: FSX on NFS blew up.

On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
>
> http://www.codemonkey.org.uk/junk/fsx-nfs.txt
>
> Dave
>

I was seeing a *similar* problem on an NFS-mounted filesystem (while running
fsx), but later realized that the filesystem was full when it happened.

Could this be an fsx error-handling problem? Can you check yours?

Thanks,
Badari

#df
...
/dev/sdc 17504036 17504036 0 100% /mnt1
9.47.xx.xx:/mnt1 17504256 17504256 0 100% /mnt3
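
The same check can be scripted before starting a long run. A minimal sketch, assuming a Unix host with Python 3: os.statvfs() exposes the block counts df works from, and df's Use% is roughly used / (used + available). The helper name fs_usage is hypothetical.

```python
import os


def fs_usage(path):
    """Return (total_bytes, avail_bytes, percent_used) for the fs holding path.

    Approximates df's columns: 'used' counts blocks that are not free at all,
    'avail' counts blocks an unprivileged user may still allocate.
    """
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    denom = used + avail
    pct = 100.0 * used / denom if denom else 0.0
    return total, avail, pct
```

Called on the client mount point (e.g. fs_usage("/mnt3")), an avail of 0 and a percentage of 100 correspond to the df lines shown.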


I get fsx sigsegvs:

fsx-linux[4514] general protection rip:2ae1d90df690 rsp:7fffd1b57b08 error:0
fsx-linux[4513] general protection rip:2b6ee6048690 rsp:7fffc4becba8 error:0
fsx-linux[4515] general protection rip:2ac5964f0690 rsp:7fff147446f8 error:0
fsx-linux[5586] general protection rip:2b001c974690 rsp:7fff8e2c0278 error:0
fsx-linux[5587] general protection rip:2af03e546690 rsp:7fff6c6ee6a8 error:0
fsx-linux[5588] general protection rip:2ad9ca19c690 rsp:7fffe0a99ec8 error:0
fsx-linux[5585] general protection rip:2b4da569c690 rsp:7fff0559a588 error:0
fsx-linux[5921] general protection rip:2ac4d7346690 rsp:7fffd38f0b38 error:0
fsx-linux[5923] general protection rip:2b942d139690 rsp:7fff7dafd2b8 error:0
fsx-linux[5924] general protection rip:2b14e07cf690 rsp:7fffca465738 error:0
fsx-linux[5922] general protection rip:2af16b457690 rsp:7fff3f7de498 error:0
fsx-linux[5932] general protection rip:2b4b2b6ba690 rsp:7fff7f57c5b8 error:0
fsx-linux[5933] general protection rip:2b1d69ffd690 rsp:7fff40c37c68 error:0
fsx-linux[5934] general protection rip:2b06721f7690 rsp:7fff38a3da78 error:0
fsx-linux[5935] general protection rip:2ba2b5be8690 rsp:7ffff504e088 error:0

truncating to largest ever: 0x13e76
truncating to largest ever: 0x13e76
truncating to largest ever: 0x13e76
truncating to largest ever: 0x13e76
short read: 0xa8c2 bytes instead of 0xf0c4
LOG DUMP (3 total operations):
1(1 mod 256): TRUNCATE UP from 0x0 to 0x13e76
2(2 mod 256): WRITE 0x17098 thru 0x26857 (0xf7c0 bytes) HOLE
3(3 mod 256): READ 0xc73e thru 0x1b801 (0xf0c4 bytes)
Correct content saved for comparison
(maybe hexdump "/mnt3/foo1" vs "/mnt3/foo1.fsxgood")
short read: 0xa8c2 bytes instead of 0xf0c4
LOG DUMP (3 total operations):
1(1 mod 256): TRUNCATE UP from 0x0 to 0x13e76
2(2 mod 256): WRITE 0x17098 thru 0x26857 (0xf7c0 bytes) HOLE
3(3 mod 256): READ 0xc73e thru 0x1b801 (0xf0c4 bytes)
short read: 0xa8c2 bytes instead of 0xf0c4
LOG DUMP (3 total operations):
1(1 mod 256): TRUNCATE UP from 0x0 to 0x13e76
2(2 mod 256): WRITE 0x17098 thru 0x26857 (0xf7c0 bytes) HOLE
3(3 mod 256): READ 0xc73e thru 0x1b801 (0xf0c4 bytes)
Correct content saved for comparison
(maybe hexdump "/mnt3//foo2" vs "/mnt3//foo2.fsxgood")
Correct content saved for comparison
(maybe hexdump "/mnt3//foo3" vs "/mnt3//foo3.fsxgood")
short read: 0xa8c2 bytes instead of 0xf0c4
LOG DUMP (3 total operations):
1(1 mod 256): TRUNCATE UP from 0x0 to 0x13e76
2(2 mod 256): WRITE 0x17098 thru 0x26857 (0xf7c0 bytes) HOLE
3(3 mod 256): READ 0xc73e thru 0x1b801 (0xf0c4 bytes)
Correct content saved for comparison
(maybe hexdump "/mnt3//foo4" vs "/mnt3//foo4.fsxgood")
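
The three-operation LOG DUMP can be checked by hand. After operation 2 the file should be 0x17098 + 0xf7c0 = 0x26858 bytes long, so the READ of 0xf0c4 bytes at 0xc73e (ending at 0x1b801) lies entirely inside the file and should come back whole. The logged short read of 0xa8c2 bytes stopped at 0xc73e + 0xa8c2 = 0x17000, the 4 KiB-aligned offset just below where operation 2's write begins. A minimal replay sketch, assuming Python 3 and a local temporary file rather than the NFS mount; the data pattern is a made-up stand-in for fsx's own, and OPS, replay and SHORT_READ_END are hypothetical names.

```python
import os
import tempfile

# Operations copied from the LOG DUMP: (op, offset, length).
OPS = [
    ("truncate", 0x0, 0x13e76),  # TRUNCATE UP from 0x0 to 0x13e76 (length = new size)
    ("write", 0x17098, 0xf7c0),  # WRITE 0x17098 thru 0x26857, leaves a hole
    ("read", 0xc73e, 0xf0c4),    # READ 0xc73e thru 0x1b801
]

# Where the logged short read stopped: start offset + bytes actually returned.
SHORT_READ_END = 0xc73e + 0xa8c2  # == 0x17000


def replay(path):
    """Replay OPS on path; return (model_size, bytes returned by the read)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    model_size = 0
    got = None
    try:
        for op, off, length in OPS:
            if op == "truncate":
                os.ftruncate(fd, length)
                model_size = length
            elif op == "write":
                # Hypothetical pattern; fsx generates its own data.
                data = bytes((off + i) & 0xff for i in range(length))
                n = os.pwrite(fd, data, off)
                if n != length:
                    raise OSError(f"short write: {n:#x} of {length:#x}")
                model_size = max(model_size, off + length)
            elif op == "read":
                got = len(os.pread(fd, length, off))
    finally:
        os.close(fd)
    return model_size, got


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        size, got = replay(os.path.join(d, "foo1"))
        print(f"model size {size:#x}, read returned {got:#x} of {OPS[2][2]:#x}")
```

On a healthy local filesystem the read returns the full 0xf0c4 bytes, which is the behaviour fsx expected here.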



2006-10-04 00:40:19

by Dave Jones

Subject: Re: FSX on NFS blew up.

On Tue, Oct 03, 2006 at 05:34:44PM -0700, Badari Pulavarty wrote:
> On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> > Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
> >
> > http://www.codemonkey.org.uk/junk/fsx-nfs.txt
>
> I was seeing a *similar* problem on an NFS-mounted filesystem (while running
> fsx), but later realized that the filesystem was full when it happened.
>
> Could this be an fsx error-handling problem? Can you check yours?

It's running low, but there's no way it ran out. (It's down to about 4GB free).

Dave


2006-10-04 00:44:39

by Dave Jones

Subject: Re: FSX on NFS blew up.

On Tue, Oct 03, 2006 at 08:40:09PM -0400, Dave Jones wrote:
> On Tue, Oct 03, 2006 at 05:34:44PM -0700, Badari Pulavarty wrote:
> > On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> > > Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
> > >
> > > http://www.codemonkey.org.uk/junk/fsx-nfs.txt
> >
> > I was seeing a *similar* problem on an NFS-mounted filesystem (while running
> > fsx), but later realized that the filesystem was full when it happened.
> >
> > Could this be an fsx error-handling problem? Can you check yours?
>
> It's running low, but there's no way it ran out. (It's down to about 4GB free).

I just noticed the fsxlog that got dumped in that dir contains
some slightly different info from what got dumped to stdout.

I've pasted it onto the end of the file in the URL above.

Dave


2006-10-04 00:46:40

by Badari Pulavarty

Subject: Re: FSX on NFS blew up.

On Tue, 2006-10-03 at 20:40 -0400, Dave Jones wrote:
> On Tue, Oct 03, 2006 at 05:34:44PM -0700, Badari Pulavarty wrote:
> > On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> > > Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
> > >
> > > http://www.codemonkey.org.uk/junk/fsx-nfs.txt
> >
> > I was seeing a *similar* problem on an NFS-mounted filesystem (while running
> > fsx), but later realized that the filesystem was full when it happened.
> >
> > Could this be an fsx error-handling problem? Can you check yours?
>
> It's running low, but there's no way it ran out. (It's down to about 4GB free).
>
> Dave
>

Okay... Looking at your log

> Size error: expected 0x2b804 stat 0x37000 seek 0x37000

The file size doesn't match, so I'm wondering whether you hit a write
failure or a filesystem-full condition.
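
The "Size error" line compares three numbers: fsx's own model of the file size, fstat()'s st_size, and the offset returned by seeking to the end. In the logged case stat and seek agree with each other (0x37000) but not with the expected 0x2b804, so the file is 0x37000 - 0x2b804 = 0xb7fc bytes longer than the model says. A sketch of that comparison, assuming Python 3 and written in the spirit of the message rather than copied from fsx; size_error and demo are hypothetical names.

```python
import os
import tempfile


def size_error(fd, expected):
    """Return an fsx-style 'Size error' message if sizes disagree, else None.

    expected is the tester's model of the file size; it is compared with
    fstat()'s st_size and with lseek(fd, 0, SEEK_END).
    """
    stat_size = os.fstat(fd).st_size
    seek_size = os.lseek(fd, 0, os.SEEK_END)
    if expected != stat_size or expected != seek_size:
        return (f"Size error: expected {expected:#x} "
                f"stat {stat_size:#x} seek {seek_size:#x}")
    return None


def demo(actual_size, expected):
    """Create a temp file of actual_size bytes and check it against expected."""
    with tempfile.TemporaryFile() as f:
        os.ftruncate(f.fileno(), actual_size)
        return size_error(f.fileno(), expected)
```

demo(0x37000, 0x2b804) reproduces the logged message on a local file; it only shows the check itself, not what made the NFS file grow.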

Thanks,
Badari

2006-10-04 00:51:31

by Dave Jones

Subject: Re: FSX on NFS blew up.

On Tue, Oct 03, 2006 at 05:46:10PM -0700, Badari Pulavarty wrote:
> On Tue, 2006-10-03 at 20:40 -0400, Dave Jones wrote:
> > On Tue, Oct 03, 2006 at 05:34:44PM -0700, Badari Pulavarty wrote:
> > > On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> > > > Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
> > > >
> > > > http://www.codemonkey.org.uk/junk/fsx-nfs.txt
> > >
> > > I was seeing a *similar* problem on an NFS-mounted filesystem (while running
> > > fsx), but later realized that the filesystem was full when it happened.
> > >
> > > Could this be an fsx error-handling problem? Can you check yours?
> >
> > It's running low, but there's no way it ran out. (It's down to about 4GB free).
> >
> > Dave
> >
>
> Okay... Looking at your log
>
> > Size error: expected 0x2b804 stat 0x37000 seek 0x37000
>
> The file size doesn't match, so I'm wondering whether you hit a write
> failure or a filesystem-full condition.

The server didn't report anything nasty in its logs, and *touch wood*
hasn't had any hardware problems to date.

Dave


2006-10-04 23:34:10

by Badari Pulavarty

Subject: Re: FSX on NFS blew up.

Dave Jones wrote:
> On Tue, Oct 03, 2006 at 05:46:10PM -0700, Badari Pulavarty wrote:
> > On Tue, 2006-10-03 at 20:40 -0400, Dave Jones wrote:
> > > On Tue, Oct 03, 2006 at 05:34:44PM -0700, Badari Pulavarty wrote:
> > > > On Tue, 2006-10-03 at 12:49 -0400, Dave Jones wrote:
> > > > > Took ~8hrs to hit this on an NFSv3 mount. (2.6.18+Jan Kara's jbd patch)
> > > > >
> > > > > http://www.codemonkey.org.uk/junk/fsx-nfs.txt
> > > >
> > > > I was seeing a *similar* problem on an NFS-mounted filesystem (while running
> > > > fsx), but later realized that the filesystem was full when it happened.
> > > >
> > > > Could this be an fsx error-handling problem? Can you check yours?
> > >
> > > It's running low, but there's no way it ran out. (It's down to about 4GB free).
> > >
> > > Dave
> > >
> >
> > Okay... Looking at your log
> >
> > > Size error: expected 0x2b804 stat 0x37000 seek 0x37000
> >
> > The file size doesn't match, so I'm wondering whether you hit a write
> > failure or a filesystem-full condition.
>
> The server didn't report anything nasty in its logs, and *touch wood*
> hasn't had any hardware problems to date.
>
FWIW, on 2.6.18-mm3 I ran 4 copies of fsx on an NFS mount for 24 hours
without any issues. I do see segfaults and errors when the filesystem is
full; those are mostly fsx error-handling issues.
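
The error-handling point comes down to how a tester reacts when a write fails. A minimal sketch, assuming Python 3, of a write path that stops with a clear "filesystem full" error instead of carrying on with a stale model of the file. FilesystemFull, checked_pwrite and checked_fsync are hypothetical names, not fsx's. Because the NFS client caches writes, ENOSPC may only be reported when the data is flushed, so the sketch checks fsync() as well.

```python
import errno
import os
import tempfile


class FilesystemFull(Exception):
    """The backing filesystem ran out of space (ENOSPC)."""


def checked_pwrite(fd, data, offset):
    """Write all of data at offset, retrying partial writes.

    Raises FilesystemFull on ENOSPC so a harness can report "filesystem
    full" rather than a confusing size or content mismatch later.
    """
    view = memoryview(data)
    done = 0
    while done < len(view):
        try:
            n = os.pwrite(fd, view[done:], offset + done)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                raise FilesystemFull(
                    f"ENOSPC after {done:#x} of {len(view):#x} bytes "
                    f"at offset {offset:#x}") from e
            raise
        if n == 0:
            raise OSError(f"pwrite made no progress at {offset + done:#x}")
        done += n
    return done


def checked_fsync(fd):
    """fsync fd; on NFS a deferred ENOSPC may only be reported here."""
    try:
        os.fsync(fd)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            raise FilesystemFull("ENOSPC reported at fsync") from e
        raise


if __name__ == "__main__":
    # Re-run the WRITE from the earlier log dump against a local temp file.
    with tempfile.TemporaryFile() as f:
        n = checked_pwrite(f.fileno(), b"\xaa" * 0xf7c0, 0x17098)
        checked_fsync(f.fileno())
        print(f"wrote {n:#x} bytes, size now {os.fstat(f.fileno()).st_size:#x}")
```

With a harness built this way, a full filesystem ends the run with one explicit error instead of a cascade of short reads and size errors.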

Thanks,
Badari