Dear List,
This is the scenario: we need two high-performance NFS file servers;
quota support is a must, and so far it seems that we are out of luck :*(
Suggestions and help would be very welcome.
We don't care much about which filesystem to use - so far we use XFS
because of the need for (journalled) quota.
*) ext2 - no-go, because of lack of journal
*) ext3 - no-go, because quota isn't journalled
*) JFS - no-go, because of lack of quota
*) reiserfs - no-go, because of lack of quota
*) XFS seems to be the *only* viable filesystem in this scenario - if
anyone has alternative suggestions, we'd like to hear about it.
Oh, and Hans, I don't think we can fund your quota implementation right
now - no hard feelings ;)
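For reference, this is roughly how we use the journalled quota support on
XFS - quota is enabled at mount time and usage is queried with the XFS
tools. The device, mount point and the xfs_quota invocation below are just
an illustration of the idea, not our exact setup:

  # enable user and group quota on the export at mount time
  mount -o usrquota,grpquota /dev/sdb1 /export/home
  # report usage (needs a reasonably recent xfsprogs; repquota from the
  # standard quota tools works as well)
  xfs_quota -x -c 'report -h' /export/home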
History of these projects:
The first server, an IBM 345 with external SCSI enclosure and hardware
RAID, quickly triggered bugs in XFS under heavy usage:
First XFS bug:
---------------
http://oss.sgi.com/bugzilla/show_bug.cgi?id=309
Submitted in February this year - requires server reboot, NFS clients
will then re-trigger the bug immediately after the NFS server is started
again. Clearly not a pleasant problem.
A fairly simple patch is available, which solves the problem in the most
common cases. This simple patch has *not*yet* been included in 2.6.8.1.
A lot of people are seeing this - the SGI bugzilla is evidence of this,
and so is Google.
Second XFS bug:
---------------
Also causes the 'kernel BUG at fs/xfs/support/debug.c:106' message to be
printed. This bug is not solved by applying the simple patch to the
first problem.
How well known this problem is, I don't know - I can get more details on
this if anyone is actually interested in working on fixing XFS.
Third XFS bug:
--------------
XFS causes lowmem oom, triggering the OOM killer. Reported by
[email protected] on the 18th of august.
On the 24th of august, William Lee Irwin gives some suggestions and
mentions "xfs has some known bad slab behavior."
So, it's normal to OOM the lowmem with XFS? Again, more info can be
presented if anyone cares about fixing this.
Stability on large filesystems:
-------------------------------
On a 600+G filesystem with some 17M files, we are currently unable to
run a backup of the filesystem.
Some 4-8 hours after the backup has started, the dreaded 'debug.c:106'
message will appear (at some random place through the filesystem - it is
not a consistent error in one specific location in the filesystem), and
the server will need a reboot.
Obviously, running very large busy filesystems while being unable to
back them up is not a very pleasant thing to do...
Second server:
On a somewhat smaller server, I recently migrated to XFS (believing the
most basic problems had been ironed out). It took me about a day to
trigger the 'debug.c:106' error message from XFS, on vanilla 2.6.8.1.
After applying the simple fix (the fix for the first XFS problem as
described above), I haven't had problems with this particular server
since - but it is clearly serving fewer clients with fewer disks and a
lot less storage and traffic.
While the small server seems to be running well now, the large one has
an average uptime of about one day (!). Backups will crash it reliably,
when XFS doesn't OOM the box at random.
A little info on the hardware:
Big server                Small server
----------------------    -----------------------
Intel Xeon                Dual Athlon MP
7 external SCSI disks     4 internal IDE disks
IBM hardware RAID         Software RAID-1 + LVM
600+ GB XFS               ~150 GB XFS
17+ M files               ~1 M files
Both primarily serve NFS to a bunch of clients. Both run vanilla 2.6.8.1
plus the aforementioned patch for the first XFS problem we encountered.
<frustrated_admin mode="on">
Does anyone actually use XFS for serious file-serving? (yes, I run it
on my desktop at home and I don't have problems there - such reports are
not really relevant).
Is anyone actually maintaining/bugfixing XFS? Yes, I know the
MAINTAINERS file, but I am a little bit confused here - seeing that
trivial-to-trigger bugs that crash the system and have simple fixes,
have not been fixed in current mainline kernels.
If XFS is a no-go because of lack of support, are there any realistic
alternatives under Linux (taking our need for quota into account)?
And finally, if Linux is simply a no-go for high performance file
serving, what other suggestions might people have? NetApp?
</>
Thank you very much,
--
/ jakob
This is a very well-written and detailed bug report - have you tried writing
to the linux-xfs mailing list?
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jakob
Oestergaard
Sent: Wednesday, September 08, 2004 8:35 AM
To: [email protected]
Subject: Major XFS problems...
On Wed, Sep 08, 2004 at 09:07:56AM -0400, Piszcz, Justin Michael wrote:
> This is a very well-written and detailed bug report - have you tried writing
> to the linux-xfs mailing list?
Thank you :)
No, I did not write to the XFS list - but maybe I should...
I purposely wrote directly to LKML to get the attention of a "broader
audience".
For example, the lowmem OOM problem may not be fixable by the XFS
developers alone (I don't know), and I've heard whisperings about the
debug.c:106 problem being related to knfsd changes in the 2.6 series
(don't know how credible that is either).
So, I thought, maybe if more people were made aware of this, the right
people would have a chance of figuring this out :)
--
/ jakob
Jakob,
I am an XFS freak; I have a ton of servers with hardware RAID and
also run XFS over finely tuned NFS. Never had a problem. Only once,
when there was a power failure, the journal took 20 minutes to replay
before the box came back up. But otherwise XFS is pretty damn stable.
My XFS box just runs Linux 2.4.18-xfs and serves NFS on a single
Athlon 1800 or something like that, with 1 GB of RAM and a 3ware RAID
card, sharing to about 200 workstations. Not bad at all - it runs
pretty well. I've never needed to run quota and have had no problem with
backups. Check the block size of your RAID and remake your XFS
partition, and also update the RAID card firmware.
Hope this helps, take care
-Anando
--
All gold does not glitter, not all those who wander are lost.
The Song of Aragorn
I am also an XFS freak; I do not use RAID anywhere but have XFS as my
file system on almost every machine I use.
I have yet to hit the serious problems he is experiencing; however,
he said he did not have any issues or problems on his desktop either.
How does XFS interplay with the block size of the RAID? I remember
I did some benchmarks with ext2 with varying block sizes and around 2048
to 4096 bytes was the best; after that, the performance tapered
off.
How does the block size w/XFS & journal play with the block size of the
RAID?
What is recommended?
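I assume the alignment between the two is chosen when the filesystem is
made, with something along these lines - the block size and stripe numbers
here are made up purely for illustration, not a recommendation:

  # 4k filesystem blocks, aligned to a 64k stripe unit across 4 data disks
  # (substitute the real geometry reported by your RAID controller)
  mkfs.xfs -b size=4096 -d su=64k,sw=4 /dev/sdb1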
On Wed, Sep 08, 2004 at 10:04:16AM -0500, Anando Bhattacharya wrote:
> Jakob,
>
> I am an XFS freak; I have a ton of servers with hardware RAID and
> also run XFS over finely tuned NFS. Never had a problem. Only once,
> when there was a power failure, the journal took 20 minutes to replay
> before the box came back up. But otherwise XFS is pretty damn stable.
> My XFS box just runs Linux 2.4.18-xfs and serves NFS on a single
> Athlon 1800 or something like that, with 1 GB of RAM and a 3ware RAID
> card, sharing to about 200 workstations.
This, along with other information from XFS bugzilla and the xfs list
etc. etc. seems to suggest that there is a common trend:
SMP systems on 2.6 have a problem with XFS+NFS.
UP systems on 2.4, and possibly 2.6, do not have this problem.
We'll be testing with a 2.6.8.1 UP kernel, next time the big server
reboots (it's been up for the better part of a day now so it shouldn't
take long ;)
--
/ jakob
On Wed, Sep 08, 2004 at 02:35:24PM +0200, Jakob Oestergaard wrote:
>
> First XFS bug:
> ---------------
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=309
I've shared your frustration this summer:
http://bugzilla.kernel.org/show_bug.cgi?id=2840
http://bugzilla.kernel.org/show_bug.cgi?id=2841
http://bugzilla.kernel.org/show_bug.cgi?id=2929
http://bugzilla.kernel.org/show_bug.cgi?id=3118
but after 2.6.8.1 and going back to 8K stacks, my fileserver has been
stable.
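(The stack size is a compile-time option on i386, so you can check what a
given kernel was built with against its config - the config path below is
just the usual distro location, adjust as needed:)

  grep CONFIG_4KSTACKS /boot/config-$(uname -r)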
> XFS causes lowmem oom, triggering the OOM killer. Reported by
> [email protected] on the 18th of august.
Have you tried whether this helps:
sysctl -w vm.vfs_cache_pressure=10000
It fixed OOM-killer problems for me on 2.6.8-rc2 - it was very eager to
kill the backup (Tivoli Storage Manager) client. I haven't needed it so
far on 2.6.8.1.
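(If it does help, the setting is easy to make stick - the value below is
simply the one that happened to work for me:)

  # apply immediately
  sysctl -w vm.vfs_cache_pressure=10000
  # or write the proc file directly
  echo 10000 > /proc/sys/vm/vfs_cache_pressure
  # and keep it across reboots
  echo 'vm.vfs_cache_pressure = 10000' >> /etc/sysctl.conf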
> A little info on the hardware:
> Big server                Small server
> ----------------------    -----------------------
> Intel Xeon                Dual Athlon MP
> 7 external SCSI disks     4 internal IDE disks
> IBM hardware RAID         Software RAID-1 + LVM
> 600+ GB XFS               ~150 GB XFS
> 17+ M files               ~1 M files
Dual pentium-III (IBM x330)
2 TB Infortren Eonstore, fibre channel
500 GB in use
~5 M files
Also have another dual Xeon (Dell PowerEdge 2650) with ~1 TB on XFS,
but that's running 2.4.20-30.8.XFS1.3.1smp, and has never had any
problems.
>
> Does anyone actually use XFS for serious file-serving?
Yes,
>
> If XFS is a no-go because of lack of support, are there any realistic
> alternatives under Linux (taking our need for quota into account)?
>
> And finally, if Linux is simply a no-go for high performance file
> serving, what other suggestions might people have? NetApp?
My impression so far has been more that the 2.6 series might not yet
be stable enough. I've had to chase the leading/bleeding
edge to get away from the known problems. I was looking forward to a
real "maintainer" taking over this series, but that doesn't seem to be
happening..
-jf
On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> SMP systems on 2.6 have a problem with XFS+NFS.
Knfsd threads in 2.6 are no longer serialised by the BKL, and the
change has exposed a number of SMP issues in the dcache. Try the
two patches at
http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
and
http://linus.bkbits.net:8080/linux-2.5/[email protected]
(the latter is in recent Linus kernels). If you're still having
problems after applying those patches, Nathan and I need to know.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Thu, Sep 09, 2004 at 02:36:58AM +1000, Greg Banks wrote:
> On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> > SMP systems on 2.6 have a problem with XFS+NFS.
>
> Knfsd threads in 2.6 are no longer serialised by the BKL, and the
> change has exposed a number of SMP issues in the dcache. Try the
> two patches at
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
>
> and
>
> http://linus.bkbits.net:8080/linux-2.5/[email protected]
>
> (the latter is in recent Linus kernels). If you're still having
> problems after applying those patches, Nathan and I need to know.
>
Ok - Anders ([email protected]) will hopefully get a test setup (similar
to the big server) running tomorrow, and will then see if the system can
be broken with these two patches applied.
Are we right in assuming that no other patches should be necessary atop
of 2.6.8.1 in order to get a stable XFS? (that we should not apply
other XFS specific patches?)
Thanks,
--
/ jakob
Greg Banks wrote:
> On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
>
>>SMP systems on 2.6 have a problem with XFS+NFS.
>
>
> Knfsd threads in 2.6 are no longer serialised by the BKL, and the
> change has exposed a number of SMP issues in the dcache. Try the
> two patches at
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
>
> and
>
> http://linus.bkbits.net:8080/linux-2.5/[email protected]
>
> (the latter is in recent Linus kernels). If you're still having
> problems after applying those patches, Nathan and I need to know.
Do I read you right that this is an SMP issue and that the NFS, quota,
backup and all that are not relevant? I will pass the patches you
supplied on to someone who is having similar problems with no NFS and no
quota, a TB of storage which gets beaten without mercy 24x4.5, and which
has been having issues as load has gone up.
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
Hi there,
On Wed, Sep 08, 2004 at 02:35:24PM +0200, Jakob Oestergaard wrote:
>
> First XFS bug:
> ---------------
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=309
>
> Submitted in February this year - requires server reboot, NFS clients
> will then re-trigger the bug immediately after the NFS server is started
> again. Clearly not a pleasant problem.
>
> A fairly simple patch is available, which solves the problem in the most
> common cases. This simple patch has *not*yet* been included in 2.6.8.1.
>
Have you asked Christoph if he thinks that patch is ready for
inclusion? It's possibly just fallen through the cracks.
> Second XFS bug:
> ---------------
> Also causes the 'kernel BUG at fs/xfs/support/debug.c:106' message to be
> printed. This bug is not solved by applying the simple patch to the
> first problem.
>
> How well known this problem is, I don't know - I can get more details on
> this if anyone is actually interested in working on fixing XFS.
Yes please (it does help to actually contact the maintainers when
reporting bugs...). It is not well known.
> Third XFS bug:
> --------------
> XFS causes lowmem oom, triggering the OOM killer. Reported by
> [email protected] on the 18th of august.
>
> On the 24th of august, William Lee Irwin gives some suggestions and
> mentions "xfs has some known bad slab behavior."
Hmm? Which message was that?
> ...
> While the small server seems to be running well now, the large one has
> an average uptime of about one day (!) Backups will crash it reliably,
> when XFS doesn't OOM the box at random.
It would be a good idea to track the memory statistics while you're
running your workloads to see where in particular the memory is being
used when you hit OOM - /proc/{meminfo,slabinfo,buddyinfo}. I'd also
be interested to hear if that vfs_cache_pressure tweak that someone
recommended helps your load at all, thanks.
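For the /proc tracking, something as crude as the following is usually
enough to capture where the memory goes - the sampling interval and log
path are arbitrary:

  # snapshot the relevant /proc files once a minute while the backup runs
  while true; do
      date
      cat /proc/meminfo /proc/slabinfo /proc/buddyinfo
      sleep 60
  done >> /var/tmp/memstats.log 2>&1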
Is this xfsdump you're running for backups?
> Is anyone actually maintaining/bugfixing XFS?
Yes, there's a group of people actively working on it.
> Yes, I know the MAINTAINERS file,
But haven't figured out how to use it yet?
> ...
> trivial-to-trigger bugs that crash the system and have simple fixes,
> have not been fixed in current mainline kernels.
If you have trivial-to-trigger bugs (or other bugs) then please let
the folks at [email protected] know all the details (test cases,
etc, are quite useful).
cheers.
--
Nathan
On Thu, Sep 09, 2004 at 07:40:47AM +1000, Nathan Scott wrote:
> Hi there,
>
...
> >
> > A fairly simple patch is available, which solves the problem in the most
> > common cases. This simple patch has *not*yet* been included in 2.6.8.1.
> >
>
> Have you asked Christoph if he thinks that patch is ready for
> inclusion? Its possibly just fallen through the cracks.
With the feedback I've seen thus far, it seems that one possible
explanation is that the patch only papers over the problem (by
changing XFS), but that the real problem is not in XFS and thus might be
fixed for real by a completely different set of patches (which sort of
makes sense, since the small patch only cures the problem in the common
cases).
We'll know more about this tomorrow, hopefully, if Anders gets the new
test system up and running.
> > On the 24th of august, William Lee Irwin gives some suggestions and
> > mentions "xfs has some known bad slab behavior."
>
> Hmm? Which message was that?
http://lkml.org/lkml/2004/8/24/140
>
> > ...
> > While the small server seems to be running well now, the large one has
> > an average uptime of about one day (!) Backups will crash it reliably,
> > when XFS doesn't OOM the box at random.
>
> It would be a good idea to track the memory statistics while you're
> running your workloads to see where in particular the memory is being
> used when you hit OOM - /proc/{meminfo,slabinfo,buddyinfo}.
Slab usage in kilo-kilo-bytes (one K on the graph is one Megabyte):
http://saaby.com/slabused.gif
This was presented earlier in
http://lkml.org/lkml/2004/8/24/53
> I'd also
> be interested to hear if that vfs_cache_pressure tweak that someone
> recommended helps your load at all, thanks.
Anders will hopefully get a lot of this testing done tomorrow - by then
hopefully we'll know a lot more about all this.
>
> Is this xfsdump you're running for backups?
Veritas BackupExec was used, as far as I know.
xfsdump will be tested soon.
The "small server" is backed up with tar (by Amanda).
>
> > Is anyone actually maintaining/bugfixing XFS?
>
> Yes, there's a group of people actively working on it.
>
> > Yes, I know the MAINTAINERS file,
>
> But haven't figured out how to use it yet?
Read on ;)
>
> > ...
> > trivial-to-trigger bugs that crash the system and have simple fixes,
> > have not been fixed in current mainline kernels.
>
> If you have trivial-to-trigger bugs (or other bugs) then please let
> the folks at [email protected] know all the details (test cases,
> etc, are quite useful).
They've known for 7 months (bug 309 in your bugzilla), but the problem
is still trivially triggered in 2.6.8.1.
That's why I posted to LKML.
We got a lot of very useful feedback from a broad audience, and it seems
that it *might* turn out that this XFS problem was never really a
problem in XFS itself.
Let's see what tomorrow brings.
Thanks all!
--
/ jakob
Hi Jakob,
On Thu, Sep 09, 2004 at 01:22:11AM +0200, Jakob Oestergaard wrote:
> On Thu, Sep 09, 2004 at 07:40:47AM +1000, Nathan Scott wrote:
> > > ...
> > > trivial-to-trigger bugs that crash the system and have simple fixes,
> > > have not been fixed in current mainline kernels.
> >
> > If you have trivial-to-trigger bugs (or other bugs) then please let
> > the folks at [email protected] know all the details (test cases,
> > etc, are quite useful).
>
> They've known for 7 months (bug 309 in your bugzilla), but the problem
> is still trivially triggered in 2.6.8.1.
>
OK, so could you add the details on how you're managing to hit it
into that bug?... when you say "trivially" - does that mean you
have a recipe that is guaranteed to quickly hit it? A reproducible
test case would be extremely useful in tracking this down.
thanks.
--
Nathan
On Thu, 2004-09-09 at 03:30, Jakob Oestergaard wrote:
> On Thu, Sep 09, 2004 at 02:36:58AM +1000, Greg Banks wrote:
> > On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> Ok - Anders ([email protected]) will hopefully get a test setup (similar
> to the big server) running tomorrow, and will then see if the system can
> be broken with these two patches applied.
Thanks, that will be a useful data point for Nathan.
> Are we right in assuming that no other patches should be necessary atop
> of 2.6.8.1 in order to get a stable XFS? (that we should not apply
> other XFS specific patches?)
Sorry, I didn't mean to be unclear. I'm merely a user of XFS and
am in no position to make a statement except that "it works for me"
with the usual caveat that my platform and workload may differ.
However, I do work on NFS for a living and can point out that in
the 2.6 SMP case there have been issues with the unnatural acts
knfsd needs to perform upon the dcache, and that these might be
factors in the problems you're seeing. In particular, the stack
traces I've seen just might be caused by dcache confusion caused
by one of the bugs I've mentioned. Or not.
I just wanted to point out that there are more potential sources
of bugs in your setup than just XFS.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Thu, 2004-09-09 at 05:06, Bill Davidsen wrote:
> Greg Banks wrote:
> > On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> >
> >>SMP systems on 2.6 have a problem with XFS+NFS.
> >
> >
> > Knfsd threads in 2.6 are no longer serialised by the BKL, and the
> > change has exposed a number of SMP issues in the dcache. Try the
> > two patches at
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
> >
> > and
> >
> > http://linus.bkbits.net:8080/linux-2.5/[email protected]
> >
> > (the latter is in recent Linus kernels). If you're still having
> > problems after applying those patches, Nathan and I need to know.
>
> Do I read you right that this is an SMP issue and that the NFS, quota,
> backup and all that are not relevant?
The first issue is NFS-specific. The second one is an ancient dcache
bug from before 2.6 which NFS changes in 2.6 uncovered; however it
could affect any filesystem with or without NFS. Neither have anything
to do with quota or backup per se, except that backup might generate
enough load to help them happen.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
[cc'ing back to lkml]
On Wed, Sep 08, 2004 at 05:05:51PM -0500, Steve Lord wrote:
>
> I wonder what the effect of /proc/sys/vm/swappiness
> and /proc/sys/vm/vfs_cache_pressure is on these situations.
Hi Steve.
I very much doubt vm_swappiness will have any effect on this - it
just determines whether to throw away page cache pages or swap out
mapped pages - it won't affect the dentry cache size. The best
it can do is allow us to swap out more pages so the dentry cache
can grow larger....
Looking at vfs_cache_pressure (documented in Documentation/
filesystems/proc.txt), it is used to make the number of unused
inodes and dentries used by the system appear to be smaller or
larger to the slab shrinker function:
661 static int shrink_dcache_memory(int nr, unsigned int gfp_mask)
662 {
663 if (nr) {
664 if (!(gfp_mask & __GFP_FS))
665 return -1;
666 prune_dcache(nr);
667 }
668 return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
669 }
and hence the shrinker will tend to remove more or less dentries or
inodes when the cache is asked to be shrunk. It will have no real
effect if the unused dentry list is small (i.e. we're actively
growing the dentry cache) which seems to be the case here.
FWIW, it appears to me that the real problem is that shrink_dcache_memory()
does not shrink the active dentry cache down - I think it needs to do more
than just free up unused dentries. I'm not saying this is an easy thing
to do (I don't know if it's even possible), but IMO if we allow the dentry
cache to grow without bound or without a method to shrink the active
tree we will hit this problem a lot more often as filesystems grow larger.
For those that know this code well, it looks like there's a bug
in the above code - the shrinker calls into this function first with
nr = 0 to determine how much it can reclaim from the slab.
If the dentry_stat.nr_unused is less than 100, then we'll return 0
due to integer division (99/100 = 0), and the shrinker calculations
will see this as a slab that does not need shrinking because:
185 list_for_each_entry(shrinker, &shrinker_list, list) {
186 unsigned long long delta;
187
188 delta = (4 * scanned) / shrinker->seeks;
189 delta *= (*shrinker->shrinker)(0, gfp_mask);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
190 do_div(delta, lru_pages + 1);
191 shrinker->nr += delta;
192 if (shrinker->nr < 0)
193 shrinker->nr = LONG_MAX; /* It wrapped! */
194
195 if (shrinker->nr <= SHRINK_BATCH)
196 continue;
because we returned zero and therefore delta becomes zero and
shrinker->nr never gets larger than SHRINK_BATCH.
Hence in low memory conditions when you've already reaped most of
the unused dentries, you can't free up the last 99 unused dentries.
Maybe this is intentional (anyone?) because there isn't very much to
free up in this case, but some memory freed is better than none when
you have nothing at all left.
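(For anyone who wants to watch this on an affected box, the dentry counters
involved are exported to userspace and are easy to log alongside the slab
numbers:)

  # fields are nr_dentry, nr_unused, age_limit, want_pages, plus two unused
  cat /proc/sys/fs/dentry-state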
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
Dave Chinner <[email protected]> wrote:
>
> Hence in low memory conditions when you've already reaped most of
> the unused dentries, you can't free up the last 99 unused dentries.
> Maybe this is intentional (anyone?) because there isn't very much to
> free up in this case, but some memory freed is better than none when
> you have nothing at all left.
Yes, it's intentional. Or at least, it's known-and-not-cared about ;)
The last 99 unused dentries will not be reaped.
On Thu, Sep 09, 2004 at 09:42:55AM +1000, Nathan Scott wrote:
> Hi Jakob,
>
...
> OK, so could you add the details on how you're managing to hit it
> into that bug?... when you say "trivially" - does that mean you
> have a recipe that is guaranteed to quickly hit it? A reproducible
> test case would be extremely useful in tracking this down.
On the two systems where I've seen this, the recipe is to set up an
SMP+NFS+XFS server, and have a number of clients mount the exported
filesystem, then perform reads and writes...
The two servers are used very differently - one is holding a small
number of source trees that are compiled/linked on a small cluster. The
other is holding a very large number of user home directories, where the
primary use is web serving (web servers running on the NFS clients).
A google for 'debug.c:106' turns out some 120 results - it seems that no
special magic is needed, other than a few boxes to set up the test
scenario.
On the 29th of February, [email protected] submitted (as comment #23 to
bug #309) a description of a test setup along with a shell script that
was used to trigger this problem.
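I don't have that script handy, but the client-side load is essentially of
this shape - the server name, mount point and tarball are placeholders, and
this is only a sketch of the workload, not the script from the bug report:

  # each NFS client runs a few parallel untar/delete loops on the export
  mount -t nfs server:/export /mnt/test
  for i in 1 2 3 4; do
      ( cd /mnt/test && mkdir -p worker$i && cd worker$i &&
        while true; do
            tar xf /var/tmp/linux-2.6.8.1.tar
            rm -rf linux-2.6.8.1
        done ) &
  done
  wait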
--
/ jakob
On Thu, Sep 09, 2004 at 02:36:58AM +1000, Greg Banks wrote:
> On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> > SMP systems on 2.6 have a problem with XFS+NFS.
>
> Knfsd threads in 2.6 are no longer serialised by the BKL, and the
> change has exposed a number of SMP issues in the dcache. Try the
> two patches at
Ok - the "small" server just hosed itself with a debug.c:106 - so I'll
be doing some testing on that one as well (after hours, today).
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
Ok, I must say that mail has some *scary* comments to the patch... This
should be interesting :)
The patch assumes the dcache code looks like this:
--------------
        if (res) {
                spin_lock(&res->d_lock);
                res->d_sb = inode->i_sb;
                res->d_parent = res;
                res->d_inode = inode;
                res->d_bucket = d_hash(res, res->d_name.hash);
                res->d_flags |= DCACHE_DISCONNECTED;
                res->d_vfs_flags &= ~DCACHE_UNHASHED;
                list_add(&res->d_alias, &inode->i_dentry);
                hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
                spin_unlock(&res->d_lock);
        }
--------------
While it was actually changed to
--------------
        if (res) {
                spin_lock(&res->d_lock);
                res->d_sb = inode->i_sb;
                res->d_parent = res;
                res->d_inode = inode;
                /*
                 * Set d_bucket to an "impossible" bucket address so
                 * that d_move() doesn't get a false positive
                 */
                res->d_bucket = NULL;
                res->d_flags |= DCACHE_DISCONNECTED;
                res->d_flags &= ~DCACHE_UNHASHED;
                list_add(&res->d_alias, &inode->i_dentry);
                hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
                spin_unlock(&res->d_lock);
        }
--------------
I'm assuming I should just adapt this to the res->d_bucket change...
New patch against 2.6.8.1 attached.
>
> and
>
> http://linus.bkbits.net:8080/linux-2.5/[email protected]
This one is in plain 2.6.8.1 (as you said).
--
/ jakob
On Thu, Sep 09, 2004 at 02:11:00PM +0200, Jakob Oestergaard wrote:
> A google for 'debug.c:106' turns out some 120 results - it seems that no
> special magic is needed, other than a few boxes to set up the test
> scenario.
linux/fs/xfs/support/debug.c:
84 void
85 cmn_err(register int level, char *fmt, ...)
86 {
[...]
105 if (level == CE_PANIC)
106 BUG();
107 }
Using cmn_err with CE_PANIC will show up like this, so it's likely your
Google search is showing multiple different bugs, many of which have
been fixed. You need to check the stack traces to see if they are the
same.
--cw
On Fri, 2004-09-10 at 00:00, Jakob Oestergaard wrote:
> On Thu, Sep 09, 2004 at 02:36:58AM +1000, Greg Banks wrote:
> > On Thu, 2004-09-09 at 01:44, Jakob Oestergaard wrote:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
>
> Ok, I must say that mail has some *scary* comments to the patch... This
> should be interesting :)
Like I said, knfsd does unnatural things to the dcache.
> I'm assuming I should just adapt this to the res->d_bucket change...
Yes, there were a bunch of dcache changes all happening around
that time; this patch may well need some merging.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Sep 09, 2004, at 22:40, Greg Banks wrote:
> Like I said, knfsd does unnatural things to the dcache.
Perhaps there needs to be a standard API that knfsd can use to do many
of the (currently) non-standard dcache operations. This would likely be
useful for other kernel-level file-servers that would be useful to have
(OpenAFS? Coda?). Of course, I could just be totally ignorant of some
nasty reason for the unstandardized hackery, but it doesn't hurt to
ask. :-D
Cheers,
Kyle Moffett
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a17 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------
On Fri, 2004-09-10 at 13:04, Kyle Moffett wrote:
> On Sep 09, 2004, at 22:40, Greg Banks wrote:
> > Like I said, knfsd does unnatural things to the dcache.
>
> Perhaps there needs to be a standard API that knfsd can use to do many
> of the (currently) non-standard dcache operations. This would likely be
> useful for other kernel-level file-servers that would be useful to have
> (OpenAFS? Coda?). Of course, I could just be totally ignorant of some
> nasty reason for the unstandardized hackery, but it doesn't hurt to
> ask. :-D
In 2.6 there is an API and knfsd code is less interwoven with dcache
internals. In practice what this means is that the dcache code paths
which are only exercised by NFS move from NFS code into fs/dcache.c
and fs/exportfs/ and have a pretty wrapper but are not any less
unnatural or NFS-specific. The problem is the need to convert an NFS
file handle off the wire (which contains an inode number) into a dentry.
This kind of bottom-up construction of dentry paths is *painful* as
the dcache really wants to grow from an fs root down.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
Hmm,
this looks like problems I had recently on big filesystems under load on an
SMP machine. The only thing different is that I'm using Samba instead of
NFS, but the behaviour seems the same. On top of that we're encrypting the
entire filesystems on this machine, which now holds a total of 9 TB of
attached storage.
In my observations those errors occur under the following conditions:
- using an SMP system
- using applications which concurrently allocate memory in an
aggressive manner
- free memory is at the lower limit given in vm.min_free_kbytes
Following numerous threads on mailing lists there have been some
changes/patches for the most current kernels addressing these problems;
however, if you are bound to a distribution kernel it may help to set
the mentioned parameter
vm.min_free_kbytes
5 to 10 times bigger than the default. At least for me it worked
... no more oopses, even under heavy load/backup etc.
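Concretely that boils down to something like this - the number is only an
example, scale it from whatever default your kernel picked:

  # check the current value first (the default depends on RAM size)
  cat /proc/sys/vm/min_free_kbytes
  # then raise it 5-10x
  echo 20480 > /proc/sys/vm/min_free_kbytes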
---
Independently of this problem, can anyone confirm that there is a problem
with concurrent memory allocation under load on Linux SMP systems?!
ciao
andi
Jakob Oestergaard wrote:
><frustrated_admin mode="on">
>
>Does anyone actually use XFS for serious file-serving? (yes, I run it
>on my desktop at home and I don't have problems there - such reports are
>not really relevant).
>
>Is anyone actually maintaining/bugfixing XFS? Yes, I know the
>MAINTAINERS file, but I am a little bit confused here - seeing that
>trivial-to-trigger bugs that crash the system and have simple fixes,
>have not been fixed in current mainline kernels.
>
>If XFS is a no-go because of lack of support, are there any realistic
>alternatives under Linux (taking our need for quota into account)?
>
>And finally, if Linux is simply a no-go for high performance file
>serving, what other suggestions might people have? NetApp?
>
></>
>
>
>
In my experience XFS was, right after JFS, the worst and the slowest
filesystem ever made.
Since then I have been using reiserfs 3.6. I don't need quota for my
servers, but there is a patch available for it, I believe, from SuSE.
ReiserFS is the most reliable FS for Linux (with journaling). If you
don't need journaling, ext2 is the choice.
--
GJ
Just for the sake of it, I'm experiencing SMP+XFS+NFS problems as well.
See the oops below. It does not need a lot of stress on the server to
trigger the oops. The filesystem is running on a RAID5 3ware 7506 array,
the distro is FC2, kernel 2.6.9-rc1.
B.
---cut---
Unable to handle kernel paging request at virtual address 64c17165
printing eip:
c014221b
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: sg sr_mod binfmt_misc vmnet vmmon parport_pc lp
parport rfcomm l2cap bluetooth nfsd exportfs lockd sunrpc ipv6
iptable_filter ip_tables e1000 st ide_scsi ohci1394 ieee1394 uhci_hcd
ehci_hcd usbcore
CPU: 1
EIP: 0060:[<c014221b>] Tainted: P VLI
EFLAGS: 00010086 (2.6.9-rc1)
EIP is at kfree+0x3d/0x6d
eax: 00000001 ebx: 64c17165 ecx: f8a98560 edx: c17164f9
esi: f8b284dc edi: 00000282 ebp: f7825100 esp: f6c9bdd4
ds: 007b es: 007b ss: 0068
Process rpc.mountd (pid: 1753, threadinfo=f6c9a000 task=f6ffb870)
Stack: f8a985c0 00000001 f7825100 f6c9be90 00000001 f8a87e1a f8b284dc
c1bfa140
f8a88432 f7825100 f8a985c0 00000000 f8a99778 f6c9bf58 f6e4c760
f6c9bec0
4142f814 f8a8807e f6c9be90 00000001 00000032 f6c9be40 f6c9be44
f6c9be48
Call Trace:
[<f8a87e1a>] ip_map_put+0x45/0x71 [sunrpc]
[<f8a88432>] ip_map_lookup+0x260/0x2f9 [sunrpc]
[<f8a8807e>] ip_map_parse+0x19c/0x24f [sunrpc]
[<c0148c03>] do_anonymous_page+0x143/0x179
[<c0148c9c>] do_no_page+0x63/0x2c2
[<c0149114>] handle_mm_fault+0x105/0x189
[<c01193fa>] do_page_fault+0x141/0x511
[<c0234e28>] copy_from_user+0x42/0x6e
[<f8a8b18d>] cache_write+0xb2/0xca [sunrpc]
[<f8a8b0db>] cache_write+0x0/0xca [sunrpc]
[<c01571af>] vfs_write+0xb0/0x119
[<c01572e9>] sys_write+0x51/0x80
[<c0105f91>] sysenter_past_esp+0x52/0x71
Code: 18 85 f6 74 36 9c 5f fa 8b 15 10 a6 48 c0 8d 86 00 00 00 40 c1 e8
0c c1 e0 05 8b 54 02 18 b8 00 e0 ff ff 21 e0 8b 40 10 8b 1c 82 <8b> 03
3b 43 04 73 19 89 74 83 10 83 03 01 57 9d 8b 5c 24 08 8b
> In my experience XFS was, right after JFS, the worst and the slowest
> filesystem ever made.
On our NFS benchmarks JFS is _significantly_ faster than ext3 and
reiserfs. It depends on your workload but calling JFS the worst and
slowest filesystem ever made is unfair.
Anton
Jakob Oestergaard wrote:
| <frustrated_admin mode="on">
|
| Does anyone actually use XFS for serious file-serving? (yes, I run it
| on my desktop at home and I don't have problems there - such reports are
| not really relevant).
I have our fileserver running completely on XFS (because it has quota &
journaling).
I have an internal 60GB HW RAID 1 and an external 4-disk SCSI 400GB
software RAID 5, both running XFS. The server does NFS, Samba and Appletalk
(though that is almost not used). NFS is not the main point (except for the
servers sharing a backup disk and two office PCs which run Linux, there
is no NFS traffic; the rest, ~50 PCs, connect via Samba). It's a single-CPU
Xeon box, but I have an SMP kernel because of HT. 2 GB RAM.
I haven't had a single XFS-related error. It survived 5 hard reboots
because another external disk went berserk and forced me to power-cycle
the server.
A nightly backup to another HD on the same box goes well, even from 4
other servers via NFS.
| Is anyone actually maintaining/bugfixing XFS? Yes, I know the
| MAINTAINERS file, but I am a little bit confused here - seeing that
| trivial-to-trigger bugs that crash the system and have simple fixes,
| have not been fixed in current mainline kernels.
Well, there is the linux-xfs ML... :)
| If XFS is a no-go because of lack of support, are there any realistic
| alternatives under Linux (taking our need for quota into account)?
Lack of support? In my opinion some very bright people work on it. The main
problem is that it comes from a system which is designed completely
differently from Linux, and I think this still triggers those SMP,
etc. bugs.
| And finally, if Linux is simply a no-go for high performance file
| serving, what other suggestions might people have? NetApp?
Well, I am not yet in the TB league with 200+ user boxes, etc., so I can't
say about that. But that will come soon, and then I will see if I have
to rant about that.
lg, clemens
In article <20040911133812.GC32755@krispykreme>,
Anton Blanchard <[email protected]> wrote:
>
>> In my experience XFS was, right after JFS, the worst and the slowest
>> filesystem ever made.
>
>On our NFS benchmarks JFS is _significantly_ faster than ext3 and
>reiserfs. It depends on your workload but calling JFS the worst and
>slowest filesystem ever made is unfair.
Same goes for XFS. In the application I use it for it is by _far_
the fastest filesystem, whereas reiser is by far the slowest.
ext3 is somewhere in between.
And that's because XFS has extents and does pre-allocation.
Mike.
--
"In times of universal deceit, telling the truth becomes
a revolutionary act." -- George Orwell.
Anton Blanchard wrote:
>>In my experience XFS was, right after JFS, the worst and the slowest
>>filesystem ever made.
>>
>>
>
>On our NFS benchmarks JFS is _significantly_ faster than ext3 and
>reiserfs. It depends on your workload but calling JFS the worst and
>slowest filesystem ever made is unfair.
>
>
As always, I am speaking from my own and my colleagues' experience.
We are using Dell SMP machines with P3s and P4s at different speeds, plus
hardware SCSI RAID5s, for Samba, VoIP services, CVS, and mail. Maybe JFS
works nicely with NFS, but my experience shows
that XFS is the slowest among all the filesystems vanilla Linux 2.6 can
serve. JFS was not so extensively tested, but it doesn't do miracles,
and experience shows it's rather close to XFS. ReiserFS has so far been
the finest - maybe because we have a pretty large number of files, mostly
small ones. I don't know.
I didn't mean to hurt anyone's feelings, just giving my opinion on FSs.
Thanks.
--
GJ
On Thu, Sep 09, 2004 at 04:00:17PM +0200, Jakob Oestergaard wrote:
>
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2
>
> Ok, I must say that mail has some *scary* comments to the patch... This
> should be interesting :)
...
>
> I'm assuming I should just adapt this to the res->d_bucket change...
>
> New patch against 2.6.8.1 attached.
>
Ok - the "small" box has been running with this patch since yesterday
evening - I ran some stress testing on it for some hours yesterday, and
will be working on the machine all day today.
So far, it seems like the patch at least hasn't broken anything (if I
had file corruption I should have noticed already, because the testing
I've been doing is some large compile/link jobs - those things tend to
fail if the .o files are corrupted).
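(For the record, the stress testing is nothing fancy - roughly this kind of
loop against a source tree on the NFS-mounted XFS filesystem; the path and
job count are of course just examples:)

  # repeatedly rebuild a tree that lives on the XFS/NFS export
  while true; do
      make -C /mnt/nfs/project clean
      make -C /mnt/nfs/project -j4 || break   # corrupt .o files show up as failed builds
  done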
It's a little early to say if it solves the problem. I would say it
looks good so far - but let's see.
We'll also have to see about the test setup duplicating the "large" box.
I'll let you know if anything breaks - and I'll ask to have the patch
included by the end of the week, if the small box hasn't hosed itself by
then.
--
/ jakob
On Mon, Sep 13, 2004 at 09:29:19AM +0200, Jakob Oestergaard wrote:
...
> I'll let you know if anything breaks - and I'll ask to have the patch
> included by the end of the week, if the small box hasn't hosed itself by
> then.
Ok - it's been almost a week now. Time for a status update:
It seems that the theory that the proposed dcache patch is fixing an SMP
race which mostly affects XFS on NFS servers, is sound.
Further, it seems that the XFS patch that never went into the kernel, and
really shouldn't be necessary, really isn't necessary - that the sole
problem with SMP+XFS+NFS is solved by the dcache patch (or by running a
UP kernel).
The "large" box is currently running a uniprocessor 2.6.8.1 with the XFS
patch (which didn't quite solve the problem on SMP). It has been rock
solid for seven days with load around 40 during the day. It has been
months since this machine had such "high" uptimes.
The "small" box has been running an SMP 2.6.8.1 kernel with the dcache
patch (but *not* the XFS patch). This machine, too, has been rock solid.
The conclusion so far seems to be that the old XFS patch is most likely
unnecessary (this is not confirmed on the large box, but it seems to
have been the general consensus among the XFS people all along), that no
patches are necessary in the UP case of NFS+XFS file serving,
and that the dcache patch solves the real problem in the SMP case.
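For anyone not following the dcache internals: the core of the patch
below is a locking refactor - the alias search moves into a
__d_find_alias() helper that expects dcache_lock to be held by the
caller, and d_find_alias() becomes a thin wrapper that takes and drops
the lock, so that d_alloc_anon() and d_splice_alias(), which already
hold dcache_lock, can share the same alias search instead of open-coding
their own. A minimal userspace analogue of that pattern (pthreads
instead of the kernel spinlock, simplified flags, no reference counting,
all names made up) is:

#include <pthread.h>
#include <stddef.h>

struct alias {
        struct alias *next;
        int inode_no;
        int disconnected;       /* stands in for DCACHE_DISCONNECTED */
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct alias *alias_list;

/* Caller must hold cache_lock. */
static struct alias *__find_alias(int inode_no, int want_discon)
{
        struct alias *a;
        struct alias *discon = NULL;

        for (a = alias_list; a; a = a->next) {
                if (a->inode_no != inode_no)
                        continue;
                if (a->disconnected)
                        discon = a;     /* remember it, prefer a connected one */
                else if (!want_discon)
                        return a;
        }
        return discon;
}

/* Public entry point, like d_find_alias(): takes the lock itself. */
struct alias *find_alias(int inode_no)
{
        struct alias *a;

        pthread_mutex_lock(&cache_lock);
        a = __find_alias(inode_no, 0);
        pthread_mutex_unlock(&cache_lock);
        return a;
}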
In light of all this, I would like to suggest that the following patch
be included in mainline - it is the old patch from Neil Brown
(http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2),
adapted by me for 2.6.8.1 (the patch was also attached to a previous
mail from me in this thread):
--- fs/dcache.c.orig Sat Aug 14 12:54:50 2004
+++ fs/dcache.c Thu Sep 9 15:56:04 2004
@@ -286,12 +286,11 @@
* any other hashed alias over that one.
*/
-struct dentry * d_find_alias(struct inode *inode)
+static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
{
struct list_head *head, *next, *tmp;
struct dentry *alias, *discon_alias=NULL;
- spin_lock(&dcache_lock);
head = &inode->i_dentry;
next = inode->i_dentry.next;
while (next != head) {
@@ -302,19 +301,26 @@
if (!d_unhashed(alias)) {
if (alias->d_flags & DCACHE_DISCONNECTED)
discon_alias = alias;
- else {
+ else if (!want_discon) {
__dget_locked(alias);
- spin_unlock(&dcache_lock);
return alias;
}
}
}
if (discon_alias)
__dget_locked(discon_alias);
- spin_unlock(&dcache_lock);
return discon_alias;
}
+struct dentry * d_find_alias(struct inode *inode)
+{
+ struct dentry *de;
+ spin_lock(&dcache_lock);
+ de = __d_find_alias(inode, 0);
+ spin_unlock(&dcache_lock);
+ return de;
+}
+
/*
* Try to kill dentries associated with this inode.
* WARNING: you must own a reference to inode.
@@ -833,33 +839,27 @@
tmp->d_parent = tmp; /* make sure dput doesn't croak */
spin_lock(&dcache_lock);
- if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
- /* A directory can only have one dentry.
- * This (now) has one, so use it.
- */
- res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
- __dget_locked(res);
- } else {
+ res = __d_find_alias(inode, 0);
+ if (!res) {
/* attach a disconnected dentry */
res = tmp;
tmp = NULL;
- if (res) {
- spin_lock(&res->d_lock);
- res->d_sb = inode->i_sb;
- res->d_parent = res;
- res->d_inode = inode;
+ spin_lock(&res->d_lock);
+ res->d_sb = inode->i_sb;
+ res->d_parent = res;
+ res->d_inode = inode;
+
+ /*
+ * Set d_bucket to an "impossible" bucket address so
+ * that d_move() doesn't get a false positive
+ */
+ res->d_bucket = NULL;
+ res->d_flags |= DCACHE_DISCONNECTED;
+ res->d_flags &= ~DCACHE_UNHASHED;
+ list_add(&res->d_alias, &inode->i_dentry);
+ hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
+ spin_unlock(&res->d_lock);
- /*
- * Set d_bucket to an "impossible" bucket address so
- * that d_move() doesn't get a false positive
- */
- res->d_bucket = NULL;
- res->d_flags |= DCACHE_DISCONNECTED;
- res->d_flags &= ~DCACHE_UNHASHED;
- list_add(&res->d_alias, &inode->i_dentry);
- hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
- spin_unlock(&res->d_lock);
- }
inode = NULL; /* don't drop reference */
}
spin_unlock(&dcache_lock);
@@ -881,7 +881,7 @@
* DCACHE_DISCONNECTED), then d_move that in place of the given dentry
* and return it, else simply d_add the inode to the dentry and return NULL.
*
- * This is (will be) needed in the lookup routine of any filesystem that is exportable
+ * This is needed in the lookup routine of any filesystem that is exportable
* (via knfsd) so that we can build dcache paths to directories effectively.
*
* If a dentry was found and moved, then it is returned. Otherwise NULL
@@ -892,11 +892,11 @@
{
struct dentry *new = NULL;
- if (inode && S_ISDIR(inode->i_mode)) {
+ if (inode) {
spin_lock(&dcache_lock);
- if (!list_empty(&inode->i_dentry)) {
- new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
- __dget_locked(new);
+ new = __d_find_alias(inode, 1);
+ if (new) {
+ BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
PS: I'll be without internet access from now until Sunday the 26th - if
there are comments or questions, please make sure they are sent to this
list and/or [email protected] - Anders can get in touch with me if
necessary.
--
/ jakob
On Fri, Sep 17, 2004 at 01:26:47PM +0200, Jakob Oestergaard wrote:
> In light of all this, I would like to suggest that the following patch
> be included in mainline - it is the old patch from Neil Brown
> (http://marc.theaimsgroup.com/?l=linux-kernel&m=108330112505555&w=2),
> adapted by me for 2.6.8.1 (the patch was also attached to a previous
> mail from me in this thread):
Neil, any chance we could get this patch into mainline ASAP? Without
it, we get ->get_parent called on non-directories under heavy NFS loads.
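(For context on why that is a problem: ->get_parent is the
export_operations hook knfsd uses to reconnect a dentry towards the root
while decoding a file handle, and a typical implementation of that era
looked roughly like the sketch below - it resolves the child's ".."
directory entry, which only a directory has. The example_* names are
placeholders for the filesystem-specific directory lookup, not real
in-tree functions.)

#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fs.h>

/* Rough sketch of a 2.6-era export_operations ->get_parent, modelled on
 * the common pattern: look up ".." in the child directory, grab the
 * parent inode, and attach an anonymous dentry to it. */
static struct dentry *example_get_parent(struct dentry *child)
{
        unsigned long ino;
        struct inode *inode;
        struct dentry *parent;
        struct dentry dotdot;

        dotdot.d_name.name = "..";
        dotdot.d_name.len = 2;

        /* Only directories have a ".." entry - hence the bug if we get
         * here with a non-directory dentry. */
        ino = example_inode_by_name(child->d_inode, &dotdot);
        if (!ino)
                return ERR_PTR(-ENOENT);

        inode = iget(child->d_inode->i_sb, ino);
        if (!inode)
                return ERR_PTR(-EACCES);

        parent = d_alloc_anon(inode);
        if (!parent) {
                iput(inode);
                parent = ERR_PTR(-ENOMEM);
        }
        return parent;
}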
On Wed, 29 Sep 2004, Christoph Hellwig wrote:
>
> Neil, any chance we could get this patch into mainline ASAP? Without
> it we get ->get_parent called on non-directories under heavy NFS loads..
Sorry, my bad. I had asked Al to ack it a long time ago, and he said
"looks ok, want more background", and I never got around to that.
The patch looks right, nobody really complains, so I applied it.
Linus