I recently decided to reinstall my system and at the same time try a new
file system. While trying to decide which filesystem to use, I found a few
benchmarks, but they either don't compare all the available filesystems, are
too synthetic (copying a source tree multiple times or doing raw I/O), or are
aimed at servers/databases (like Bonnie++). The two most filesystem-intensive
tasks I do regularly are `apt-get upgrade` (waiting for the packages to
extract and configure themselves) and messing around with the kernel, so I
benchmarked those. To make it more realistic I installed ccache and did two
compiles, one to fill the cache and a second using the full cache.
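For reference, a minimal sketch of the ccache arrangement described above; the
package and make invocations are illustrative, not the exact commands from the
benchmark script:

    # Hypothetical sketch: two kernel builds, the first filling the ccache,
    # the second served largely from it.
    apt-get install ccache
    make defconfig
    make CC="ccache gcc" all      # first build, fills the cache
    make clean
    make CC="ccache gcc" all      # second build, mostly ccache hits
    ccache -s                     # show hit/miss statistics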
The tests I timed (in order):
* Run debootstrap to install a base Debian system
* Extract the kernel source
* Run `make all` using the defconfig and an empty ccache
* Copy the entire new directory tree
* Run `make clean`
* Run `make all` again, this time using the filled ccache
* Delete the entire directory tree
Here is a summary of the results, based upon what I am calling "dead" time,
calculated as `total time - user time`. As you can see in the full results
on my website, the user time is almost identical between filesystems, so I
believe this is an accurate comparison. The dead time is then normalized
using ext2 as a baseline (a value greater than 1 means the step took that
many times as long as it did on ext2).
FS        deb   tar  make    cp  clean  make2     rm  total
ext2     1.00  1.00  1.00  1.00   1.00   1.00   1.00   1.00
ext3     1.12  2.47  0.88  1.16   0.91   0.93   3.01   1.13
jfs      1.64  2.18  1.22  1.90   1.60   1.19  12.84   1.79
reiser   1.12  1.99  1.05  1.41   0.92   1.56   1.42   1.28
reiser4  2.69  1.87  1.80  0.63   1.33   2.71   4.14   1.83
xfs      1.06  1.99  0.97  1.67   0.78   1.03  10.27   1.43
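For concreteness, here is a minimal sketch of how a "dead time" figure like
the ones above can be computed; it is not the actual benchmark script, and
the step, file names, and variables are only placeholders:

    # Time one benchmark step, recording elapsed (wall-clock) and user CPU seconds.
    /usr/bin/time -f "%e %U" -o make.time make all

    # dead time = total (elapsed) time minus user time
    read real user < make.time
    dead=$(echo "$real - $user" | bc -l)
    echo "dead time: $dead seconds"

    # normalization, assuming ext2's dead time for the same step is in $ext2_dead
    echo "relative to ext2: $(echo "$dead / $ext2_dead" | bc -l)"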
Some observations of mine:
* Ext2 is still the fastest overall, but I think the margin is small
enough that a journal is well worth it
* Ext3, ReiserFS, and XFS all perform similarly and come close to
Ext2, except:
o XFS takes an abnormally long time to do a large rm even
though it is very fast at a kernel `make clean`
o ReiserFS is significantly slower at the second make (from
ccache)
* JFS is fairly slow overall
* Reiser4 is exceptionally fast at synthetic benchmarks like copying
the system and untarring, but is very slow at the real-world
debootstrap and kernel compiles.
* Though I didn't benchmark it, ReiserFS sometimes takes a second or
two to mount, and Reiser4 sometimes takes a second or two to unmount,
while all the other filesystems are instantaneous.
Originally I had planned on using Reiser4 because of the glowing reviews
they give themselves, but I'm simply not seeing it. It might be that my
Reiser4 setup is somehow broken, but I don't think so. Based on these
results I am now going with XFS, as it's faster than ReiserFS in the
real-world benchmarks and my current Ext3 partition's performance is
getting worse and worse.
Full benchmark results, system information, and the script I used to run
these tests are available from my website here:
<http://avatar.res.cmu.edu/news/pages/Projects/2.6FileSystemBenchmarks>
Feel free to comment, suggest improvements to my script, or run the test
yourself.
-Peter Nelson
Are you sure your benchmark is large enough not to fit into memory,
particularly the first stages of it? It looks like it is not. Reiser4 is
much faster on tasks like untarring enough files to not fit into RAM,
but (despite your words) your results seem to show us as slower, unless I
misread them....
Reiser4 performs best on benchmarks that use the disk drive, and we
usually only run benchmarks that use the disk drive.
Hans
Peter Nelson wrote:
> I recently decided to reinstall my system and at the same time try a
> new file system. Trying to decide what filesystem to use I found a few
> benchmarks but either they don't compare all available fs's, are too
> synthetic (copy a source tree multiple times or raw i/o), or are meant
> for servers/databases (like Bonnie++). The two most file system
> intensive tasks I do regularly are `apt-get upgrade` waiting for the
> packages to extract and set themselves up and messing around with the
> kernel so I benchmarked these. To make it more realistic I installed
> ccache and did two compiles, one to fill the cache and a second using
> the full cache.
>
> The tests I timed (in order):
> * Debootstrap to install base Debian system
> * Extract the kernel source
> * Run `make all` using the defconfig and an empty ccache
> * Copy the entire new directory tree
> * Run `make clean`
> * Run `make all` again, this time using the filled ccache
> * Deleting the entire directory tree
>
> Here is summary of the results based upon what I am calling "dead"
> time calculated as `total time - user time`.
You should be able to script out the user time.
> As you can see in the full results on my website the user time is
> almost identical between filesystems, so I believe this is an accurate
> comparison. The dead time is then normalized using ext2 as a baseline
> (> 1 means it took that many times longer than ext2).
>
> FS deb tar make cp clean make2 rm total
> ext2 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
> ext3 1.12 2.47 0.88 1.16 0.91 0.93 3.01 1.13
> jfs 1.64 2.18 1.22 1.90 1.60 1.19 12.84 1.79
> reiser 1.12 1.99 1.05 1.41 0.92 1.56 1.42 1.28
> reiser4 2.69 1.87 1.80 0.63 1.33 2.71 4.14 1.83
> xfs 1.06 1.99 0.97 1.67 0.78 1.03 10.27 1.43
>
> Some observations of mine
> * Ext2 is still overall the fastest but I think the margin is small
> enough that a journal is well worth it
> * Ext3, ReiserFS, and XFS all perform similarly and almost up to
> Ext2 except:
> o XFS takes an abnormally long time to do a large rm even
> though it is very fast at a kernel `make clean`
> o ReiserFS is significantly slower at the second make (from
> ccache)
> * JFS is fairly slow overall
> * Reiser4 is exceptionally fast at synthetic benchmarks like copying
> the system and untaring, but is very slow at the real-world
> debootstrap and kernel compiles.
> * Though I didn't benchmark it, ReiserFS sometimes takes a second or
> two to mount and Reiser4 sometimes takes a second or two to unmount
> while all other filesystem's are instantaneous.
>
> Originally I had planned on using Reiser4 because of the glowing
> reviews they give themselves but I'm simply not seeing it. It might be
> that my Reiser4 is somehow broken but I don't think so. Based on these
> results I personally am now going with XFS as it's faster than
> ReiserFS in the real-world benchmarks and my current Ext3 partition's
> performance is getting worse and worse.
>
> Full benchmark results, system information, and the script I used to
> run these tests are available from my website here:
> <http://avatar.res.cmu.edu/news/pages/Projects/2.6FileSystemBenchmarks>
>
> Feel free to comment, suggest improvements to my script, or run the
> test yourself.
> -Peter Nelson
>
>
--
Hans
Hans Reiser wrote:
> Are you sure your benchmark is large enough to not fit into memory,
> particularly the first stages of it? It looks like not. reiser4 is
> much faster on tasks like untarring enough files to not fit into ram,
> but (despite your words) your results seem to show us as slower unless
> I misread them....
I'm pretty sure most of the benchmarking I am doing fits into RAM,
particularly because my system has 1GB of it, but I see this as
realistic. When I download a bunch of debs (or rpms or the kernel), I'm
probably going to install them directly, with them still in the file
cache. Same with rebuilding the kernel after working on it.
For untarring, Reiser4 is the fastest other than ext2. A somewhat less
ambiguous version of my conclusion:
* Reiser4 is exceptionally fast at copying the system and is the
fastest other than Ext2 at untarring, but is very slow at the
real-world debootstrap and kernel compiles.
> Reiser4 performs best on benchmarks that use the disk drive, and we
> usually only run benchmarks that use the disk drive.
I'm confused as to why performing a benchmark out of cache, as opposed to
from disk, would hurt performance.
> Here is summary of the results based upon what I am calling "dead"
> time calculated as `total time - user time`.
>
> You should be able to script out the user time.
I'm working with a friend of mine here at CMU who does hard drive research
to create an execution trace and test that directly, instead of performing
all of the script actions.
-Peter Nelson
Hey there,
> Based on these results I personally am now going with XFS as it's
> faster than ReiserFS in the real-world benchmarks and my current
> Ext3 partition's performance is getting worse and worse.
If your current Ext3 partition was created under 2.4.x, you may wish to
recreate it under 2.6. 2.6 uses a different algorithm to lay out
directory blocks (google for 'Orlov allocator'), and this can affect
performance.
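A minimal sketch of doing that, purely for illustration; the device name,
mount point, and backup path are hypothetical, and you obviously want a
verified backup before touching the partition:

    # Recreate the partition's filesystem under a 2.6 kernel so the new
    # directory-layout decisions apply from scratch.
    umount /mnt/data
    mke2fs -j /dev/hda3                       # make a fresh ext3 (ext2 + journal)
    mount /dev/hda3 /mnt/data
    tar -C /mnt/data -xpf /backup/data.tar    # restore from the backup made earlier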
Nicely done, btw.
Ray
On Tue, 2004-03-02 at 09:34, Peter Nelson wrote:
> Hans Reiser wrote:
>
> I'm confused as to why performing a benchmark out of cache as opposed to
> on disk would hurt performance?
My understanding (which could be completely wrong) is that ReiserFS v3
and v4 are algorithmically more complex than ext2 or ext3. ReiserFS
spends more CPU time to make the eventual on-disk operations more
efficient/faster.
When operating purely or mostly out of RAM, the higher CPU utilization
of ReiserFS hurts performance compared to ext2 and ext3.
When your system's I/O utilization exceeds the cache size and your disks
start getting busy, the CPU time previously invested by ReiserFS pays
big dividends and provides large performance gains versus more
simplistic filesystems.
In other words, the CPU penalty paid by ReiserFS v3/v4 is more than made
up for by the resulting, more efficient disk operations. ReiserFS trades
CPU for disk performance.
In a nutshell, if you have more memory than you know what to do with,
stick with ext3. If you spend all your time waiting for disk operations
to complete, go with ReiserFS.
Dax Kelson
Guru Labs
On Tue, Mar 02, 2004 at 03:33:13PM -0700, Dax Kelson wrote:
> On Tue, 2004-03-02 at 09:34, Peter Nelson wrote:
> > Hans Reiser wrote:
> >
> > I'm confused as to why performing a benchmark out of cache as opposed to
> > on disk would hurt performance?
>
> My understanding (which could be completely wrong) is that reieserfs v3
> and v4 are algorithmically more complex than ext2 or ext3. Reiserfs
> spends more CPU time to make the eventual ondisk operations more
> efficient/faster.
>
> When operating purely or mostly out of ram, the higher CPU utilization
> of reiserfs hurts performance compared to ext2 and ext3.
>
> When your system I/O utilization exceeds cache size and your disks
> starting getting busy, the CPU time previously invested by reiserfs pays
> big dividends and provides large performance gains versus more
> simplistic filesystems.
>
> In other words, the CPU penalty paid by reiserfs v3/v4 is more than made
> up for by the resultant more efficient disk operations. Reiserfs trades
> CPU for disk performance.
>
> In a nutshell, if you have more memory than you know what do to with,
> stick with ext3. If you spend all your time waiting for disk operations
> to complete, go with reiserfs.
Or rather, if you have more memory than you know what to do with, use
ext3. If you have more CPU power than you know what to do with, use
ReiserFS[34].
On slower machines, I generally prefer slightly slower I/O to having
the entire system sluggish because of higher CPU usage.
Regards: David Weinehall
--
/) David Weinehall <[email protected]> /) Northern lights wander (\
// Maintainer of the v2.0 kernel // Dance across the winter sky //
\) http://www.acc.umu.se/~tao/ (/ Full colour fire (/
XFS is the best filesystem.
David Weinehall wrote:
>On Tue, Mar 02, 2004 at 03:33:13PM -0700, Dax Kelson wrote:
>
>
>>On Tue, 2004-03-02 at 09:34, Peter Nelson wrote:
>>
>>
>>>Hans Reiser wrote:
>>>
>>>I'm confused as to why performing a benchmark out of cache as opposed to
>>>on disk would hurt performance?
>>>
>>>
>>My understanding (which could be completely wrong) is that reieserfs v3
>>and v4 are algorithmically more complex than ext2 or ext3. Reiserfs
>>spends more CPU time to make the eventual ondisk operations more
>>efficient/faster.
>>
>>When operating purely or mostly out of ram, the higher CPU utilization
>>of reiserfs hurts performance compared to ext2 and ext3.
>>
>>When your system I/O utilization exceeds cache size and your disks
>>starting getting busy, the CPU time previously invested by reiserfs pays
>>big dividends and provides large performance gains versus more
>>simplistic filesystems.
>>
>>In other words, the CPU penalty paid by reiserfs v3/v4 is more than made
>>up for by the resultant more efficient disk operations. Reiserfs trades
>>CPU for disk performance.
>>
>>In a nutshell, if you have more memory than you know what do to with,
>>stick with ext3. If you spend all your time waiting for disk operations
>>to complete, go with reiserfs.
>>
>>
>
>Or rather, if you have more memory than you know what to do with, use
>ext3. If you have more CPU power than you know what to do with, use
>ReiserFS[34].
>
>On slower machines, I generally prefer a little slower I/O rather than
>having the entire system sluggish because of higher CPU-usage.
>
>
>Regards: David Weinehall
>
>
On Tue, Mar 02, 2004 at 08:30:32PM -0500, Andrew Ho wrote:
> XFS is the best filesystem.
Well, it'd better be; it's 10 times the size of ext3, 5 times the size of
ReiserFS, and 3.5 times the size of JFS.
And people say size doesn't matter.
Regards: David Weinehall
--
/) David Weinehall <[email protected]> /) Northern lights wander (\
// Maintainer of the v2.0 kernel // Dance across the winter sky //
\) http://www.acc.umu.se/~tao/ (/ Full colour fire (/
David Weinehall <[email protected]> writes:
> On Tue, Mar 02, 2004 at 08:30:32PM -0500, Andrew Ho wrote:
> > XFS is the best filesystem.
>
> Well it'd better be, it's 10 times the size of ext3, 5 times the size of
> ReiserFS and 3.5 times the size of JFS.
I think your ext3 numbers are off, most likely you didn't include JBD.
> And people say size doesn't matter.
A lot of this is actually optional features the other filesystems don't
have, like support for separate realtime volumes, compat code for old
revisions, journaled quotas, etc. I think you could relatively easily do
a "mini XFS" that would be a lot smaller.
But on today's machines it's not really an issue anymore.
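For reference, one rough way to compare the built code sizes yourself,
assuming a compiled 2.6 tree with these filesystems built in (the paths are
the standard kbuild per-directory objects):

    # Text/data sizes of the built-in filesystem code; ext3's total should
    # include fs/jbd, as noted above.
    size fs/ext3/built-in.o fs/jbd/built-in.o fs/reiserfs/built-in.o \
         fs/jfs/built-in.o fs/xfs/built-in.o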
-Andi
On Wednesday 03 March 2004 02:41, David Weinehall wrote:
> On Tue, Mar 02, 2004 at 08:30:32PM -0500, Andrew Ho wrote:
> > XFS is the best filesystem.
>
> Well it'd better be, it's 10 times the size of ext3, 5 times the size of
> ReiserFS and 3.5 times the size of JFS.
>
> And people say size doesn't matter.
Recoverability matters to me. The driver could be 10 megabytes and
*I* would not care. XFS seems to stand no matter how rudely the OS
is knocked down.
After a few hundred crashes (laptop, kids, drained batteries) I'd expect
something bad to happen, but no: XFS returns my data quickly and happily
every time (as opposed to most of the time). Maybe there's a bit of luck
involved.
Salute to XFS!
-- robin
Unfortunately it is a bit more complex, and the truth is less
complimentary to us than what you write. Reiser4's CPU usage has come
down a lot, but it still consumes more CPU than V3. It should consume
less, and Zam is currently working on making writes more CPU-efficient.
As soon as I get funding from somewhere and can stop worrying about
money, I will do a complete code review, and CPU usage will go way
down. There are always lots of stupid little things that consume a lot
of CPU that I find whenever I stop chasing money and review code.
We are shipping because CPU usage is not as important as I/O efficiency
for a filesystem, and while Reiser4 is not as fast as it will be in 3-6
months, it is faster than anything else available, so it should be shipped.
Hans
Dax Kelson wrote:
>On Tue, 2004-03-02 at 09:34, Peter Nelson wrote:
>
>
>>Hans Reiser wrote:
>>
>>I'm confused as to why performing a benchmark out of cache as opposed to
>>on disk would hurt performance?
>>
>>
>
>My understanding (which could be completely wrong) is that reieserfs v3
>and v4 are algorithmically more complex than ext2 or ext3. Reiserfs
>spends more CPU time to make the eventual ondisk operations more
>efficient/faster.
>
>When operating purely or mostly out of ram, the higher CPU utilization
>of reiserfs hurts performance compared to ext2 and ext3.
>
>When your system I/O utilization exceeds cache size and your disks
>starting getting busy, the CPU time previously invested by reiserfs pays
>big dividends and provides large performance gains versus more
>simplistic filesystems.
>
>In other words, the CPU penalty paid by reiserfs v3/v4 is more than made
>up for by the resultant more efficient disk operations. Reiserfs trades
>CPU for disk performance.
>
>In a nutshell, if you have more memory than you know what do to with,
>stick with ext3. If you spend all your time waiting for disk operations
>to complete, go with reiserfs.
>
>Dax Kelson
>Guru Labs
>
>
>
>
>
--
Hans
On Wed, Mar 03, 2004 at 03:39:26AM +0100, Andi Kleen wrote:
> A lot of this is actually optional features the other FS don't have,
> like support for separate realtime volumes and compat code for old
> revisions, journaled quotas etc. I think you could
> relatively easily do a "mini xfs" that would be a lot smaller.
And a whole lot of code to stay somewhat in sync with other codebases..
Christoph Hellwig wrote:
>On Wed, Mar 03, 2004 at 03:39:26AM +0100, Andi Kleen wrote:
>
>
>>A lot of this is actually optional features the other FS don't have,
>>like support for separate realtime volumes and compat code for old
>>revisions, journaled quotas etc. I think you could
>>relatively easily do a "mini xfs" that would be a lot smaller.
>>
>>
>
>And a whole lot of code to stay somewhat in sync with other codebases..
>
>
>
>
>
What is significant is not the effect of code size on modern
architectures; code size hurts developers, as the code becomes very hard
to make deep changes to. It is very important to carefully design your
code to be easy to change. This is why we tossed the V3 code and wrote
V4 from scratch, using plugins at every conceivable abstraction layer. I
think V4 will be our last rewrite from scratch, because of our plugins
and because of how easy we find the code to work on now.
I think XFS is going to stagnate over time, based on what former
developers have told me about how hard it is to work on the code.
Christoph probably disagrees, and he knows the XFS code far better than
I do. ;-)
--
Hans
On Wed, 2004-03-03 at 09:03, Hans Reiser wrote:
> I
> think V4 will be our last rewrite from scratch because of our plugins,
> and because of how easy we find the code to work on now.
Can we quote you on that 3 years from now? ;-)
Arjan van de Ven wrote:
>On Wed, 2004-03-03 at 09:03, Hans Reiser wrote:
>
>
>> I
>>think V4 will be our last rewrite from scratch because of our plugins,
>>and because of how easy we find the code to work on now.
>>
>>
>
>can we quote you on that 3 years from now ? ;-)
>
>
Yes, I think so.
We are going to add a nice little optimization for compiles to Reiser4
as a result of thinking about compile benchmarks. We are going to sort
filenames (and their corresponding file bodies) whose penultimate
character is '.' by their last character first. It seems this is optimal,
it is simple, and it has no real-world drawbacks. This is easy for us
because of our plugin design.
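Purely to illustrate the ordering being described (the kernel tree path is
hypothetical), here is a small shell rendering of the rule: names whose
penultimate character is '.' are keyed by their final character, so all *.c,
*.h and *.o entries in a directory end up grouped together:

    # List files with single-character extensions, keyed by that extension.
    ls linux-2.6.3/kernel | awk -F. 'NF > 1 && length($NF) == 1 { print $NF "\t" $0 }' | sort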
--
Hans
On Wed, 2004-03-03 at 07:00, Robin Rosenberg wrote:
> On Wednesday 03 March 2004 02:41, David Weinehall wrote:
> > On Tue, Mar 02, 2004 at 08:30:32PM -0500, Andrew Ho wrote:
> > > XFS is the best filesystem.
> >
> > Well it'd better be, it's 10 times the size of ext3, 5 times the size of
> > ReiserFS and 3.5 times the size of JFS.
> >
> > And people say size doesn't matter.
>
> Recoverability matters to me. The driver could be 10 megabyte and
> *I* would not care. XFS seems to stand no matter how rudely the OS
> is knocked down.
But XFS easily breaks down due to media defects. Once ago I used XFS,
but I lost all data on one of my volumes due to a bad block on my hard
disk. XFS was unable to recover from the error, and the XFS recovery
tools were unable to deal with the error.
On Wednesday 03 March 2004 10:43, Felipe Alfaro Solana wrote:
> But XFS easily breaks down due to media defects. Once ago I used XFS,
> but I lost all data on one of my volumes due to a bad block on my hard
> disk. XFS was unable to recover from the error, and the XFS recovery
> tools were unable to deal with the error.
What filesystems work on defective media?
As for crashed disks, I rarely bother trying to "fix" them anymore. I save
what I can and restore what's backed up; recovery tools (other than
the undelete ones) usually destroy what's left, but that's not unique to
XFS. Depending on how good my backups are I sometimes try the recovery
tools just to see, but that has never helped so far.
-- robin
On Wed, 2004-03-03 at 10:43, Felipe Alfaro Solana wrote:
> On Wed, 2004-03-03 at 07:00, Robin Rosenberg wrote:
> > On Wednesday 03 March 2004 02:41, David Weinehall wrote:
> > > On Tue, Mar 02, 2004 at 08:30:32PM -0500, Andrew Ho wrote:
> > > > XFS is the best filesystem.
> > >
> > > Well it'd better be, it's 10 times the size of ext3, 5 times the size of
> > > ReiserFS and 3.5 times the size of JFS.
> > >
> > > And people say size doesn't matter.
> >
> > Recoverability matters to me. The driver could be 10 megabyte and
> > *I* would not care. XFS seems to stand no matter how rudely the OS
> > is knocked down.
>
> But XFS easily breaks down due to media defects. Once ago I used XFS,
> but I lost all data on one of my volumes due to a bad block on my hard
> disk. XFS was unable to recover from the error, and the XFS recovery
> tools were unable to deal with the error.
You lost all your data? Or you just had to restore it from backup? If you
didn't have a backup, it is your fault, not XFS's :)
But even if you had no backup, why didn't you move your data (using dd
or something else) to another drive (one without defects) and run recovery
on the new drive?
Regards,
Olaf
On Wed, 2004-03-03 at 10:59, Robin Rosenberg wrote:
> On Wednesday 03 March 2004 10:43, Felipe Alfaro Solana wrote:
> > But XFS easily breaks down due to media defects. Once ago I used XFS,
> > but I lost all data on one of my volumes due to a bad block on my hard
> > disk. XFS was unable to recover from the error, and the XFS recovery
> > tools were unable to deal with the error.
>
> What file systems work on defect media?
It's not a matter of working: it's a matter of recovering. A bad disk
block could potentially destroy a file or a directory, but it shouldn't
make a filesystem unmountable or unrecoverable.
> As for crashed disks I rarely bothered trying to "fix" them anymore. I save
> what I can and restore what's backed up and recovery tools (other than
> the undo-delete ones) usually destroy what's left, but that's not unique to
> XFS. Depending on how good my backups are I sometimes try the recovery
> tools just to see, but that has never helped so far.
The problem is that I couldn't save anything: the XFS volume refused to
mount and the XFS recovery tools refused to fix anything. It was just a
single bad disk block. For example, in ext2/3 critical parts are
replicated several times over the volume, so there's minimal chance of
being unable to mount the volume and recover important files.
On Wednesday 03 March 2004 10:43, Felipe Alfaro Solana wrote:
> But XFS easily breaks down due to media defects. Once ago I used XFS,
> but I lost all data on one of my volumes due to a bad block on my hard
> disk. XFS was unable to recover from the error, and the XFS recovery
> tools were unable to deal with the error.
A single bad block rendered the entire filesystem non-recoverable
for XFS? Sounds difficult to believe, since there is redundancy such
as multiple copies of the superblock etc.
I can believe you lost *some* data, but "lost all my data"??? -- I
believe you'd have to have had *considerably* more than
"a bad block" :-)
Mike
On Wed, 2004-03-03 at 11:13, Olaf Frączyk wrote:
> > > Recoverability matters to me. The driver could be 10 megabyte and
> > > *I* would not care. XFS seems to stand no matter how rudely the OS
> > > is knocked down.
> > But XFS easily breaks down due to media defects. Once ago I used XFS,
> > but I lost all data on one of my volumes due to a bad block on my hard
> > disk. XFS was unable to recover from the error, and the XFS recovery
> > tools were unable to deal with the error.
> You lost all data? Or you just had to restore them from backup? If you
> didn't have a backup it is your fault not XFS one :)
Well, it was a testing machine with no important data, so I could
afford to lose everything, as was the case.
> But even if you had no backup, why didn't you move your data (using dd
> or something else) to another (without defects) drive, and run recovery
> on new drive?
I tried, but it proved more difficult than expected, since the computer
was a laptop and I couldn't move the HDD to another computer. Using the
distro rescue CD was useless, as its kernel didn't have XFS support. All
in all, XFS recovery was a nightmare compared to ext3 recovery, for
example.
On Wed, 2004-03-03 at 11:24, Mike Gigante wrote:
> On Wednesday 03 March 2004 10:43, Felipe Alfaro Solana wrote:
> > But XFS easily breaks down due to media defects. Once ago I used XFS,
> > but I lost all data on one of my volumes due to a bad block on my hard
> > disk. XFS was unable to recover from the error, and the XFS recovery
> > tools were unable to deal with the error.
>
> A single bad-block rendered the entire filesystem non-recoverable
> for XFS? Sounds difficult to believe since there is redundancy such
> as multiple copies of the superblock etc.
You should believe it... It was a combination of a power failure and
some bad disk sectors. Maybe it was just a kernel bug, after all, as
this happened with 2.5 kernels: during kernel bootup, the kernel invoked
XFS recovery but it failed due to media errors.
> I can believe you lost *some* data, but "lost all my data"??? -- I
> believe that you'd have to had had *considerably* more than
> "a bad block" :-)
It was exactly one disk block, at least that's what the low-level HDD
diagnostic program for my IBM/Hitachi laptop drive told me. In fact, the
HDD diagnostic was able to recover the media defects.
That could have been one of those very improbable cases, but I lost the
entire volume. Neither the kernel nor XFS tools were able to recover the
XFS volume. However, I must say that I didn't try every single known way
of performing the recovery, but recovery with ext2/3 is pretty
straightforward.
As I said, it could have been a kernel bug, or maybe I simply didn't
understand the implications of recovery, but xfs_repair was totally
unable to fix the problem. It instructed me to use "dd" to move the
volume to a healthy disk and retry the operation, but it was not easy to
do that as I explained before.
Robin Rosenberg wrote:
>On Wednesday 03 March 2004 10:43, Felipe Alfaro Solana wrote:
>
>
>>But XFS easily breaks down due to media defects. Once ago I used XFS,
>>but I lost all data on one of my volumes due to a bad block on my hard
>>disk. XFS was unable to recover from the error, and the XFS recovery
>>tools were unable to deal with the error.
>>
>>
>
>What file systems work on defect media?
>
>As for crashed disks I rarely bothered trying to "fix" them anymore. I save
>what I can and restore what's backed up and recovery tools (other than
>the undo-delete ones) usually destroy what's left, but that's not unique to
>XFS. Depending on how good my backups are I sometimes try the recovery
>tools just to see, but that has never helped so far.
>
>-- robin
>
>
>
>
Never attempt a recovery without first dd_rescue'ing to a good hard
drive, and then doing the recovery there, on the good drive.
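A minimal sketch of that procedure; the device names are hypothetical, and
the target must be at least as large as the source:

    # Image the failing disk first, then run recovery against the copy.
    dd_rescue /dev/hda2 /dev/hdc2    # keeps going past read errors
    xfs_repair /dev/hdc2             # repair the copy, never the original
    mount /dev/hdc2 /mnt/recovered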
--
Hans
Felipe Alfaro Solana wrote:
>
>
>As I said, it could have been a kernel bug, or maybe I simply didn't
>understand the implications of recovery, but xfs_repair was totally
>unable to fix the problem. It instructed me to use "dd" to move the
>volume to a healthy disk and retry the operation, but it was not easy to
>do that as I explained before.
>
>
>
>
>
I think that your expectation is unreasonable. XFS was designed for
machines where popping in a working hard drive was feasible. Making a
disk layout adaptable to any arbitrary block going bad is more work than
you might think, and for their intended market (not laptops) they did
the right thing.
You can buy cables that allow you to connect laptop drives to desktops.
--
Hans
Peter Nelson wrote:
> Hans Reiser wrote:
>
> >Are you sure your benchmark is large enough to not fit into memory,
> >particularly the first stages of it? It looks like not. reiser4 is
> >much faster on tasks like untarring enough files to not fit into ram,
> >but (despite your words) your results seem to show us as slower unless
> >I misread them....
>
> I'm pretty sure most of the benchmarking I am doing fits into ram,
> particularly because my system has 1GB of it, but I see this as
> realistic. When I download a bunch of debs (or rpms or the kernel) I'm
> probably going to install them directly with them still in the file
> cache. Same with rebuilding the kernel after working on it.
OK, that test is not very interesting for the FS gurus because it
doesn't stress the disk enough.
Anyway, I have some related questions concerning disk/fs performance:
o I see you are using an IDE disk with a large (8MB) write cache.
My understanding is that enabling write cache is a risky
thing for journaled file systems, so for a fair comparison you
would have to enable the write cache for ext2 and disable it
for all journaled file systems.
It would be nice if someone with more profound knowledge could comment
on this, but my understanding of the problem is:
- journaled filesystems can only work when they can enforce that
journal data is written to the platters at specific times with
respect to normal data writes
- IDE write caching makes the disk "lie" to the kernel, i.e. it says
"I've written the data" when it was only put in the cache
- now if a *power failure* keeps the disk from writing the cache
contents to the platter, the fs and journal are inconsistent
(a kernel crash would not cause this problem because the disk can
still write the cache contents to the platters)
- at next mount time the fs will read the journal from the disk
and try to use it to bring the fs into a consistent state;
however, since the journal on disk is not guaranteed to be up to date
this can *fail* (I have no idea what various fs implementations do
to handle this; I suspect they at least refuse to mount and require
you to manually run fsck. Or they don't notice and let you work
with a corrupt filesystem until they blow up.)
Right? Or is this just paranoia?
To me it looks like IDE write barrier support
(http://lwn.net/Articles/65353/) would be a way to safely enable IDE
write caches for journaled filesystems. (A sketch of toggling the write
cache with hdparm follows below.)
Has anyone done any benchmarks concerning write cache and journaling?
o And one totally different :-) question:
Has anyone benchmarked fs performance on PATA IDE disks vs.
otherwise comparable SCSI or SATA disks (I vaguely recall
having read that SATA has working TCQ, i.e. not broken by
design as with PATA)?
I have read a few times that SCSI disks perform much better
than IDE disks. The usual reason given is "SCSI disks are built for
servers, IDE for desktops". Is this all, or is it TCQ that
matters? Or is the Linux SCSI core better than the IDE core?
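Regarding the write-cache point in the first item, a minimal sketch of
toggling the drive's write-back cache with hdparm (the device name is
hypothetical), so journaled and non-journaled runs can be compared under
the same assumptions:

    hdparm -W0 /dev/hda    # turn the drive's write caching off
    # ... run the journaled-filesystem benchmarks here ...
    hdparm -W1 /dev/hda    # turn write caching back on
    # the current setting can be checked with: hdparm -I /dev/hda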
Johannes
On Wednesday 03 March 2004 11:19, Felipe Alfaro Solana wrote:
> The problem is that I couldn't save anything: the XFS volume refused to
> mount and the XFS recovery tools refused to fix anything. It was just a
> single disk bad block. For example in ext2/3 critical parts are
> replicated several times over the volume, so there's minimal chance of
> being unable to mount the volume and recover important files.
That is a misconception. What is being replicated multiple times in ext2 is
the superblock and the block group descriptors. But these are not really
needed for recovery (as long as they have default values, which is the case
in the vast majority of installations).
What is not being replicated is the block allocation bitmap, the inode
allocation bitmap, and the inodes themselves.
By running "mke2fs -S" on an ext2 file system, you will rewrite all
superblocks, all block group descriptors, and all allocation bitmaps, but
leave the inodes themselves intact. You can then recreate a consistent
filesystem with e2fsck, proving that the information from the replicated
parts of the file system is not really necessary. All that e2fsck needs to
recover the system is the information from the inodes. If they are damaged
(and they are not replicated), the files whose inodes are in damaged blocks
cannot be recovered.
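A minimal sketch of that sequence; the device and block size are hypothetical,
and mke2fs must be given the same parameters the filesystem was originally
created with:

    umount /dev/hdb1
    mke2fs -S -b 4096 /dev/hdb1    # rewrite superblocks and group descriptors only
    e2fsck -fy /dev/hdb1           # let e2fsck rebuild a consistent filesystem
                                   # from the surviving inodes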
Kristian
Quoting Felipe Alfaro Solana <[email protected]>:
> But XFS easily breaks down due to media defects. Once ago I used XFS,
> but I lost all data on one of my volumes due to a bad block on my hard
> disk. XFS was unable to recover from the error, and the XFS recovery
> tools were unable to deal with the error.
1. How long ago is "Once ago"? Did you report it to the XFS
developers?
2. Speaking for servers, we live in a RAID and/or SAN world. The media
error issue is a non-issue.
Just my $0.02,
Pascal
Felipe Alfaro Solana wrote:
| The problem is that I couldn't save anything: the XFS volume refused to
| mount and the XFS recovery tools refused to fix anything. It was just a
| single disk bad block. For example in ext2/3 critical parts are
| replicated several times over the volume, so there's minimal chance of
| being unable to mount the volume and recover important files.
Just my two cents here:
If you have an XFS volume, then you are mostly doing more than just storing
your baby photos, so you should have a RAID below it (software or hardware),
and then you don't worry about bad blocks, because a) you have a RAID
(probably with a hot spare drive) and b) a daily (or more frequent) backup.
As for me, I stopped using ReiserFS, JFS, or XFS at home. Why? Too many
negative experiences: bad blocks (XFS totally b0rked), ReiserFS (similar
things), and I didn't even try JFS. With ext3 it works very well. Yes, I do
have a crappy board with a sucky VIA chipset and some super, super old HDs,
but with ext3 I have had not a single problem in six months (heavily
knocking on wood here).
All those high-end journaling filesystems are no good for home systems,
in my opinion.
But again, those are just my little two cents.
--
Clemens Schwaighofer - IT Engineer & System Administration
==========================================================
Tequila Japan, 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN
Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343
http://www.tequila.jp
==========================================================
On Wed, 2004-03-03 at 18:41, Johannes Stezenbach wrote:
> Peter Nelson wrote:
> > Hans Reiser wrote:
> >
> > >Are you sure your benchmark is large enough to not fit into memory,
> > >particularly the first stages of it? It looks like not. reiser4 is
> > >much faster on tasks like untarring enough files to not fit into ram,
> > >but (despite your words) your results seem to show us as slower unless
> > >I misread them....
> >
> > I'm pretty sure most of the benchmarking I am doing fits into ram,
> > particularly because my system has 1GB of it, but I see this as
> > realistic. When I download a bunch of debs (or rpms or the kernel) I'm
> > probably going to install them directly with them still in the file
> > cache. Same with rebuilding the kernel after working on it.
>
> OK, that test is not very interesting for the FS gurus because it
> doesn't stress the disk enough.
>
> Anyway, I have some related questions concerning disk/fs performance:
>
> o I see you are using and IDE disk with a large (8MB) write cache.
>
> My understanding is that enabling write cache is a risky
> thing for journaled file systems, so for a fair comparison you
> would have to enable the write cache for ext2 and disable it
> for all journaled file systems.
>
> It would be nice if someone with more profound knowledge could comment
> on this, but my understanding of the problem is:
>
Jens just sent me an updated version of his IDE barrier code, and I'm
adding support for reiserfs and ext3 to it this weekend. It's fairly
trivial to add support for each FS; I just don't know the critical
sections of the others as well.
The SUSE 2.4 kernels have had various forms of the patch; it took us a
while to get things right. It does impact performance slightly, since
we are forcing cache flushes that otherwise would not have been done.
The common workloads don't slow down with the patch; fsync-heavy
workloads typically lose around 10%.
-chris
Hi!
> It would be nice if someone with more profound knowledge could comment
> on this, but my understanding of the problem is:
>
> - journaled filesystems can only work when they can enforce that
> journal data is written to the platters at specifc times wrt
> normal data writes
> - IDE write caching makes the disk "lie" to the kernel, i.e. it says
> "I've written the data" when it was only put in the cache
> - now if a *power failure* keeps the disk from writing the cache
> contents to the platter, the fs and journal are inconsistent
> (a kernel crash would not cause this problem because the disk can
> still write the cache contents to the platters)
> - at next mount time the fs will read the journal from the disk
> and try to use it to bring the fs into a consistent state;
> however, since the journal on disk is not guaranteed to be up to date
> this can *fail* (I have no idea what various fs implementations do
> to handle this; I suspect they at least refuse to mount and require
> you to manually run fsck. Or they don't notice and let you work
> with a corrupt filesystem until they blow up.)
>
> Right? Or is this just paranoia?
Twice a year I fsck my reiser drives, and yes, there's some corruption there.
So you are right, and it's not paranoia.
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms