2001-10-03 12:00:30

by sebastien.cabaniols

Subject: [POT] Which journalised filesystem uses Linus Torvalds ?

Hello lkml,

With the availability of XFS, JFS, ext3 and ReiserFS I am a little
lost, and I don't know which one I should use for enterprise-class
servers.

In terms of integration into the kernel, functionality, stability
and performance, which one is the best for enterprise-class servers?

I guess the beginning of the answer is: it depends... on what
you are doing.

So, what do you think if

I want a database server
or
a supercomputer (HPC use)
or
a Linux KDE/GNOME desktop

Thanks for your help, links and experience.


Sebastien CABANIOLS



"This message was sent to you by laposte.net - web: http://www.laposte.net/ minitel: 3615 LAPOSTENET (0.84 F incl. tax per minute) / telephone: 08 92 68 13 50 (2.21 F incl. tax per minute)"



2001-10-03 12:39:57

by Rik van Riel

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 3 Oct 2001, sebastien.cabaniols wrote:

> With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> and I don't know which one I should use for entreprise class servers.

Personally I like ext3 a lot. I've been using it for almost a
year now and it has never given me trouble. In the theoretical
case where it would give me trouble, I'd have the very well
tested e2fsck utility to rescue me (ext2 and ext3 have the same
on-disk layout).

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-03 12:54:17

by Dave Jones

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 3 Oct 2001, Rik van Riel wrote:

> Personally I like ext3 a lot. I've been using it for almost a
> year now and it has never given me trouble.

I've similar experiences with ext3, except for one bad instance
recently when I put it on my laptop. Lots of asserts were triggered,
and on reboot it couldn't find the journal, the superblock,
or the backup superblocks. I spent a few hours trying to get data
back, and eventually gave up and reformatted as ext2.

Alan mentioned this was something to do with the IBM hard disk
having strange write-cache properties that confuse ext3.
I'm not sure if this has been fixed or not yet, but it's enough
to make me think twice about trying it on the Vaio for a while.

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-03 13:00:27

by Billy Harvey

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 2001-10-03 at 08:54, Dave Jones wrote:

> I've similar experiences with ext3, except for one bad instance
> recently when I put it on my laptop. Lots of asserts were triggered,
> and on reboot it couldn't find the journal, the superblock,
> or the backup superblocks. I spent a few hours trying to get data
> back, and eventually gave up and reformatted as ext2.
>
> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.
> I'm not sure if this has been fixed or not yet, but its enough
> to make me think twice about trying it on the vaio for a while.
>
> regards,
>
> Dave.

I've been using ext3 on my ThinkPad (A20P) for about a month now with
nary the slightest problem. I've even smoke tested it by shutting it
down in the middle of disk writes and it worked fine.

Billy



2001-10-03 13:02:37

by Ragnar Kjørstad

Subject: Re: [POT] Which journalised filesystem ?

On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:
> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.
> I'm not sure if this has been fixed or not yet, but its enough
> to make me think twice about trying it on the vaio for a while.

If a disk is doing write-back caching, it's likely to break all
journaling filesystems and anything else that relies on write ordering.


--
Ragnar Kjørstad
Big Storage

2001-10-03 13:24:10

by Dave Jones

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 3 Oct 2001, Ragnar Kjørstad wrote:

> If a disk is doing write-back caching, it's likely to break all
> journaling filesystem and anything else that relies on write ordering.

Yup, I know this *now* :-)
My point is that I had no idea the drive was doing write-caching.

hdparm only offers an option to set it to on/off, not to query it.
Just disabling it in a boot up script *might* be enough to make this
safe again, but I've not looked at the hdparm & IDE code, so this
is just a theory.
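
A minimal sketch of checking and then disabling the drive write cache from a boot-time script. It assumes a newer hdparm release that prints the current setting when -W is given without a value, and it assumes the status line looks like the sample below; both are assumptions, not something confirmed in this thread. The function names and the sample text are hypothetical.

```python
import re
import subprocess

# Assumed output format of `hdparm -W <device>` on newer hdparm releases.
# This sample is illustrative only; it was not captured from any drive
# mentioned in this thread.
SAMPLE_OUTPUT = """
/dev/hda:
 write-caching =  1 (on)
"""


def parse_write_caching(text):
    """Return True/False for on/off, or None if no status line is found."""
    match = re.search(r"write-caching\s*=\s*(\d)", text)
    if match is None:
        return None
    return match.group(1) == "1"


def disable_write_cache(device, dry_run=True):
    """Build (and optionally run) the `hdparm -W0 <device>` command."""
    cmd = ["hdparm", "-W0", device]
    if not dry_run:
        # Needs root and a real device; raises if hdparm reports failure.
        subprocess.run(cmd, check=True)
    return cmd


print(parse_write_caching(SAMPLE_OUTPUT))
print(disable_write_cache("/dev/hda"))
```

With dry_run left at True nothing is executed, so the command can be inspected before it is wired into a boot script.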

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-03 14:31:50

by Dave Cinege

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

On Wednesday 03 October 2001 8:00, sebastien.cabaniols wrote:
> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.

I use Reiserfs on everything now, including a 13 drive Fiber Channel
SAN with 3 hosts and multiple levels of Software RAID between them.

It is as fast as ext2, and in some cases much faster (e.g. rm of 10K+ files in ~2
seconds). FYI, I get 70MB/s in Bonnie on six 7200rpm drives in RAID 0 (64k blocks).

Keeping up with the 'best' reiserfs patch set can be a little bit of a
chore. (However it looks like we're coming to the end of that with 2.4.10)

Never used ext3. From what I did read about it, it didn't excite me.
As for the others, I've yet to see a mature enough version to actually use, and
considering Reiserfs, I don't see a reason to try them.

Dave

--
The time is now 22:19 (Totalitarian) - http://www.ccops.org/clock.html

2001-10-03 14:48:53

by Sean Hunter

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

I use ext3 on a couple of servers and a couple of laptops. I think which fs is
best for you will depend enormously on the intended use of the machines and
your own expectations. A mail and dns server that I operate running ext3 has
been very happy since conversion, and has definitely benefitted.

I feel I get the benefit of no more fscks and fast operations on "-o
sync"-mounted filesystems without (IMO) exposing the box to immature code that
you might see in less conservative "experimental" filesystem options.

I personally feel more comfortable with the stability and robustness criteria
of the ext3 developers than some others. If you want a very fast filesystem or
one that handles very large numbers of files very well, your choice may well be
different from mine.

I quite like my filesystems to be boring. :)

Sean

On Wed, Oct 03, 2001 at 10:33:17AM -0400, Dave Cinege wrote:
> On Wednesday 03 October 2001 8:00, sebastien.cabaniols wrote:
> > Hello lkml,
> >
> > With the availability of XFS,JFS,ext3 and ReiserFS I am a
> > little
> > lost and I don't know which one I should use for entreprise
> > class
> > servers.
>
> I use Reiserfs on everything now, including a 13 drive Fiber Channel
> SAN with 3 hosts and multiple levels of Software RAID between them.
>
> It is as fast as ext2, and in some case much faster. (IE rm 10K+ files in ~2
> seconds) FYI I Bonnie 70MB/s on 6 7200rpm drives in RAID 0. (64k blocks)
>
> Keeping up with the 'best' reiserfs patch set can be a little bit of a
> chore. (However it looks like we're coming to the end of that with 2.4.10)
>
> Never used ext3. From what I did read about it, it didn't excite me.
> The others I've yet to see a mature enough version to actually use, and
> considering Reiserfs, don't see a reason to try them.
>
> Dave
>
> --
> The time is now 22:19 (Totalitarian) - http://www.ccops.org/clock.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-10-03 15:21:03

by Roy Murphy

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

'Twas brillig when Sebastien Cabaniols scrobe:
>With the availability of XFS,JFS,ext3 and ReiserFS I am a
>little lost and I don't know which one I should use for entreprise
>class servers.

Well, the Linus Torvalds filesystem (ltfs for short) is a highly developed,
version control filesystem, but it still has a few shortcomings.

When saving a file to ltfs, it sometimes suggests that you should do it a different
way. The ltfs is very particular about how things should be done.

Often, when saving a file, it is dropped without any notification. Experienced
users of the ltfs follow the mantra "submit early and submit often". They repeatedly
resave their files hoping that one of them will be accepted into a "version"
that does get saved to disk.

Several forks of the ltfs (i.e. the Alan Cox filesystem -- acfs and the Andrea
Arcangeli filesystem -- aafs) are a little better about saving files, but each
of them has its own idea about which files are worthy of being saved.

While these advanced filesystems hold great promise for the future, they should
probably not be used in a production server due to these failings. In fact,
one user of the acfs, Telsa Cox, reports that the acfs often doesn't work at
all before noon local time.

YMMV.

2001-10-03 15:34:53

by André Dahlqvist

Subject: Re: [POT] Which journalised filesystem ?

Dave Jones <[email protected]> wrote:

> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.

Which IBM harddrive(s) does this? How can one check if it does?
--

André Dahlqvist <[email protected]>

2001-10-03 16:54:16

by Fabio Massimo Di Nitto

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

Hi Sebastien,
I had the opportunity to poke around with ext3 and reiserfs,
but ended up converting all my machines to ext3 for various reasons.

First of all, with ext3 you don't need to re-format your partitions, so no
mkreiserfs or mke3fs, just a simple tune2fs. No need to back up all your
data, rebuild on top of the new fs, reinstall and....
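
As a rough illustration of the in-place conversion described above: tune2fs -j adds a journal to an existing ext2 filesystem, after which the filesystem-type field in /etc/fstab is switched to ext3. The sketch below only builds the command and rewrites one fstab line; the device name /dev/hda5, the mount point and both function names are hypothetical.

```python
def tune2fs_journal_command(device):
    """Return the argv that adds a journal to an existing ext2 filesystem."""
    return ["tune2fs", "-j", device]


def fstab_line_to_ext3(line):
    """Switch the filesystem-type field of one fstab entry from ext2 to ext3.

    Comment lines and entries of any other type are returned unchanged.
    """
    fields = line.split()
    is_comment = line.lstrip().startswith("#")
    if not is_comment and len(fields) >= 3 and fields[2] == "ext2":
        fields[2] = "ext3"
        return " ".join(fields)
    return line


# Hypothetical device and mount point, for illustration only.
print(tune2fs_journal_command("/dev/hda5"))
print(fstab_line_to_ext3("/dev/hda5 /home ext2 defaults 1 2"))
```

Nothing here touches a disk; running the returned command is left to the administrator.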

When I was testing reiserfs (at least a couple of months ago) I got very
bad performance, but I know that they have improved performance in the
2.4.10 release of the kernel, which unfortunately seems to have many other
problems.

Fabbione

"sebastien.cabaniols" wrote:
>
> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.

--
Debian GNU/Linux Unstable Kernel 2.4.9
fabbione on irc.atdot.it #coredump #kchat | [email protected]

2001-10-03 17:03:16

by Matthias Andree

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 03 Oct 2001, Dave Jones wrote:

> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.

hdparm -W0 /dev/hda is your friend.

2001-10-03 17:37:13

by Stephen C. Tweedie

Subject: Re: [POT] Which journalised filesystem ?

Hi,

On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:

> > Personally I like ext3 a lot. I've been using it for almost a
> > year now and it has never given me trouble.
>
> I've similar experiences with ext3, except for one bad instance
> recently when I put it on my laptop. Lots of asserts were triggered,
> and on reboot it couldn't find the journal, the superblock,
> or the backup superblocks. I spent a few hours trying to get data
> back, and eventually gave up and reformatted as ext2.

Which laptop? I've seen several reports of disk corruption with
recent kernels on certain laptops.

Cheers,
Stephen

2001-10-03 17:39:23

by Sujal Shah

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 2001-10-03 at 13:03, Matthias Andree wrote:
> On Wed, 03 Oct 2001, Dave Jones wrote:
>
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
>
> hdparm -W0 /dev/hda is your friend.

Dumb question: when would you want it to be -W1?

I mean, I can imagine maybe media recording or something where you might
*really* want the performance increase... but generally speaking, I
want my data to be there in case things blow up.

Does anyone know what the performance increase is?

Sujal

--
---- Sujal Shah ---- PSC Labs (Progress Software) ----

Now Playing: Ministry Of Sound - York - The Awakening



2001-10-03 17:40:53

by Dave Jones

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 3 Oct 2001, Stephen C. Tweedie wrote:

> Which laptop? I've seen several reports of disk corruption with
> recent kernels on certain laptops.

Sony Vaio Z600TEK
Hard disk is an IBM-DJSA-220

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-03 17:46:55

by Xavier Bestel

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 03-10-2001 at 19:03, Matthias Andree wrote:
> On Wed, 03 Oct 2001, Dave Jones wrote:
>
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
>
> hdparm -W0 /dev/hda is your friend.

Unfortunately I think IDE drives don't honor this setting - write-cache
is always on.

Xav

2001-10-03 17:52:15

by Bernd Eckenfels

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

In article <GKMPCZ$IZh2dKhbICnp0WDXKHB6iO7OKoHwqOxmqj9XfriOC7PjHiIDA6bHi6xrImT@laposte.net> you wrote:
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.

Earlier versions of ReiserFS had weak fsck support. Since a lot of bugs
and heavy load triggered this problem regularly, it was not a wise idea
to use Reiser. Things are reported to have improved, but I do not have
any first-hand experience since then.

Personally I think XFS is a very mature journaling file system. One
annoyance is that SGI's CVS tree is hard to track. I have reports
from heavily loaded servers (e.g. news spools) that it performs very well.

ext3 is the alternative because of its compatibility with ext2. But I am
not sure whether this is good or bad, since, afaik, it has not fixed some
of the performance issues of the ext2 structure.

I have no experience with JFS; IBM seems to have missed an opportunity to
build large community support.

GFS as a general-purpose filesystem may need some more tweaking, but its
cluster properties are great for enterprise systems.

Greetings
Bernd

2001-10-03 17:51:44

by Andrew Morton

Subject: Re: [POT] Which journalised filesystem ?

Ragnar Kjørstad wrote:
>
> On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
> > I'm not sure if this has been fixed or not yet, but its enough
> > to make me think twice about trying it on the vaio for a while.
>
> If a disk is doing write-back caching, it's likely to break all
> journaling filesystem and anything else that relies on write ordering.

In theory, disk write caching can defeat ext3's ordering requirements.

However I have never observed this in practice, nor have I seen
any report of it happening.

Think about it: ext3 writes a chunk of blocks, waits on them,
then writes a single commit block and waits on that. The "chunk"
of blocks are very probably contiguous on disk. The commit block
will most probably be at the very next LBA after the "chunk".

The only way in which the drive can cause corruption is for it to write the
commit block before the "chunk", and for you to lose power [*] within
that time window. Unless some serious block remapping has occurred
at the physical level, I really can't see any reason why the disk
should choose to flush those blocks in the wrong order. Nor do I see why
the disk should leave a large time window between flushing the commit
block and then flushing the "chunk".

So.... I wouldn't be too fussed about it, personally.




[*] I think it has to be a power outage - a kernel crash won't be
enough - the disk should still flush its write cache. I'm not sure
if hitting the front-panel reset button would prevent a disk from
flushing its cache?
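
The argument above can be sketched as a toy model: a transaction is a "chunk" of journal blocks followed by one commit block, and recovery goes wrong only if power is lost while the commit block is on the platter but part of the chunk is not. This is a simplified illustration under those assumptions, not ext3's actual journal format; the block names and function names are hypothetical.

```python
COMMIT = "commit"


def persisted_after_power_loss(flush_order, blocks_written):
    """Return the set of blocks that reached the platter before power was lost."""
    return set(flush_order[:blocks_written])


def recovery_is_safe(on_disk, chunk):
    """Replay is safe unless a commit block exists without its full chunk."""
    if COMMIT not in on_disk:
        return True  # no commit record: the transaction is simply ignored
    return all(block in on_disk for block in chunk)


def unsafe_crash_points(flush_order, chunk):
    """List the crash points (blocks flushed so far) that leave a bad journal."""
    return [
        n
        for n in range(len(flush_order) + 1)
        if not recovery_is_safe(persisted_after_power_loss(flush_order, n), chunk)
    ]


chunk = ["j1", "j2", "j3"]       # hypothetical journal block names
in_order = chunk + [COMMIT]      # the order in which the writes are submitted
commit_first = [COMMIT] + chunk  # a drive cache that flushes the commit early

print(unsafe_crash_points(in_order, chunk))      # []
print(unsafe_crash_points(commit_first, chunk))  # [1, 2, 3]
```

In this model the in-order flush has no unsafe crash point, while the reordered flush is exposed for exactly as long as the chunk takes to reach the disk, which is the time window discussed above.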


2001-10-03 17:53:44

by Matthias Andree

Subject: Re: [POT] Which journalised filesystem ?

On Wed, 03 Oct 2001, Xavier Bestel wrote:

> On Wed, 03-10-2001 at 19:03, Matthias Andree wrote:
> > On Wed, 03 Oct 2001, Dave Jones wrote:
> >
> > > Alan mentioned this was something to do with the IBM hard disk
> > > having strange write-cache properties that confuse ext3.
> >
> > hdparm -W0 /dev/hda is your friend.
>
> Unfortunately I think IDE drives don't honor this setting - write-cache
> is always on.

It's meant for IDE drives, and the write cache has been in the feature
register for ages. Just try it, you'll notice if it fails.

BTW, it works as expected on my DJNA, DPTA, DTLA drives, and I know you
can turn the cache of DARA drives off as well.

--
Matthias Andree

"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin

2001-10-03 18:01:04

by Luigi Genoni

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

I would bet that Linus is using ext2 :).

Apart from this, everyone will give you different suggestions.

Basically ext3 is simply ext2 with a journal. It can journal data too,
but that mode is slower.

reiserFS is really interesting and is the most space-efficient, thanks to
B*Trees and advanced hashing techniques, but it currently journals only
metadata. The real issue is that reiserFS does a tree traversal every time
it writes a 4k block, and then it puts one pointer at a time into the tree.
So the tree is rebalanced on every 4k write, which is bad for very large
files.

jfs should be quite stable. It is a very interesting technology, and
I know it very well from AIX (but the Linux one comes from OS/2).
It's very solid, quite fast, and can also journal data (??).
The way jfs manages free data block groups is very smart, although it is
not an extent-based FS (but leaf nodes are pieces of bitmap instead of
extents).

xfs: I dislike the way they are inserting a kind of double VFS. I
understand that the Irix buffer cache was developed with some xfs features
in mind, and so they need this pagebuf module, but I dislike it. I also
dislike the concept of per-group quota, but this is just my taste.
Anyway, I have to admit that on very big files xfs is very efficient.
On Irix 6.4 I found it to be a little slow with small files.

That is just my opinion; I am waiting for reiserFS 4.

On Wed, 3 Oct 2001, sebastien.cabaniols wrote:

> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.
>
> In terms of intergration into the kernel, functionnalities,
> stability
> and performance which one is the best for entreprise class
> servers
>
> I guess the begining of the answer is: it depends... on what
> you are doing
>
> So, what do you think if
>
> I want a database server
reiserFS
> or
> a supercomputer (HPC use)
jfs / ext3
> or
> a Linux KDE/GNOME desktop
ext2 :)
>
Luigi

2001-10-03 19:13:21

by Erik Mouw

Subject: Re: [POT] Which journalised filesystem ?

On Wed, Oct 03, 2001 at 01:40:36PM -0400, Sujal Shah wrote:
> On Wed, 2001-10-03 at 13:03, Matthias Andree wrote:
> > hdparm -W0 /dev/hda is your friend.
>
> Dumb question: when would you want it to be -W1?
>
> I mean, I can imagine maybe media recording or something where you might
> *really* want the performance increase... but generally speaking, I
> want my data to be there in case things blow up.

I've used it in the past for an SGI Octane that was (and still is) used
to do real time TV studio quality (CCIR-601 YUV422 data, about 20MB/s)
record/playback to four striped SCSI disks.

> does anyone know what the performance increase is?

It made the difference between "doesn't cut it" and "enough headroom".
IIRC it was something like 18MB/s without and 30MB/s with write
caching, but don't quote me on the exact numbers.


Erik

--
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031, 2600 GA Delft, The Netherlands
Phone: +31-15-2783635 Fax: +31-15-2781843 Email: [email protected]
WWW: http://www-ict.its.tudelft.nl/~erik/

2001-10-03 20:52:41

by Mark Hahn

Subject: Re: [POT] Which journalised filesystem ?

> IIRC it was something like 18MB/s without and 30MB/s with write

For a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
I just measured: 7 MB/s with -W0, vs 27 MB/s with -W1.
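
For anyone wanting to repeat this kind of comparison, here is a rough sketch of a sequential-write timing loop, run once with the cache off and once with it on. It is not the tool used for the figures above (that is not stated); the function names, the default sizes and the use of fsync to end the timed region are assumptions. The amount written should comfortably exceed the drive's cache for the result to mean anything.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=16, chunk_kb=64):
    """Write total_mb of zeros to a scratch file and return MB/s, fsync included."""
    chunk = b"\0" * (chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(chunks):
            os.write(fd, chunk)
        os.fsync(fd)  # ask the kernel to push the data out before stopping the clock
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)
    return total_mb / elapsed


def speedup(with_cache, without_cache):
    """Ratio between two throughput figures."""
    return with_cache / without_cache


# Ratio of the two figures quoted above (27 MB/s vs 7 MB/s).
print(round(speedup(27, 7), 1))  # 3.9
```

A single run like this is noisy; repeating it several times on an otherwise idle machine gives a more trustworthy number.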

2001-10-03 22:31:37

by Matt Bernstein

Subject: [OT] Re: Which journalised filesystem uses Linus Torvalds ?

At 11:21 -0500 Roy Murphy wrote:

>'Twas brillig when Sebastien Cabaniols scrobe:
>>With the availability of XFS,JFS,ext3 and ReiserFS I am a
>>little lost and I don't know which one I should use for entreprise
>>class servers.
>
>Well, the Linus Torvalds filesystem (ltfs for short) is a highly developed,
>version control filesystem, but it still has a few shortcomings.

Wrong! If you're Linus you're allowed to declare that backups are for wimps,
and that if your code is worth the bytes it was written with, it will be
mirrored all over the world. So there's no journal except google.com :)

2001-10-04 05:42:49

by Andrew Ip

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

For those who are interested in trying out journalling filesystems, I have
made a kernel rpm which supports XFS, JFS, Ext3 and ReiserFS. You can get it
at ftp://ftp.cwlinux.com/pub/downloads/journaling_fs/kernel. Comments are
welcome.

-Andrew

On Wed, Oct 03, 2001 at 02:00:35PM +0200, sebastien.cabaniols wrote:
> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.
>
> In terms of intergration into the kernel, functionnalities,
> stability
> and performance which one is the best for entreprise class
> servers
>
> I guess the begining of the answer is: it depends... on what
> you are doing
>
> So, what do you think if
>
> I want a database server
> or
> a supercomputer (HPC use)
> or
> a Linux KDE/GNOME desktop
>
> Thanks for your help, links and experience.
>
>
> Sebastien CABANIOLS

--
Andrew Ip
Email: [email protected]
Tel: (852) 2542 2046
Fax: (852) 2542 2046
Mobile: (852) 9201 9866

Cwlinux Limited
18B Tower 1 Tern Centre,
237 Queen's Road Central,
Hong Kong.



2001-10-04 07:32:53

by Constantin Loizides

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

Hello Sebastien,


> In terms of intergration into the kernel, functionnalities,
> stability and performance which one is the best for entreprise class
> servers

You might want to take a look at

http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/

where I try to address one of your criteria, namely performance,
and how performance behaves over time, e.g. when the file system
is heavily used...


The focus is on ReiserFS compared to Ext2, though I plan to set up
some tests with XFS and JFS soon (to get the results before the end
of October).



Constantin

2001-10-04 07:55:16

by David Woodhouse

Subject: Re: [OT] Re: Which journalised filesystem uses Linus Torvalds ?


[email protected] said:
> Wrong! If you're Linus you're allowed to declare that backups are for
> wimps, and that if you're code is worth the bytes it was written with,
> it will be mirrored all over the world. So there's no journal except
> google.com :)

Nah. Linus gets to play with cute embedded toys. It'd have to be JFFS2 :)

--
dwmw2


2001-10-04 16:29:56

by Nathan Straz

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

On Wed, Oct 03, 2001 at 02:00:35PM +0200, sebastien.cabaniols wrote:
> With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> and I don't know which one I should use for entreprise class servers.

I'd recommend reading:

http://www.mandrakeforum.com/article.php?sid=1212&lang=en

It's an article in the Mandrake forums concerning ext3, JFS, XFS, and
ReiserFS, all of which are in Mandrake 8.1.


> In terms of intergration into the kernel, functionnalities, stability
> and performance which one is the best for entreprise class servers

For enterprise stuff, I would recommend XFS based on the tools it
provides. XFS has a complete set of tools for dumping XFS, repairing a
broken file system (should it ever break), and debugging should you
find something wrong with it.

--
Nate Straz [email protected]
sgi, inc http://www.sgi.com/
Linux Test Project http://ltp.sf.net/

2001-10-04 17:21:13

by Hristo Grigorov

Subject: Re: [POT] Which journalised filesystem uses Linus Torvalds ?

Heh..

Choosing the best FS is like choosing the best Linux distribution or choosing
the best woman for the rest of your life, as you like.. :)

Each FS implementation has its strengths and weaknesses. I read that article
and came to the opinion that every piece of software is more or less PnP
(plug-n-pray). You know, all code has bugs and the worst of them are never
found :)

Hristo.

On Thursday 04 October 2001 19:30, Nathan Straz wrote:
> On Wed, Oct 03, 2001 at 02:00:35PM +0200, sebastien.cabaniols wrote:
> > With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> > and I don't know which one I should use for entreprise class servers.
>
> I'd recommend reading:
>
> http://www.mandrakeforum.com/article.php?sid=1212&lang=en
>
> It's an article in the Mandrake forums concerning ext3, JFS, XFS, and
> ReiserFS, all of which are in Mandrake 8.1.
>
> > In terms of intergration into the kernel, functionnalities, stability
> > and performance which one is the best for entreprise class servers
>
> For enterprise stuff, I would recommend XFS based on the tools it
> provides. XFS has a complete set of tools for dumping XFS, repairing a
> broken file system (should it every break), and debugging should you
> find something wrong with it.


2001-10-04 21:04:42

by Alan

Subject: Re: [POT] Which journalised filesystem ?

> Which laptop? I've seen several reports of disk corruption with
> recent kernels on certain laptops.

20Gbyte IBM 2.5" ones I suspect ? If so then we aren't the only OS

Alan

2001-10-04 21:19:44

by Alan

Subject: Re: [POT] Which journalised filesystem ?

> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
>
> Which IBM harddrive(s) does this? How can one check if it does?

It's not specifically IBM; there are two sets of things to watch out for:

- Cache flush as a nop/unimplemented. This is legal in all but the
most recent ATA specification. The spec has been tightened so that
problem will go in time

- Some IBM laptop drives appeared to fail to write back the cache on
machine shutdown/suspend etc. The exact rights/wrongs/details on
that one haven't been pinned down because the folks concerned
swapped a couple of drives for different ones, saw the problem
vanish and being a large organisation had the supplier replace the
other fifty odd.

Alan

2001-10-04 21:51:05

by Alessandro Suardi

Subject: Re: [POT] Which journalised filesystem ?

Alan Cox wrote:
>
> > > Alan mentioned this was something to do with the IBM hard disk
> > > having strange write-cache properties that confuse ext3.
> >
> > Which IBM harddrive(s) does this? How can one check if it does?
>
> Its not specifically IBM, there are two sets of things to watch out for
>
> - Cache flush as a nop/unimplemented. This is legal in all but the
> most recent ATA specification. The spec has been tightened so that
> problem will go in time
>
> - Some IBM laptop drives appeared to fail to write back the cache on
> machine shutdown/suspend etc. The exact rights/wrongs/details on
> that one haven't been pinned down because the folks concerned
> swapped a couple of drives for different ones, saw the problem
> vanish and being a large organisation had the supplier replace the
> other fifty odd.

[asuardi@dolphin asuardi]$ dmesg | grep hda
ide0: BM-DMA at 0x0860-0x0867, BIOS settings: hda:DMA, hdb:pio
hda: IBM-DJSA-220, ATA DISK drive
hda: 39070080 sectors (20004 MB) w/1874KiB Cache, CHS=2432/255/63, UDMA(33)
hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >

This one has been used in the last 4 months without any issue
doing lots of shutdowns, suspends, kernel rebuilds etc. ;)

--alessandro

"this is no time to get cute, it's a mad dog's promenade
so walk tall, or baby don't walk at all"
(Bruce Springsteen, 'New York City Serenade')

2001-10-04 22:10:15

by Alan

Subject: Re: [POT] Which journalised filesystem ?

> I've been using ext3 on my ThinkPad (A20P) for about a month now with
> nary the slightest problem. I've even smoke tested it by shutting it
> down in the middle of disk writes and it worked fine.

I have no recorded case of an ext3 crash that someone showed was even
likely to have been disk caching stuff.

2001-10-04 22:14:37

by Dave Jones

Subject: Re: [POT] Which journalised filesystem ?

On Thu, 4 Oct 2001, Alan Cox wrote:

> I have no recorded case of an ext3 crash that someone showed was even
> likely to have been disk caching stuff.

So the case I mentioned to you about 2 months ago was some 'quirk'
of the drive rather than its write cache ? (Yup, 20gb IBM).
I'm sure you mentioned write cache in relation to that, but I could
be wrong.

regards,

Dave.

--
| Dave Jones. http://www.suse.de/~davej
| SuSE Labs

2001-10-04 22:18:57

by Alan

Subject: Re: [POT] Which journalised filesystem ?

> So the case I mentioned to you about 2 months ago was some 'quirk'
> of the drive rather than its write cache ? (Yup, 20gb IBM).
> I'm sure you mentioned write cache in relation to that, but I could
> be wrong.

Write cache yes - not apparently writing it out always on suspend/power off

2001-10-04 22:50:11

by Bernd Eckenfels

Subject: Re: [POT] Which journalised filesystem ?

In article <Pine.LNX.4.10.10110031648250.20425-100000@coffee.psychology.mcmaster.ca> you wrote:
> for a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
> I just measured: 7 MBps with -W0, vs 27 MB/s with -W1.

How much data did you write to get those numbers? The drive cache is
most often so small that it can only cache a few blocks.

Greetings
Bernd

2001-10-04 23:28:18

by Linus Torvalds

Subject: Re: [POT] Which journalised filesystem ?

In article <[email protected]>,
Bernd Eckenfels <[email protected]> wrote:
>In article <Pine.LNX.4.10.10110031648250.20425-100000@coffee.psychology.mcmaster.ca> you wrote:
>> for a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
>> I just measured: 7 MBps with -W0, vs 27 MB/s with -W1.
>
>how much data have you written to get those numbers? The drive cache is
>most often so small it can only cache a few blocks.

Actually, that's not the main win of writeback caching.

The main win is being able to write a whole track in one go, starting at
the _right_ position (where "right" is defined as "where the head
happens to be when it can start writing"). Along with making up for the
occasional seek for meta-data, and other "smooth out the writes so that
the platter keeps getting written to all the time" things.

Which can be a HUGE win, and which is why I personally think that any
disk that doesn't do write-back caching is a waste of good money.
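
As a rough illustration of the point above, here is a toy calculation.
It is a sketch only: the track size and request size are made-up
assumptions, and the "every request costs one revolution" rule for the
uncached case is a deliberately pessimistic simplification, not a
description of any particular drive.

```python
# Toy model only: every constant below is an assumption, not a measurement.
RPM = 5400                    # spindle speed of the disk quoted in the thread
TRACK_BYTES = 256 * 1024      # hypothetical bytes per track
REQUEST_BYTES = 64 * 1024     # hypothetical size of one write request


def rotation_seconds(rpm):
    """Time for one full platter revolution."""
    return 60.0 / rpm


def write_through_rate(rpm, request_bytes):
    """Bytes/s if each request must hit the platter before the next is issued.

    Pessimistic assumption: by the time the next request arrives, its
    sector has just passed the head, so every request costs one revolution.
    """
    return request_bytes / rotation_seconds(rpm)


def write_back_rate(rpm, track_bytes):
    """Bytes/s if the drive buffers requests and writes a full track per revolution."""
    return track_bytes / rotation_seconds(rpm)


if __name__ == "__main__":
    wt = write_through_rate(RPM, REQUEST_BYTES)
    wb = write_back_rate(RPM, TRACK_BYTES)
    print(f"write-through: {wt / 1e6:.1f} MB/s")
    print(f"write-back:    {wb / 1e6:.1f} MB/s")
    print(f"ratio:         {wb / wt:.1f}x")
```

In this model the ratio is simply TRACK_BYTES / REQUEST_BYTES, so the
numbers say nothing about a real disk; they only show why keeping the
platter busy for the whole revolution matters.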

We (as in Linux) should make sure that we explicitly tell the disk when
we need it to flush its disk buffers. We don't do that right, and
because of _our_ problems some people claim that writeback caching is
evil and bad.

Linus

2001-10-04 23:55:12

by Rik van Riel

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

On Thu, 4 Oct 2001, Linus Torvalds wrote:

> We (as in Linux) should make sure that we explicitly tell the disk when
> we need it to flush its disk buffers. We don't do that right, and
> because of _our_ problems some people claim that writeback caching is
> evil and bad.

Does this even work right for IDE ?

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ (volunteers needed)

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-05 01:04:59

by Mike Fedyk

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

On Thu, Oct 04, 2001 at 11:27:45PM +0000, Linus Torvalds wrote:
> We (as in Linux) should make sure that we explicitly tell the disk when
> we need it to flush its disk buffers. We don't do that right, and
> because of _our_ problems some people claim that writeback caching is
> evil and bad.
>

Actually, their claim is that most drives won't even *honor* the request to
sync to oxide.

Once the number of drives that support this goes up, then write cache is
safe to use...

Personally, I have a script that enables write cache, and sets the drive to
its highest dma level on boot...

Mike

2001-10-05 10:28:30

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

Hi,

On Thu, Oct 04, 2001 at 10:09:38PM +0100, Alan Cox wrote:
> > Which laptop? I've seen several reports of disk corruption with
> > recent kernels on certain laptops.
>
> 20Gbyte IBM 2.5" ones I suspect ? If so then we aren't the only OS

Yes, it was.

--Stephen

2001-10-05 14:52:46

by Alan

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

> > We (as in Linux) should make sure that we explicitly tell the disk when
> > we need it to flush its disk buffers. We don't do that right, and
> > because of _our_ problems some people claim that writeback caching is
> > evil and bad.
>
> Does this even work right for IDE ?

Current IDE drives it may be a NOP. Worse than that it would totally ruin
high end raid performance. We need to pass write barriers. A good i2o card
might have 256Mb of writeback cache that we want to avoid flushing - because
it is battery backed and can be ordered.

By all means have drivers fall back to cache writeback, but don't assume
that is the basic operation.

Indeed a smarter raid card can generally do

"read"
"read with readahead"
"read with readahead and some readahead on card only"
"read but dont cache"

"write to cache"
"write through cache"
"write uncached"

Alan

2001-10-05 15:35:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

Alan Cox <[email protected]> writes:

> > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > we need it to flush its disk buffers. We don't do that right, and
> > > because of _our_ problems some people claim that writeback caching is
> > > evil and bad.
> >
> > Does this even work right for IDE ?
>
> Current IDE drives it may be a NOP. Worse than that it would totally ruin
> high end raid performance. We need to pass write barriers. A good i2o card
> might have 256Mb of writeback cache that we want to avoid flushing - because
> it is battery backed and can be ordered.

If the cache is small and is primarily a track cache (IDE) one trick that
we can do is to flood the cache with data so everything is forced out.

We can do this at mkfs time, (so even destructive tests are allowed)
and we can probe how to make this work for a particular drive. And
then the kernel can just use the results of that probe.

> By all means have drivers fall back to cache writeback, but don't assume
> that is the basic operation.

Definitely. We want a write-barrier however we can get it.

> Indeed a smarter raid card can generally do

Eric

2001-10-05 20:25:18

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

In article <[email protected]> you wrote:
> Definitely. We want a write-barrier however we can get it.

Does that mean we can or we can't? Is there a flush write cache operation in
ATA? I assume there is one in SCSI?

Greetings
Bernd

2001-10-05 23:41:40

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

In article <[email protected]>,
Bernd Eckenfels <[email protected]> wrote:
>In article <[email protected]> you wrote:
>> Definitely. We want a write-barrier however we can get it.
>
>Does that mean we can or we can't? Is there a flush write cache operation in
>ATA? I assume there is one in SCSI?

Well hdparm has a -W option with which you can turn on/off the
write cache. If that works (and it appears it does) you should be
able to turn write cache off, write *one* block so that the
cache gets flushed and turn it back on. I'm not sure how to
test this, though.

Mike.
--
Move sig.

2001-10-06 08:32:24

by Tonu Samuel

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

On Sat, 2001-10-06 at 01:41, Miquel van Smoorenburg wrote:
> >Does that mean we can or we can't? Is there a flush write cache operation in
> >ATA? I assume there is one in SCSI?
>
> Well hdparm has a -W option with which you can turn on/off the
> write cache. If that works (and it appears it does) you should be
> able to turn write cache off, write *one* block so that the
> cache gets flushed and turn it back on. I'm not sure how to
> test this, though.

Doesn't hdparm -W0f do the work?

Tõnu

2001-10-06 09:15:53

by Miquel van Smoorenburg

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

In article <[email protected]>,
Tonu Samuel <[email protected]> wrote:
>On Sat, 2001-10-06 at 01:41, Miquel van Smoorenburg wrote:
>> >Does that mean we can or we can't? Is there a flush write cache operation in
>> >ATA? I assume there is one in SCSI?
>>
>> Well hdparm has a -W option with which you can turn on/off the
>> write cache. If that works (and it appears it does) you should be
>> able to turn write cache off, write *one* block so that the
>> cache gets flushed and turn it back on. I'm not sure how to
>> test this, though.
>
>Doesn't hdparm -W0f do the work?

No, -f flushes the kernel's buffer cache, not the IDE disk write cache.

Mike.
--
Move sig.

2001-10-06 16:42:17

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

In article <[email protected]> you wrote:
>> Well hdparm has a -W option with which you can turn on/off the
>> write cache. If that works (and it appears it does) you should be
>> able to turn write cache off, write *one* block so that the
>> cache gets flushed and turn it back on. I'm not sure how to
>> test this, though.

> Doesn't hdparm -W0f do the work?

We are talking about a write barrier. This means you write everything that
can be written unordered (all the data), then you issue the barrier, and
once that has finished you write the commit block. That way you get
increased write performance and still have transaction-safe persistence.
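
The ordering described above can be sketched from user space roughly as
follows. This is only an illustration under assumptions: the function
name commit_transaction and the 512-byte block layout are hypothetical,
and fsync() stands in for the barrier. fsync() is stronger than a true
barrier (it waits as well as orders), and whether it really reaches the
platter depends on the kernel and the drive, which is exactly what this
thread is about.

```python
import os
import tempfile


def commit_transaction(fd, data_blocks, commit_block):
    """Write the data blocks, wait for them, then write the commit block."""
    # Phase 1: data blocks. Their relative order on disk does not matter.
    for block in data_blocks:
        os.write(fd, block)
    # Barrier stand-in: nothing written below may be reordered ahead of
    # what was written above.
    os.fsync(fd)
    # Phase 2: the commit block is what makes the transaction valid.
    os.write(fd, commit_block)
    os.fsync(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "journal.bin")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            commit_transaction(
                fd,
                [b"A" * 512, b"B" * 512],
                b"COMMIT".ljust(512, b"\0"),
            )
        finally:
            os.close(fd)
        print(os.path.getsize(path), "bytes written")
```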

Gruss
Bernd

2001-10-06 20:18:34

by Pavel Machek

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

Hi!

> > > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > > we need it to flush its disk buffers. We don't do that right, and
> > > > because of _our_ problems some people claim that writeback caching is
> > > > evil and bad.
> > >
> > > Does this even work right for IDE ?
> >
> > Current IDE drives it may be a NOP. Worse than that it would totally ruin
> > high end raid performance. We need to pass write barriers. A good i2o card
> > might have 256Mb of writeback cache that we want to avoid flushing - because
> > it is battery backed and can be ordered.
>
> If the cache is small and is primarily a track cache (IDE) one trick that
> we can do is to flood the cache with data so everything is forced out.
>
> We can do this at mkfs time, (so even destructive tests are allowed)
> and we can probe how to make this work for a particular drive. And
> then the kernel can just use the results of that probe.

How do you probe this without actually powering system down?

2001-10-07 01:01:04

by ebiederman

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

Pavel Machek <[email protected]> writes:

> Hi!
>
> > > > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > > > we need it to flush its disk buffers. We don't do that right, and
> > > > > because of _our_ problems some people claim that writeback caching is
> > > > > evil and bad.
> > > >
> > > > Does this even work right for IDE ?
> > >
> > > Current IDE drives it may be a NOP. Worse than that it would totally ruin
> > > high end raid performance. We need to pass write barriers. A good i2o card
> > > > might have 256Mb of writeback cache that we want to avoid flushing - because
> > > > it is battery backed and can be ordered.
> >
> > If the cache is small and is primarily a track cache (IDE) one trick that
> > we can do is to flood the cache with data so everything is forced out.
> >
> > We can do this at mkfs time, (so even destructive tests are allowed)
> > and we can probe how to make this work for a particular drive. And
> > then the kernel can just use the results of that probe.
>
> How do you probe this without actually powering system down?

You can't be 100% certain, but you can do timings, and usually you can
infer what is happening in the caches from them. For example, if you
take timings with the cache enabled and disabled and the speed is the
same, you can be fairly confident that the cache doesn't actually disable.
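
A minimal sketch of such a timing probe, assuming the cache setting is
toggled by hand between runs (for example with hdparm -W0 and -W1 as
mentioned earlier in the thread). The function names, the block count
and the 10% tolerance are illustrative choices, not anything taken from
an existing tool.

```python
import os
import tempfile
import time


def timed_sync_writes(path, block=b"\0" * 4096, count=64):
    """Return the seconds taken to write `count` blocks, syncing after each."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def toggle_looks_ignored(secs_cache_on, secs_cache_off, tolerance=0.10):
    """True if both timings agree to within `tolerance` of the slower run.

    Equal timings suggest the drive did not really change its caching.
    The tolerance is an arbitrary illustrative threshold.
    """
    slower = max(secs_cache_on, secs_cache_off)
    faster = min(secs_cache_on, secs_cache_off)
    return (slower - faster) <= tolerance * slower


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = timed_sync_writes(os.path.join(tmp, "probe.bin"))
        print(f"{elapsed:.4f} s for 64 synced 4 KiB writes")
```

One would run timed_sync_writes() once per cache setting on the disk
under test and feed both results to toggle_looks_ignored(). As said
above, this gives an inference, not a proof.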

Having a final verification step where you ask the user to pull the plug
could add some extra confidence. But even then weird cases of buggy
firmware could defeat you.

Eric

2001-10-10 17:31:04

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [POT] Which journalised filesystem ?

Hi,

On Fri, Oct 05, 2001 at 03:57:49PM +0100, Alan Cox wrote:
> > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > we need it to flush its disk buffers. We don't do that right, and
> > > because of _our_ problems some people claim that writeback caching is
> > > evil and bad.
> >
> > Does this even work right for IDE ?
>
> Current IDE drives it may be a NOP. Worse than that it would totally ruin
> high end raid performance. We need to pass write barriers. A good i2o card
> might have 256Mb of writeback cache that we want to avoid flushing - because
> it is battery backed and can be ordered.

The important thing is to flush to non-volatile storage: non-volatile
cache still qualifies. The one thing we need to avoid is the data
lingering in volatile cache, and that's a different thing.

Sure, journaling filesystems can benefit from a write barrier, but at
some point that's not sufficient --- we really need to know, at a high
level, whether the data is permanently secured. When your MTA
finishes its fsync(), it assumes that the mail spool file has been
securely stored and it can tell the sender to go ahead and delete the
upstream copy.

A barrier is not sufficient there. It's a useful primitive to have,
but not a substitute for a flush to permanent storage.
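
To make the MTA case concrete, here is a minimal sketch of the
"store, fsync(), then acknowledge" pattern. The function name
store_then_acknowledge, the spool layout and the returned reply string
are hypothetical. The directory fsync() is an addition of this sketch
(so that the new file name is durable too) and assumes a Linux/POSIX
system. None of this helps if the drive keeps the data in volatile
cache after fsync() returns.

```python
import os
import tempfile


def store_then_acknowledge(spool_dir, name, message):
    """Write `message` to the spool; return a reply only after fsync()."""
    path = os.path.join(spool_dir, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(message)
        while view:                      # handle short writes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # file contents must be stable first
    finally:
        os.close(fd)
    # Sketch-only addition: sync the directory so the new name survives too.
    dir_fd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    # Only now may the sender be told to delete its upstream copy.
    return "250 OK"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        print(store_then_acknowledge(spool, "msg-0001", b"Subject: test\r\n\r\nhello\r\n"))
```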

--Stephen