2003-02-17 00:35:00

by Martin J. Bligh

Subject: Performance of ext3 on large systems

OK, so I guess we all know that ext3 doesn't scale well. But by
accident, I have some numbers on exactly how bad it really is:

Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
                      Elapsed    User  System      CPU
2.5.61-mjb0.1-ext3    48.47  564.13  143.16  1458.67
2.5.61-mjb0.1-ext2    46.06  563.04  115.36  1472.33

(look at system time ... eeek!)
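As a quick check on the table, here is a small Python sketch. It assumes Elapsed, User and System are in seconds and that CPU is utilisation in percent, i.e. (User + System) / Elapsed * 100. Recomputed that way, the CPU column comes out at about 1459% and 1473%, close to the reported figures (which are presumably rounded or averaged over runs). The ext3 run spends roughly 24% more system time for about 5% more elapsed time.

```python
# Kernbench-2 rows from the table above; times assumed to be in seconds.
runs = {
    "ext3": {"elapsed": 48.47, "user": 564.13, "system": 143.16},
    "ext2": {"elapsed": 46.06, "user": 563.04, "system": 115.36},
}


def cpu_percent(run):
    """CPU utilisation: total CPU time over wall-clock time, in percent."""
    return 100.0 * (run["user"] + run["system"]) / run["elapsed"]


def pct_change(base, other):
    """Relative change from base to other, in percent."""
    return 100.0 * (other - base) / base


for name, run in runs.items():
    print(f"{name}: CPU {cpu_percent(run):.0f}%")
print(f"system time: {pct_change(runs['ext2']['system'], runs['ext3']['system']):+.1f}%")
print(f"elapsed:     {pct_change(runs['ext2']['elapsed'], runs['ext3']['elapsed']):+.1f}%")
```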

diffprofile (+ is worse with ext3, - better)

12702 .text.lock.inode
7786 default_idle
1706 ext3_dirty_inode
1694 start_this_handle
1636 ext3_do_update_inode
1304 .text.lock.dir
983 journal_add_journal_head
903 __find_get_block_slow
797 .text.lock.sem
630 __mark_inode_dirty
567 __brelse
537 __find_get_block
523 __wake_up
459 ext3_get_inode_loc
454 find_get_page
434 __blk_queue_bounce
382 generic_fillattr
360 fd_install
357 do_get_write_access
308 d_lookup
290 vfs_read
289 do_anonymous_page
272 dput
267 file_ra_state_init
249 journal_get_write_access
243 page_remove_rmap
222 link_path_walk
220 may_open
195 vm_enough_memory
189 ext3_readdir
186 journal_stop
185 journal_dirty_metadata
152 __fput
148 update_atime
110 zap_pte_range
105 .text.lock.sched
100 fput
96 buffered_rmqueue
95 filemap_nopage
93 ext3_prepare_write
90 page_add_rmap
86 find_next_usable_block
76 block_write_full_page
71 journal_cancel_revoke
70 bh_lru_install
65 journal_unlock_journal_head
64 .text.lock.namei
63 do_page_cache_readahead
61 log_space_left
61 ext3_check_dir_entry
59 __copy_from_user_ll
58 kfree
58 journal_commit_transaction
54 kmem_cache_free
54 do_sync_read
54 .text.lock.char_dev
50 ext3_get_block_handle
...
-52 find_vma
-58 page_address
-58 get_empty_filp
-60 do_generic_mapping_read
-74 ext2_readdir
-85 generic_file_open
-87 atomic_dec_and_lock
-109 file_move
-513 dentry_open
-1468 follow_mount
-2091 .text.lock.file_table
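The listing reads as a per-symbol difference in profiler tick counts between the two runs. It is sorted from largest increase to largest decrease, with small deltas elided ("..."). The sketch below illustrates that reading, assuming diffprofile subtracts the ext2 readprofile counts from the ext3 ones. The real script may differ in details such as filtering. The helper name, symbol names and tick counts below are hypothetical.

```python
def diffprofile(before, after, min_delta=1):
    """Return (delta, symbol) pairs sorted from largest increase to largest decrease.

    before/after map symbol name -> profile tick count. A positive delta
    means the symbol got more ticks in `after` (here: worse with ext3).
    Deltas smaller than min_delta in magnitude are dropped.
    """
    deltas = []
    for sym in set(before) | set(after):
        delta = after.get(sym, 0) - before.get(sym, 0)
        if abs(delta) >= min_delta:
            deltas.append((delta, sym))
    # Largest delta first; ties broken by name so the output is stable.
    deltas.sort(key=lambda d: (-d[0], d[1]))
    return deltas


# Hypothetical tick counts, for illustration only.
ext2 = {"sym_a": 2000, "sym_b": 500, "sym_c": 300}
ext3 = {"sym_a": 600, "sym_b": 900, "sym_c": 310, "sym_d": 200}

for delta, sym in diffprofile(ext2, ext3, min_delta=50):
    print(f"{delta:6d} {sym}")
```

This prints sym_b (+400), sym_d (+200) and sym_a (-1400) in that order and hides sym_c (+10). min_delta=50 is chosen only because the smallest deltas shown above are around +/-50.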


2003-02-17 01:09:48

by Dave Hansen

Subject: Re: Performance of ext3 on large systems

Martin J. Bligh wrote:
> OK, so I guess we all know that ext3 doesn't scale well. But by
> accident, I have some numbers on exactly how bad it really is:
>
> Kernbench-2: (make -j N vmlinux, where N = 2 x num_cpus)
>                     Elapsed    User  System      CPU
> 2.5.61-mjb0.1-ext3    48.47  564.13  143.16  1458.67
> 2.5.61-mjb0.1-ext2    46.06  563.04  115.36  1472.33
>
> (look at system time ... eeek!)
>
> diffprofile (+ is worse with ext3, - better)
>
> 12702 .text.lock.inode

# grep -c lock_kernel fs/ext3/inode.c
35
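`grep -c` counts matching lines, not occurrences. The pattern `lock_kernel` also matches `unlock_kernel`, so the 35 presumably covers both lock and unlock calls, plus any comment that names them. The sketch below shows those semantics in Python on a hypothetical three-line snippet, not the real fs/ext3/inode.c. `grep_count` is an illustrative helper. The word-bounded pattern corresponds to GNU grep's `-w` option, as in `grep -wc lock_kernel`.

```python
import re


def grep_count(pattern, text):
    """Count lines matching the regex `pattern`, as `grep -c` does:
    a line with several matches still counts once."""
    rx = re.compile(pattern)
    return sum(1 for line in text.splitlines() if rx.search(line))


# Hypothetical snippet, for illustration only.
sample = "\tlock_kernel();\n\t/* ... critical section ... */\n\tunlock_kernel();\n"

print(grep_count("lock_kernel", sample))        # also matches unlock_kernel
print(grep_count(r"\block_kernel\b", sample))   # whole word only, like grep -w
```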



--
Dave Hansen

2003-02-17 01:18:58

by Andrew Morton

Subject: Re: Performance of ext3 on large systems

"Martin J. Bligh" <[email protected]> wrote:
>
> (look at system time ... eeek!)

Can we just say that ext3's talents lie elsewhere?

I've got some stuff which helps a bit, but nobody has had the time
to implement the significant overhaul which is needed here.

noatime would help.

2003-02-17 15:10:31

by Sean Neakums

Subject: Re: Performance of ext3 on large systems

commence Andrew Morton quotation:

> "Martin J. Bligh" <[email protected]> wrote:
>>
>> (look at system time ... eeek!)
>
> Can we just say that ext3's talents lie elsewhere?
>
> I've got some stuff which helps a bit, but nobody has had the time
> to implement the significant overhaul which is needed here.
>
> noatime would help.

ext3 doesn't implement noatime!? Hurg...

--
/ |
[|] Sean Neakums | Size *does* matter.
[|] <[email protected]> | That's why I use Emacs.
\ |

2003-02-17 15:18:18

by John Bradford

Subject: Re: Performance of ext3 on large systems

> > Can we just say that ext3's talents lie elsewhere?
> >
> > I've got some stuff which helps a bit, but nobody has had the time
> > to implement the significant overhaul which is needed here.
> >
> > noatime would help.
>
> ext3 doesn't implement noatime!? Hurg...

Actually, it makes sense in a way - noatime only speeds up reads, not
writes (access time is always updated on a write), whereas a
journaled filesystem is presumably intended to be tuned for write
performance. So, for its intended usage, not implementing noatime
shouldn't be a huge problem, although it would be useful.

John.

2003-02-17 15:46:04

by Robert Love

Subject: Re: Performance of ext3 on large systems

On Mon, 2003-02-17 at 10:29, John Bradford wrote:

> > ext3 doesn't implement noatime!? Hurg...

noatime is implemented.

> Actually, it makes sense in a way - noatime only speeds up reads, not
> writes (access time is always updated on a write), whereas a
> journaled filesystem is presumably intended to be tuned for write
> performance. So, for its intended usage, not implementing noatime
> shouldn't be a huge problem, although it would be useful.

But updating the access time _is_ a write, even if it's due to a read.
And using 'noatime' does help, and it is implemented. I guess Andrew's
statement was just misinterpreted, because this is what he said.
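Robert's point can be made concrete with a toy model. A read writes nothing to disk by itself, but refreshing atime dirties the inode. On ext3, a dirtied inode roughly becomes a journaled metadata update, and ext3_dirty_inode and start_this_handle both appear in the profile earlier in the thread. As a simplification, the model assumes atime is only rewritten when its stored one-second value changes. `Inode`, `read_file` and `run` are hypothetical names, not kernel functions.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    atime: int = 0    # last access time, in whole seconds
    dirtied: int = 0  # how many times the inode was marked dirty


def read_file(inode: Inode, now: int, noatime: bool) -> None:
    """Atime side of a read: the data read writes nothing, but refreshing
    atime dirties the inode, which is a metadata write."""
    if noatime:
        return  # noatime: reads never touch the inode
    if inode.atime != now:  # assumed: only rewrite when the value changes
        inode.atime = now
        inode.dirtied += 1  # on ext3 this would become a journaled update


def run(noatime: bool, read_times: list[int]) -> int:
    """Replay reads at the given times on a fresh inode; return the dirty count."""
    inode = Inode()
    for t in read_times:
        read_file(inode, t, noatime)
    return inode.dirtied


reads = [1, 1, 2, 3, 3, 3, 4]  # hypothetical times (seconds) of seven reads
print(run(noatime=False, read_times=reads))  # 4
print(run(noatime=True, read_times=reads))   # 0
```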

Robert Love

2003-02-17 15:56:58

by Sean Neakums

Subject: Re: Performance of ext3 on large systems

commence Robert Love quotation:

> On Mon, 2003-02-17 at 10:29, John Bradford wrote:
>
>> > ext3 doesn't implement noatime!? Hurg...
>
> noatime is implemented.
>
>> Actually, it makes sense in a way - noatime only speeds up reads, not
>> writes (access time is always updated on a write), whereas a
>> journaled filesystem is presumably intended to be tuned for write
>> performance. So, for its intended usage, not implementing noatime
>> shouldn't be a huge problem, although it would be useful.
>
> But updating the access time _is_ a write, even if it's due to a read.
> And using 'noatime' does help, and it is implemented. I guess Andrew's
> statement was just misinterpreted, because this is what he said.

Ah, yes. My bad.


2003-02-17 16:11:13

by John Bradford

Subject: Re: Performance of ext3 on large systems

> > Actually, it makes sense in a way - noatime only speeds up reads, not
> > writes (access time is always updated on a write), whereas a
> > journaled filesystem is presumably intended to be tuned for write
> > performance. So, for its intended usage, not implementing noatime
> > shouldn't be a huge problem, although it would be useful.
>
> But updating the access time _is_ a write, even if it's due to a read.
> And using 'noatime' does help, and it is implemented. I guess Andrew's
> statement was just misinterpreted, because this is what he said.

Well, yes, but that's not what I was saying - what I was saying is that
if you are primarily reading anyway, there isn't much to be gained
from using EXT-3 over EXT-2.

If you are primarily writing, EXT-3 atime should be faster than EXT-2
noatime. EXT-3 noatime will obviously be even faster.

John.

2003-02-17 16:38:01

by Matti Aarnio

Subject: Re: Performance of ext3 on large systems

On Mon, Feb 17, 2003 at 04:22:22PM +0000, John Bradford wrote:
...
> Well, yes, but that's not what I was saying - what I was saying is that
> if you are primarily reading anyway, there isn't much to be gained
> from using EXT-3 over EXT-2.

Except for data robustness.

> If you are primarily writing, EXT-3 atime should be faster than EXT-2
> noatime. EXT-3 noatime will obviously be even faster.

No. For write-mostly workloads the 'noatime' effect disappears into the
background noise. Every time you write to a file, mtime is updated, and so
is ctime. The only inode timestamp _not_ updated is atime...
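Matti's argument is about proportions. noatime only removes the read-side inode updates, so its benefit follows the share of reads in the workload. The sketch below assumes every operation lands in a different second and that one write dirties the inode once for mtime and ctime together. `inode_updates` is a hypothetical helper, and the 90/10 workloads are made up for illustration.

```python
def inode_updates(ops: str, noatime: bool) -> int:
    """Count inode-dirtying timestamp updates for a string of ops ('r' or 'w').

    Simplification: every op lands in a different second, so each
    timestamp it touches really changes.
    """
    updates = 0
    for op in ops:
        if op == "w":
            updates += 1  # write: mtime and ctime change, noatime or not
        elif op == "r" and not noatime:
            updates += 1  # read: only atime changes, and only without noatime
    return updates


for label, ops in (("write-heavy", "w" * 90 + "r" * 10),
                   ("read-heavy", "r" * 90 + "w" * 10)):
    with_atime = inode_updates(ops, noatime=False)
    without = inode_updates(ops, noatime=True)
    saved = 100 * (with_atime - without) / with_atime
    print(f"{label}: {with_atime} -> {without} updates ({saved:.0f}% fewer)")
```

Under these assumptions, noatime saves 10% of inode updates in the write-heavy mix and 90% in the read-heavy one.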

> John.

/Matti Aarnio

2003-02-17 16:54:09

by John Bradford

Subject: Re: Performance of ext3 on large systems

> > Well, yes, but that's not what I was saying - what I was saying is that
> > if you are primarily reading anyway, there isn't much to be gained
> > from using EXT-3 over EXT-2.
>
> Except for data robustness.

Well yes, but that only matters if the filesystem isn't unmounted
cleanly.

> > If you are primarily writing, EXT-3 atime should be faster than EXT-2
> > noatime. EXT-3 noatime will obviously be even faster.
>
> No. For write-mostly workloads the 'noatime' effect disappears into the
> background noise. Every time you write to a file, mtime is updated, and so
> is ctime. The only inode timestamp _not_ updated is atime...

Well, that's what I was implying, that for primarily writing, EXT-3
should be better than EXT-2, regardless of the atime configuration.

So, we agree :-).

John.