2007-01-07 09:16:34

by Andrew Morton

Subject: Re: How git affects kernel.org performance

On Sun, 7 Jan 2007 09:55:26 +0100
Willy Tarreau wrote:

> On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> >
> >
> > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > >
> > > During extremely high load, it appears that what slows kernel.org down more
> > > than anything else is the time that each individual getdents() call takes.
> > > When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > packed tree, you can do the math yourself.
> >
> > "getdents()" is totally serialized by the inode semaphore. It's one of the
> > most expensive system calls in Linux, partly because of that, and partly
> > because it has to call all the way down into the filesystem in a way that
> > almost no other common system call has to (99% of all filesystem calls can
> > be handled basically at the VFS layer with generic caches - but not
> > getdents()).
> >
> > So if there are concurrent readdirs on the same directory, they get
> > serialized. If there is any file creation/deletion activity in the
> > directory, it serializes getdents().
> >
> > To make matters worse, I don't think it has any read-ahead at all when you
> > use hashed directory entries. So if you have cold-cache case, you'll read
> > every single block totally individually, and serialized. One block at a
> > time (I think the non-hashed case is likely also suspect, but that's a
> > separate issue)
> >
> > In other words, I'm not at all surprised it hits on filldir time.
> > Especially on ext3.
>
> At work, we had the same problem on a file server with ext3. We use rsync
> to make backups to a local IDE disk, and we noticed that getdents() took
> about the same time as Peter reports (0.2 to 2 seconds), especially in
> maildir directories. We tried many things to fix it with no result,
> including enabling dirindexes. Finally, we made a full backup, and switched
> over to XFS and the problem totally disappeared. So it seems that the
> filesystem matters a lot here when there are lots of entries in a
> directory, and that ext3 is not suitable for usages with thousands
> of entries in directories with millions of files on disk. I'm not
> certain it would be that easy to try other filesystems on kernel.org
> though :-/
>

Yeah, slowly-growing directories will get splattered all over the disk.

Possible short-term fixes would be to just allocate up to (say) eight
blocks when we grow a directory by one block. Or teach the
directory-growth code to use ext3 reservations.

Longer-term people are talking about things like on-disk reservations.
But I expect directories are being forgotten about in all of that.
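
The chunked-growth idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the function names `dir_blocks_after_grow` and `alloc_requests` are hypothetical, nothing here is ext3 code, and the chunk size of eight is just the number suggested above. It shows that rounding each growth step up to a chunk boundary reduces the number of separate allocation requests, which is what would give the allocator a chance to keep a directory's blocks together.

```c
#include <assert.h>

/*
 * Hypothetical toy model, not ext3 code: the number of blocks allocated
 * to a directory after it grows by one block, when every allocation is
 * rounded up to a multiple of `chunk` blocks.
 */
unsigned long dir_blocks_after_grow(unsigned long cur_blocks,
                                    unsigned long chunk)
{
    unsigned long needed;

    assert(chunk > 0);
    needed = cur_blocks + 1;
    return ((needed + chunk - 1) / chunk) * chunk;
}

/*
 * Number of separate allocation requests issued while the directory
 * grows from empty until at least `total` blocks are allocated.
 */
unsigned long alloc_requests(unsigned long total, unsigned long chunk)
{
    unsigned long blocks = 0;
    unsigned long requests = 0;

    while (blocks < total) {
        blocks = dir_blocks_after_grow(blocks, chunk);
        requests++;
    }
    return requests;
}
```

For a directory that eventually needs 20 blocks, `alloc_requests(20, 1)` makes 20 requests while `alloc_requests(20, 8)` makes 3, at the cost of up to 7 blocks held in reserve.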


2007-01-07 09:38:20

by Rene Herman

Subject: Re: How git affects kernel.org performance

On 01/07/2007 10:15 AM, Andrew Morton wrote:

> Yeah, slowly-growing directories will get splattered all over the
> disk.
>
> Possible short-term fixes would be to just allocate up to (say) eight
> blocks when we grow a directory by one block. Or teach the
> directory-growth code to use ext3 reservations.
>
> Longer-term people are talking about things like on-disk
> reservations. But I expect directories are being forgotten about in
> all of that.

I wish people would just talk about de2fsrag... ;-\

Rene

2007-01-08 03:02:47

by Suparna Bhattacharya

Subject: Re: How git affects kernel.org performance

On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:
> On Sun, 7 Jan 2007 09:55:26 +0100
> Willy Tarreau wrote:
>
> > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > > >
> > > > During extremely high load, it appears that what slows kernel.org down more
> > > > than anything else is the time that each individual getdents() call takes.
> > > > > When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > > packed tree, you can do the math yourself.
> > >
> > > "getdents()" is totally serialized by the inode semaphore. It's one of the
> > > most expensive system calls in Linux, partly because of that, and partly
> > > because it has to call all the way down into the filesystem in a way that
> > > almost no other common system call has to (99% of all filesystem calls can
> > > be handled basically at the VFS layer with generic caches - but not
> > > getdents()).
> > >
> > > So if there are concurrent readdirs on the same directory, they get
> > > serialized. If there is any file creation/deletion activity in the
> > > directory, it serializes getdents().
> > >
> > > To make matters worse, I don't think it has any read-ahead at all when you
> > > use hashed directory entries. So if you have cold-cache case, you'll read
> > > every single block totally individually, and serialized. One block at a
> > > time (I think the non-hashed case is likely also suspect, but that's a
> > > separate issue)
> > >
> > > In other words, I'm not at all surprised it hits on filldir time.
> > > Especially on ext3.
> >
> > At work, we had the same problem on a file server with ext3. We use rsync
> > to make backups to a local IDE disk, and we noticed that getdents() took
> > about the same time as Peter reports (0.2 to 2 seconds), especially in
> > maildir directories. We tried many things to fix it with no result,
> > including enabling dirindexes. Finally, we made a full backup, and switched
> > over to XFS and the problem totally disappeared. So it seems that the
> > filesystem matters a lot here when there are lots of entries in a
> > directory, and that ext3 is not suitable for usages with thousands
> > of entries in directories with millions of files on disk. I'm not
> > certain it would be that easy to try other filesystems on kernel.org
> > though :-/
> >
>
> Yeah, slowly-growing directories will get splattered all over the disk.
>
> Possible short-term fixes would be to just allocate up to (say) eight
> blocks when we grow a directory by one block. Or teach the
> directory-growth code to use ext3 reservations.
>
> Longer-term people are talking about things like on-disk reservations.
> But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation (that
is, explicit preallocation of blocks to a given file)? If so, you are
right, we haven't really given any thought to the possibility of directories
needing that feature.

Regards
Suparna

--
Suparna Bhattacharya
Linux Technology Center
IBM Software Lab, India

2007-01-08 13:15:08

by Theodore Ts'o

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > Yeah, slowly-growing directories will get splattered all over the disk.
> >
> > Possible short-term fixes would be to just allocate up to (say) eight
> > blocks when we grow a directory by one block. Or teach the
> > directory-growth code to use ext3 reservations.
> >
> > Longer-term people are talking about things like on-disk reservations.
> > But I expect directories are being forgotten about in all of that.
>
> By on-disk reservations, do you mean persistent file preallocation ? (that
> is explicit preallocation of blocks to a given file) If so, you are
> right, we haven't really given any thought to the possibility of directories
> needing that feature.

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases. If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much. Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead. This also has the advantage that it will help without
needing to do a backup/restore to improve layout.

Allocating some number of empty blocks when we grow the directory
would be a quick hack that I'd probably do as a 2nd priority. It
won't help pre-existing directories, but combined with readahead
logic, should help us out greatly in the non-btree case.

- Ted

2007-01-08 13:43:01

by Jeff Garzik

Subject: Re: How git affects kernel.org performance

Theodore Tso wrote:
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If
> you're using some kind of b-tree structure, such as XFS does for
> directories, preallocation doesn't help you much. Delayed allocation
> can save you if your delayed allocator knows how to structure disk
> blocks so that a btree-traversal is efficient, but I'm guessing the
> biggest reason why we are losing is because we don't have sufficient
> readahead. This also has the advantage that it will help without
> needing to do a backup/restore to improve layout.


Something I just thought of: ATA and SCSI hard disks do their own
read-ahead. Seeking all over the place to pick up bits of directory
will hurt even more with the disk reading and throwing away data (albeit
in its internal elevator and cache).

Jeff

2007-01-08 13:57:47

by Theodore Ts'o

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:
>
> Would e2fsck -D help? What kind of optimization
> does it perform?

It will help a little; e2fsck -D compresses the logical view of the
directory, but it doesn't optimize the physical layout on disk at all,
and of course, it won't help with the lack of readahead logic. It's
possible to improve how e2fsck -D works; at the moment, it's not
trying to make the directory be contiguous on disk. What it should
probably do is to pull a list of all of the blocks used by the
directory, sort them, and then try to see if it can improve on the
list by allocating some new blocks that would make the directory more
contiguous on disk. I suspect any improvements that would be seen by
doing this would be second order effects at most, though.

- Ted
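
The first half of that suggestion, pulling the directory's block list and sorting it, can be sketched in user space. This is a minimal illustration under assumptions: `count_extents` and `cmp_block` are hypothetical names, the block numbers are supplied by the caller rather than read from a filesystem, and nothing here is e2fsck code. Counting the contiguous runs in the sorted list gives a rough measure of how fragmented the directory is, which is the number a reallocation pass would try to bring down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for physical block numbers, ascending. */
static int cmp_block(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts `blocks` in place and returns the number
 * of contiguous runs (extents) in the sorted list. One run means the
 * directory is fully contiguous on disk.
 */
size_t count_extents(unsigned long *blocks, size_t n)
{
    size_t runs;
    size_t i;

    assert(blocks != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(blocks, n, sizeof(blocks[0]), cmp_block);

    runs = 1;
    for (i = 1; i < n; i++) {
        if (blocks[i] != blocks[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

A directory whose blocks are 900, 12, 13, 500, 14 and 501 sorts to three runs (12-14, 500-501, 900), so reading it in block order would still need at least three seeks.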

2007-01-08 14:01:23

by Pavel Machek

Subject: Re: How git affects kernel.org performance

Hi!

> > Would e2fsck -D help? What kind of optimization
> > does it perform?
>
> It will help a little; e2fsck -D compresses the logical view of the
> directory, but it doesn't optimize the physical layout on disk at all,
> and of course, it won't help with the lack of readahead logic. It's
> possible to improve how e2fsck -D works, at the moment, it's not
> trying to make the directory be contiguous on disk. What it should
> probably do is to pull a list of all of the blocks used by the
> directory, sort them, and then try to see if it can improve on the
> list by allocating some new blocks that would make the directory more
> contiguous on disk. I suspect any improvements that would be seen by
> doing this would be second order effects at most, though.

...sounds like a job for e2defrag, not e2fsck...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-01-08 13:41:47

by Johannes Stezenbach

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
>
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If
> you're using some kind of b-tree structure, such as XFS does for
> directories, preallocation doesn't help you much. Delayed allocation
> can save you if your delayed allocator knows how to structure disk
> blocks so that a btree-traversal is efficient, but I'm guessing the
> biggest reason why we are losing is because we don't have sufficient
> readahead. This also has the advantage that it will help without
> needing to do a backup/restore to improve layout.

Would e2fsck -D help? What kind of optimization
does it perform?


Thanks,
Johannes

2007-01-08 14:17:55

by Theodore Ts'o

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote:
> Hi!
>
> > > Would e2fsck -D help? What kind of optimization
> > > does it perform?
> >
> > It will help a little; e2fsck -D compresses the logical view of the
> > directory, but it doesn't optimize the physical layout on disk at all,
> > and of course, it won't help with the lack of readahead logic. It's
> > possible to improve how e2fsck -D works, at the moment, it's not
> > trying to make the directory be contiguous on disk. What it should
> > probably do is to pull a list of all of the blocks used by the
> > directory, sort them, and then try to see if it can improve on the
> > list by allocating some new blocks that would make the directory more
> > contiguous on disk. I suspect any improvements that would be seen by
> > doing this would be second order effects at most, though.
>
> ...sounds like a job for e2defrag, not e2fsck...

I wasn't proposing to move other data blocks around in order to make the
directory be contiguous, but just a "quick and dirty" try to make
things better. But yes, in order to really fix layout issues you
would have to do a full defrag, and it's probably more important that
we try to fix things so that defragmentation runs aren't necessary in
the first place....

- Ted

2007-01-09 01:11:52

by Paul Jackson

Subject: Re: How git affects kernel.org performance

Jeff wrote:
> Something I just thought of: ATA and SCSI hard disks do their own
> read-ahead.

Probably this is wishful thinking on my part, but I would have hoped
that most of the read-ahead they did was for stuff that happened to be
on the cylinder they were reading anyway. So long as their read-ahead
doesn't cause much extra or delayed disk head motion, what does it
matter?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401

2007-01-09 02:19:01

by Jeremy Higdon

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote:
> Jeff wrote:
> > Something I just thought of: ATA and SCSI hard disks do their own
> > read-ahead.
>
> Probably this is wishful thinking on my part, but I would have hoped
> that most of the read-ahead they did was for stuff that happened to be
> on the cylinder they were reading anyway. So long as their read-ahead
> doesn't cause much extra or delayed disk head motion, what does it
> matter?


And they usually won't read ahead if there is another command to
process, though they can be set up to read unrequested data in
spite of outstanding commands.

When they are reading ahead, they'll only fetch LBAs beyond the last
request until a buffer fills or the readahead gets interrupted.

jeremy

2007-01-09 07:59:46

by Wu Fengguang

Subject: Re: How git affects kernel.org performance

On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > > Yeah, slowly-growing directories will get splattered all over the disk.
> > >
> > > Possible short-term fixes would be to just allocate up to (say) eight
> > > blocks when we grow a directory by one block. Or teach the
> > > directory-growth code to use ext3 reservations.
> > >
> > > > Longer-term people are talking about things like on-disk reservations.
> > > But I expect directories are being forgotten about in all of that.
> >
> > By on-disk reservations, do you mean persistent file preallocation ? (that
> > is explicit preallocation of blocks to a given file) If so, you are
> > right, we haven't really given any thought to the possibility of directories
> > needing that feature.
>
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If

Here is a quick hack to practice the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu
---
fs/ext3/dir.c | 22 ++++++++++++++++++++++
fs/ext3/inode.c | 2 +-
2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }

+int ext3_get_block(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi

 	sb = inode->i_sb;

+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:

 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)

-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
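
The shift arithmetic in the readahead loop is easier to follow with concrete numbers. The following is a stand-alone sketch of the two unit conversions only: from a filesystem block index to a 512-byte sector number, and from a sector number to a page index. The helper names `block_to_sector` and `sector_to_page_index` are hypothetical, and the sketch makes no claim about what units `generic_block_bmap()` expects or returns.

```c
#include <assert.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, the "- 9" in the patch */

/*
 * Hypothetical helper: first 512-byte sector covered by filesystem block
 * `blk`, for a block size of (1 << blkbits) bytes.
 */
unsigned long long block_to_sector(unsigned long long blk, unsigned blkbits)
{
    assert(blkbits >= SECTOR_SHIFT);
    return blk << (blkbits - SECTOR_SHIFT);
}

/*
 * Hypothetical helper: index of the page containing 512-byte sector
 * `sector`, for a page size of (1 << page_shift) bytes.
 */
unsigned long long sector_to_page_index(unsigned long long sector,
                                        unsigned page_shift)
{
    assert(page_shift >= SECTOR_SHIFT);
    return sector >> (page_shift - SECTOR_SHIFT);
}
```

With 4 KiB blocks and 4 KiB pages (`blkbits` and `page_shift` both 12), block 3 starts at sector 24, which lies in page 3. With 1 KiB blocks (`blkbits` of 10), block 5 starts at sector 10, which lies in page 1, so several blocks share one page.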

2007-01-09 16:28:16

by Linus Torvalds

Subject: Re: How git affects kernel.org performance



On Tue, 9 Jan 2007, Fengguang Wu wrote:
> >
> > The fastest and probably most important thing to add is some readahead
> > smarts to directories --- both to the htree and non-htree cases. If
>
> Here is a quick hack to practice the directory readahead idea.
> Comments are welcome, it's a freshman's work :)

Well, I'd probably have done it differently, but more important is whether
this actually makes a difference performance-wise. Have you benchmarked it
at all?

Doing an

echo 3 > /proc/sys/vm/drop_caches

is your friend for testing things like this, to force cold-cache
behaviour..

Linus
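
One way to get a number to compare before and after such a change is to time a complete pass over a single directory from user space. The following is a minimal sketch, assuming a POSIX system: `time_readdir` is a hypothetical helper name, and it measures wall-clock time for the whole `readdir()` loop (which on Linux is normally implemented on top of getdents()) rather than the latency of each individual call. To see cold-cache behaviour, the caches would have to be dropped as described above before each run.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: reads every entry of the directory at `path` once.
 * Returns the number of entries seen (including "." and ".." where the
 * filesystem reports them), or -1 if the directory cannot be opened.
 * On success, the elapsed wall-clock time in seconds is stored in *seconds.
 */
long time_readdir(const char *path, double *seconds)
{
    struct timespec t0, t1;
    DIR *dir;
    long entries = 0;

    assert(path != NULL && seconds != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (readdir(dir) != NULL)
        entries++;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    closedir(dir);

    *seconds = (double)(t1.tv_sec - t0.tv_sec) +
               (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return entries;
}
```

Calling it on a large directory once with cold caches and once again immediately afterwards separates the disk-bound cost from the in-memory cost of the same traversal.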

2007-01-10 01:57:13

by Wu Fengguang

Subject: Re: How git affects kernel.org performance

On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
>
>
> On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > >
> > > The fastest and probably most important thing to add is some readahead
> > > smarts to directories --- both to the htree and non-htree cases. If
> >
> > Here is a quick hack to practice the directory readahead idea.
> > Comments are welcome, it's a freshman's work :)
>
> Well, I'd probably have done it differently, but more important is whether
> this actually makes a difference performance-wise. Have you benchmarked it
> at all?

Yes, a trivial test shows a marginal improvement, on a minimal Debian system:

# find / | wc -l
13641

# time find / > /dev/null

real 0m10.000s
user 0m0.210s
sys 0m4.370s

# time find / > /dev/null

real 0m9.890s
user 0m0.160s
sys 0m3.270s

> Doing an
>
> echo 3 > /proc/sys/vm/drop_caches
>
> is your friend for testing things like this, to force cold-cache
> behaviour..

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu
