2002-08-04 16:06:03

by Roland Kuhn

[permalink] [raw]
Subject: reiserfs blocks long on getdents64() during concurrent write

Dear kernel hackers!

Yesterday, Linus kindly directed me towards the filesystem implementation
concerning this, so I produced some diagnostics in that direction. The
task is to write a big file with dd to a hardware RAID on a dual Athlon MP
machine, while in the meantime checking the progress with ls -l. Here's
the relevant part of 'strace -T ls -l' (it is not always so slow)

open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3 <0.000013>
fstat64(3, {st_mode=S_IFDIR|0755, st_size=160, ...}) = 0 <0.000007>
fcntl64(0x3, 0x2, 0x1, 0x3) = 0 <0.000011>
brk(0x805b000) = 0x805b000 <0.000008>
getdents64(0x3, 0x8058ac8, 0x1000, 0x8058ac8) = 176 <4.726011>
lstat64("bigfile_2", {st_mode=S_IFREG|0644, st_size=2147483648, ...}) = 0 <0.000023>
lstat64("bigfile_3", {st_mode=S_IFREG|0644, st_size=726581248, ...}) = 0 <0.000012>
lstat64("writer", {st_mode=S_IFREG|0755, st_size=58, ...}) = 0 <0.000012>
lstat64("bigfile_", {st_mode=S_IFREG|0644, st_size=2147483648, ...}) = 0 <0.000012>
getdents64(0x3, 0x8058ac8, 0x1000, 0x8058ac8) = 0 <0.000033>
close(3) = 0 <0.000013>

The first call to getdents64 takes 4.7s! I captured the task status and
got this:

writer S 00000018 4732 1652 1 1653 1242 (NOTLB)
Call Trace: [sys_wait4+952/1008] [system_call+51/56]
dd D 0000001A 0 1653 1652 (NOTLB)
Call Trace: [sleep_on+74/112]
[do_journal_begin_r+87/544]
[unmap_underlying_metadata+26/96]
[__block_prepare_write+231/752]
[call_reschedule_interrupt+5/11]
[journal_begin+19/32]
[reiserfs_commit_write+136/336]
[block_prepare_write+29/64]
[generic_file_write+1256/1776]
[sys_write+149/272]
[do_IRQ+197/240]
[system_call+51/56]
ls D 080541A4 5084 1659 1562 (NOTLB)
Call Trace: [sleep_on+74/112]
[do_journal_begin_r+87/544]
[search_by_key+2338/3552]
[journal_begin+19/32]
[reiserfs_dirty_inode+88/160]
[__mark_inode_dirty+46/160]
[update_atime+81/96]
[reiserfs_readdir+1112/1136]
[filemap_nopage+233/528]
[rb_insert_color+112/240]
[vfs_readdir+124/192]
[filldir64+0/320]
[sys_getdents64+79/186]
[filldir64+0/320]
[system_call+51/56]

writer is a shell script calling dd with a blocksize of 1M. The problem of
course vanishes when using noatime, but it still makes me wonder why a
single write request is delayed so long. The other possibility to avoid
this is to use a sync mount, and there I discovered something really
strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping
slow with 10kB/s!

If you have any thoughts on the cause of this behaviour and/or on how to
fix it, I would be glad to hear them. If it's not too complicated I could
even code something up myself, and I for sure can do any testing needed.

Thanks in advance,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+


2002-08-05 05:40:59

by Oleg Drokin

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hello!

On Sun, Aug 04, 2002 at 06:09:34PM +0200, Roland Kuhn wrote:

> The first call to getdents64 takes 4.7s! I captured the task status and
> got this:

[call traces skipped]

> writer is a shell script calling dd with a blocksize of 1M. The problem of
> course vanishes when using noatime, but it still makes me wonder why a
> single write request is delayed so long. The other possibility to avoid

It seems that your kernel performs writes slowly.

> this is to use a sync mount, and there I discovered something really
> strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping
> slow with 10kB/s!

2.4.18-3 is particularly bad know for this exact problem (slow write speed),
you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.

For me on plain 2.4.18 the problem is not visible that bad as for you.
I.e. ls on a directory where I write this big file finishes in under
half a second.

> If you have any thoughts on the cause of this behaviour and/or on how to
> fix it, I would be glad to hear them. If it's not too complicated I could
> even code something up myself, and I for sure can do any testing needed.

You said 2.4.19 writes stuff faster for you, how about testing ls on that
kernel?

Bye,
Oleg

2002-08-05 09:19:42

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
>
> On Sun, Aug 04, 2002 at 06:09:34PM +0200, Roland Kuhn wrote:
>
> > The first call to getdents64 takes 4.7s! I captured the task status and
> > got this:
>
> It seems that your kernel performs writes slowly.
>
That's true only in the case of a "filled write queue" (I don't know
exactly how it's working). However, I verified that the average SCSI
command takes less than two seconds to complete in this case (time between
send to card and receive response interrupt).

BTW: was it directly clear for you for what the do_journal_begin_r() was
waiting? I'm not so familiar with fs coding, especially the locking...

While we're at it: it looks like e.g. the 3ware driver is busy waiting in
tw_scsi_queue() until (by virtue of an interrupt?) a request_id becomes
free. Is this common practice? Or is there code somewhere checking for the
availability of these request slots before calling scsi_queue? I would
think that the CPU can do more useful things while waiting for an
interrupt...

> > this is to use a sync mount, and there I discovered something really
> > strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping
> > slow with 10kB/s!
>
> 2.4.18-3 is particularly bad know for this exact problem (slow write speed),
> you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.
>
> For me on plain 2.4.18 the problem is not visible that bad as for you.
> I.e. ls on a directory where I write this big file finishes in under
> half a second.
>
Do you use IDE or SCSI? It seems like the SCSI drivers can have up to
255*256 sectors pending on the controller. If that's the problem, how can
I decrease this buffering?

> > If you have any thoughts on the cause of this behaviour and/or on how to
> > fix it, I would be glad to hear them. If it's not too complicated I could
> > even code something up myself, and I for sure can do any testing needed.
>
> You said 2.4.19 writes stuff faster for you, how about testing ls on that
> kernel?
>
The call trace was from 2.4.19, and the situation actually was much better
than in the 2.4.18-3 case, where it took about 1 minute for ls to come
back.

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-05 09:50:48

by Oleg Drokin

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hello!

On Mon, Aug 05, 2002 at 11:22:49AM +0200, Roland Kuhn wrote:

> BTW: was it directly clear for you for what the do_journal_begin_r() was
> waiting? I'm not so familiar with fs coding, especially the locking...

I'd say that it's waiting for journal lock that was captured by kupdated
(or something like that) doing periodic write_super() calls, when it thinks
memory is low. Our tests some time ago shown that once memory is lower than
some value, write_super() "method" is called for every superblock that
have "dirty" flag set. In reiserfs that flushes the journal, and while journal
is being flushed, nobody can create new transactions obviously, and since
ls changes directory's atime in normal case, it want to open new transaction.
Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
we have some patches that should help this (from Chris Mason).
You can try these for yourself too, for example from here:
ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz

> > > this is to use a sync mount, and there I discovered something really
> > > strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping
> > > slow with 10kB/s!
> > 2.4.18-3 is particularly bad know for this exact problem (slow write speed),
> > you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.
> > For me on plain 2.4.18 the problem is not visible that bad as for you.
> > I.e. ls on a directory where I write this big file finishes in under
> > half a second.
> Do you use IDE or SCSI? It seems like the SCSI drivers can have up to

I tested on IDE.

> > You said 2.4.19 writes stuff faster for you, how about testing ls on that
> > kernel?
> The call trace was from 2.4.19, and the situation actually was much better

Ah, I thought call trace was from 2.4.18-3.

> than in the 2.4.18-3 case, where it took about 1 minute for ls to come
> back.

So you might try the patch I mentioned or you can mount with nodiratime mount
option to prevent updqting of directory atimes, but having atimes still to be
updated on regular files at the same time.

Bye,
Oleg

2002-08-05 10:48:05

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
>
> On Mon, Aug 05, 2002 at 11:22:49AM +0200, Roland Kuhn wrote:
>
> > BTW: was it directly clear for you for what the do_journal_begin_r() was
> > waiting? I'm not so familiar with fs coding, especially the locking...
>
> I'd say that it's waiting for journal lock that was captured by kupdated
> (or something like that) doing periodic write_super() calls, when it thinks
> memory is low. Our tests some time ago shown that once memory is lower than
> some value, write_super() "method" is called for every superblock that
> have "dirty" flag set. In reiserfs that flushes the journal, and while journal
> is being flushed, nobody can create new transactions obviously, and since
> ls changes directory's atime in normal case, it want to open new transaction.
> Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> we have some patches that should help this (from Chris Mason).
> You can try these for yourself too, for example from here:
> ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
>
I will try it immediately. And I will try to find documentation on how
this all is working, so I can understand the implications. Does the
"flushing" also happen, when the journal is full? Or is it possible to
begin a new journal while the old one is written out?

> > than in the 2.4.18-3 case, where it took about 1 minute for ls to come
> > back.
>
> So you might try the patch I mentioned or you can mount with nodiratime mount
> option to prevent updqting of directory atimes, but having atimes still to be
> updated on regular files at the same time.
>
Interesting mount option, though I think very few application actually use
atime anyway, isn't it?

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-05 11:01:05

by Oleg Drokin

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hello!

On Mon, Aug 05, 2002 at 12:51:35PM +0200, Roland Kuhn wrote:

> > Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> > we have some patches that should help this (from Chris Mason).
> > You can try these for yourself too, for example from here:
> > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
> I will try it immediately. And I will try to find documentation on how
> this all is working, so I can understand the implications. Does the
> "flushing" also happen, when the journal is full? Or is it possible to

Yes, if journal is full it is also flushed.

> begin a new journal while the old one is written out?

No, it is not possible to begin "new journal" in reiser3.

> > So you might try the patch I mentioned or you can mount with nodiratime mount
> > option to prevent updqting of directory atimes, but having atimes still to be
> > updated on regular files at the same time.
> Interesting mount option, though I think very few application actually use
> atime anyway, isn't it?

I know for sure that atime on mbox-style files used to determine which messages
are new, and which are not. (N and O status in mutt).
There may be other users as well.

Bye,
Oleg

2002-08-05 18:15:31

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi Oleg!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
>
> On Mon, Aug 05, 2002 at 12:51:35PM +0200, Roland Kuhn wrote:
>
> > > Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> > > we have some patches that should help this (from Chris Mason).
> > > You can try these for yourself too, for example from here:
> > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz

>From there I get 'permission denied', but I got it somewhere else (google
is great).

However, it does not apply cleanly to 2.4.19. It is already partly in, as
it seems, but there are some rejects that are not obvious to fix for me.
If this patch still makes sense, it would be great if someone with more
knowledge/experience than me could have a look...

> No, it is not possible to begin "new journal" in reiser3.
>
Is this going to change in reiser4?

> I know for sure that atime on mbox-style files used to determine which messages
> are new, and which are not. (N and O status in mutt).
> There may be other users as well.
>
Yes, this kind of stuff. But we use it "only" to buffer some 600GB of
valuable physics data before it can go to tape ;-)

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-05 18:26:41

by Oleg Drokin

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hello!

On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
> >From there I get 'permission denied', but I got it somewhere else (google
> is great).
> However, it does not apply cleanly to 2.4.19. It is already partly in, as
> it seems, but there are some rejects that are not obvious to fix for me.
> If this patch still makes sense, it would be great if someone with more
> knowledge/experience than me could have a look...

In the same dir there is
03-data-logging-24.diff.gz
It contains this patch and more stuff, you can try it.

> > No, it is not possible to begin "new journal" in reiser3.
> Is this going to change in reiser4?

Sort of, there would be no static journal.

Bye,
Oleg

2002-08-05 18:50:16

by Chris Mason

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

On Mon, 2002-08-05 at 14:30, Oleg Drokin wrote:
> Hello!
>
> On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
> > >From there I get 'permission denied', but I got it somewhere else (google
> > is great).
> > However, it does not apply cleanly to 2.4.19. It is already partly in, as
> > it seems, but there are some rejects that are not obvious to fix for me.
> > If this patch still makes sense, it would be great if someone with more
> > knowledge/experience than me could have a look...

The stack traces you sent earlier show a few procs stuck waiting for the
transaction to begin, but they don't show which process is currently in
a transaction (this is who they are waiting on).

Oleg is right, they are probably waiting on kupdated, since the FS might
get marked dirty faster than it can clear it.

Another possibility is ctime/mtime updates during write.

So, on ftp.suse.com/pub/people/mason/patches/data-logging

Apply:
01-relocation-4.diff
02-commit_super-8.diff # this is the one you want, but it depends on 01.

And try again. If that doesn't do it, try 04-write_times.diff (which
doesn't depend on anything).

Or, send a few fully decoded sysrq-t outputs during the run, so we can
see what all the procs are up to.

-chris


2002-08-05 19:08:24

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
>
> On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
> > >From there I get 'permission denied', but I got it somewhere else (google
> > is great).
> > However, it does not apply cleanly to 2.4.19. It is already partly in, as
> > it seems, but there are some rejects that are not obvious to fix for me.
> > If this patch still makes sense, it would be great if someone with more
> > knowledge/experience than me could have a look...
>
> In the same dir there is
> 03-data-logging-24.diff.gz
> It contains this patch and more stuff, you can try it.
>
The same comment applies: I get 26 rejected hunks, most in journal.c. For
some it could be a whitespace problem, because I could not immediately see
the reason. Are some of these things already in 2.4.19?

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-05 19:44:58

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi!

sorry, I read this mail after Oleg's, so disregard my previous post.

On 5 Aug 2002, Chris Mason wrote:

> On Mon, 2002-08-05 at 14:30, Oleg Drokin wrote:
> > Hello!
> >
> > On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz
> > > >From there I get 'permission denied', but I got it somewhere else (google
> > > is great).
> > > However, it does not apply cleanly to 2.4.19. It is already partly in, as
> > > it seems, but there are some rejects that are not obvious to fix for me.
> > > If this patch still makes sense, it would be great if someone with more
> > > knowledge/experience than me could have a look...
>
> The stack traces you sent earlier show a few procs stuck waiting for the
> transaction to begin, but they don't show which process is currently in
> a transaction (this is who they are waiting on).
>
The ShowTasks output is attached (it's a RedHat patched klogd which
resolves symbols on the fly). To me it looks like kupdated is indeed
inside do_journal_end().

> Oleg is right, they are probably waiting on kupdated, since the FS might
> get marked dirty faster than it can clear it.
>
> Another possibility is ctime/mtime updates during write.
>
In our application there are always at least one writer and one reader
active in one directory of the partition, both with data rates up to
10MB/s. The reader uses a block size of 128kB, but I don't know about the
writer.

> So, on ftp.suse.com/pub/people/mason/patches/data-logging
>
> Apply:
> 01-relocation-4.diff
> 02-commit_super-8.diff # this is the one you want, but it depends on 01.
>
Okay, will try.

> And try again. If that doesn't do it, try 04-write_times.diff (which
> doesn't depend on anything).
>
Is there a documentation about what this patch does as a whole?

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+


Attachments:
tasks.gz (0.00 B)

2002-08-05 19:59:06

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

Hi!

On Mon, 5 Aug 2002, Roland Kuhn wrote:

> > So, on ftp.suse.com/pub/people/mason/patches/data-logging
> >
> > Apply:
> > 01-relocation-4.diff
> > 02-commit_super-8.diff # this is the one you want, but it depends on 01.
> >
> Okay, will try.
>
> > And try again. If that doesn't do it, try 04-write_times.diff (which
> > doesn't depend on anything).
> >
> Is there a documentation about what this patch does as a whole?
>
Sorry, stupid question for the 04 one. What my brain wanted to say: The
patches 01 and 02 seem to aim at dirtying the super block less often. If
there is serious writing activity, will this lead to fewer but longer
commits? The problem with our current (kinda stupid) software is that
lower write() latency is more important than a few percent more
throughput.

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-05 22:10:02

by Chris Mason

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

On Mon, 2002-08-05 at 16:02, Roland Kuhn wrote:
> Hi!
>
> On Mon, 5 Aug 2002, Roland Kuhn wrote:
>
> > > So, on ftp.suse.com/pub/people/mason/patches/data-logging
> > >
> > > Apply:
> > > 01-relocation-4.diff
> > > 02-commit_super-8.diff # this is the one you want, but it depends on 01.
> > >
> > Okay, will try.
> >
> > > And try again. If that doesn't do it, try 04-write_times.diff (which
> > > doesn't depend on anything).
> > >
> > Is there a documentation about what this patch does as a whole?
> >
> Sorry, stupid question for the 04 one. What my brain wanted to say: The
> patches 01 and 02 seem to aim at dirtying the super block less often. If
> there is serious writing activity, will this lead to fewer but longer
> commits? The problem with our current (kinda stupid) software is that
> lower write() latency is more important than a few percent more
> throughput.

01-relocation-4 deals with allowing reiserfs to use an external logging
device. It isn't related to your problem, but 02-commit_super-8 is
diffed against it.

02-commit_super-8 does two things. First it changes sync_supers() so
that it won't loop on a single filesystem while it's super is dirty.
Before, if kupdate triggered a write_super call, and another FS writer
redirtied the super after write_super cleared it (but before it
returned), write_super gets called a second time. Since a commit was
done for each write_super call, that gets expensive quickly.

Second, the patch adds a commit_super call, and changes sync() to use
that instead of write_super. This allows the FS to skip the commit when
write_super is called.

This does lead to fewer commits and longer running transactions, but
does not increase the amount of time it takes the write() call to
complete. It does increase the time between when you make a metadata
change and when that change actually goes to the disk.

-chris


2002-08-05 22:43:16

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

On 5 Aug 2002, Chris Mason wrote:

> 01-relocation-4 deals with allowing reiserfs to use an external logging
> device. It isn't related to your problem, but 02-commit_super-8 is
> diffed against it.
>
> 02-commit_super-8 does two things. First it changes sync_supers() so
> that it won't loop on a single filesystem while it's super is dirty.
> Before, if kupdate triggered a write_super call, and another FS writer
> redirtied the super after write_super cleared it (but before it
> returned), write_super gets called a second time. Since a commit was
> done for each write_super call, that gets expensive quickly.
>
> Second, the patch adds a commit_super call, and changes sync() to use
> that instead of write_super. This allows the FS to skip the commit when
> write_super is called.
>
> This does lead to fewer commits and longer running transactions, but
> does not increase the amount of time it takes the write() call to
> complete. It does increase the time between when you make a metadata
> change and when that change actually goes to the disk.
>
Ahh, thanks! This sounds like a good idea to me, hopefully your patch will
be accepted despite the fact that Alan is busy doing other things ;-)

Coming back to the issue: applying these patches increased the throughput
by about 20% :-) Now it takes about 100sec instead of 120sec to write a
2GB file. Tomorrow I will try it without the write_times part, to see how
much that does.

But more important: the hiccups are more seldom and sometimes shorter than
before. With plain 2.4.19 I would hit it about twice per minute (I have
not measured it), now it happens only after two minutes when writing 1M
chunks at 20MB/s. The longest seen so far was also about 4 seconds,
though.

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+

2002-08-06 00:43:12

by Chris Mason

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

On Mon, 2002-08-05 at 18:46, Roland Kuhn wrote:
> >
> Ahh, thanks! This sounds like a good idea to me, hopefully your patch will
> be accepted despite the fact that Alan is busy doing other things ;-)
>
> Coming back to the issue: applying these patches increased the throughput
> by about 20% :-) Now it takes about 100sec instead of 120sec to write a
> 2GB file. Tomorrow I will try it without the write_times part, to see how
> much that does.

The write_times patch fixes a lot of latency problems, but all 5 of my
patches kind of build on each other to solve problems in different
workloads.

>
> But more important: the hiccups are more seldom and sometimes shorter than
> before. With plain 2.4.19 I would hit it about twice per minute (I have
> not measured it), now it happens only after two minutes when writing 1M
> chunks at 20MB/s. The longest seen so far was also about 4 seconds,
> though.

This is harder to guess at ;-) But I'll try 2 things I know I've fixed.

#1 I've made the metadata writeback code much more efficient, especially
for smaller transactions. A dd to a large file won't generate lots of
log traffic, writing 700M only changes about 160 blocks in 30 seconds.
Anyway, 03-data-logging-24 might make a big difference.

#2 During an ls, the current code ends up reading the directory more or
less one block at a time. Try 05-search_reada-4.diff, which will read
more tree nodes at once.

If that still doesn't work, I'm porting forward some reiserfs
low-latency work that andrew morton did, we can give it a try.

-chris






2002-08-06 09:58:18

by Roland Kuhn

[permalink] [raw]
Subject: Re: reiserfs blocks long on getdents64() during concurrent write

On 5 Aug 2002, Chris Mason wrote:

> On Mon, 2002-08-05 at 18:46, Roland Kuhn wrote:
> >
> > But more important: the hiccups are more seldom and sometimes shorter than
> > before. With plain 2.4.19 I would hit it about twice per minute (I have
> > not measured it), now it happens only after two minutes when writing 1M
> > chunks at 20MB/s. The longest seen so far was also about 4 seconds,
> > though.
>
> This is harder to guess at ;-) But I'll try 2 things I know I've fixed.
>
> #1 I've made the metadata writeback code much more efficient, especially
> for smaller transactions. A dd to a large file won't generate lots of
> log traffic, writing 700M only changes about 160 blocks in 30 seconds.
> Anyway, 03-data-logging-24 might make a big difference.
>
It is hard to get precise numbers, but I have the feeling that with atime
enabled the ls-hiccups have at least not decreased, and there has been a
rather long one of about 13 seconds, frequency is about one per minute,
usually taking 2-4 seconds.

With noatime this of course doesn't happen, but with the patch the file
size does not increase smoothly any more, it sometimes stays the same for
some ten seconds and then jumps a few hundred MBs. However, the throughput
is roughly the same as without the patch (compatible within the errors of
about 10%).

> #2 During an ls, the current code ends up reading the directory more or
> less one block at a time. Try 05-search_reada-4.diff, which will read
> more tree nodes at once.
>
Unfortunately I'm ordered to do something now, so I won't have the time to
check the performance of this one during the next few hours. The
combination of 01+02+04 seemed more stable (concerning sustained
throughput), so I will test it on all 12 machines and tell you what
happens.

Thanks very much for your help! I appreciate that!

Ciao,
Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+