2013-07-10 23:13:04

by Paul Taysom

[permalink] [raw]
Subject: [PATCH] fs: sync: fixed performance regression

The following commit introduced a 10x regression for
syncing inodes in ext4 with relatime enabled where just
the atime had been modified.

commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
Author: Jan Kara <[email protected]>
Date: Tue Jul 3 16:45:34 2012 +0200
vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes

See also: http://www.kernelhub.org/?msg=93100&p=2

Fixed by putting back in the call to writeback_inodes_sb.

I'll attach the test in a reply to this e-mail.

The test starts by creating 512 files, syncing, reading one byte
from each of those files, syncing, and then deleting each file
and syncing. The time to do each sync is printed. The process
is then repeated for 1024 files and then the next power of
two up to 262144 files.

Note, when running the test, the slow down doesn't always happen
but most of the tests will show a slow down.

In response to crbug.com/240422

Signed-off-by: Paul Taysom <[email protected]>
---
fs/sync.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/sync.c b/fs/sync.c
index 905f3f6..55c3316 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
sync_inodes_sb(sb);
}

+static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
+{
+ if (!(sb->s_flags & MS_RDONLY))
+ writeback_inodes_sb(sb, WB_REASON_SYNC);
+}
+
static void sync_fs_one_sb(struct super_block *sb, void *arg)
{
if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
@@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
int nowait = 0, wait = 1;

wakeup_flusher_threads(0, WB_REASON_SYNC);
+ iterate_supers(writeback_inodes_one_sb, NULL);
iterate_supers(sync_inodes_one_sb, NULL);
iterate_supers(sync_fs_one_sb, &nowait);
iterate_supers(sync_fs_one_sb, &wait);
--
1.8.3


2013-07-10 23:45:46

by Paul Taysom

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Wed, Jul 10, 2013 at 4:12 PM, Paul Taysom <[email protected]> wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <[email protected]>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422
>
> Signed-off-by: Paul Taysom <[email protected]>
> ---
> fs/sync.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/sync.c b/fs/sync.c
> index 905f3f6..55c3316 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> sync_inodes_sb(sb);
> }
>
> +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> +{
> + if (!(sb->s_flags & MS_RDONLY))
> + writeback_inodes_sb(sb, WB_REASON_SYNC);
> +}
> +
> static void sync_fs_one_sb(struct super_block *sb, void *arg)
> {
> if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> int nowait = 0, wait = 1;
>
> wakeup_flusher_threads(0, WB_REASON_SYNC);
> + iterate_supers(writeback_inodes_one_sb, NULL);
> iterate_supers(sync_inodes_one_sb, NULL);
> iterate_supers(sync_fs_one_sb, &nowait);
> iterate_supers(sync_fs_one_sb, &wait);
> --
> 1.8.3
>
Try this again but in plaintext mode. Attaching test results and test
program. Tests were run on a Pixel x86 with SSD storage.


Attachments:
syncperf.c (3.03 kB)
syncperf.results (5.90 kB)
Download all attachments

2013-07-10 23:56:52

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:

> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422

Just in case this ends up as the actual commit msg, that url
seems to be the wrong bug.

Dave.

2013-07-11 00:42:43

by Paul Taysom

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Wed, Jul 10, 2013 at 4:56 PM, Dave Jones <[email protected]> wrote:
> On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:
>
> > Note, when running the test, the slow down doesn't always happen
> > but most of the tests will show a slow down.
> >
> > In response to crbug.com/240422
>
> Just in case this ends up as the actual commit msg, that url
> seems to be the wrong bug.
>
> Dave.

Copied the wrong number crbug.con/218910
Thanks.

2013-07-11 00:45:38

by Paul Taysom

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

Third time is the charm crbug.com/240411

2013-07-11 02:00:18

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <[email protected]>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.

Can you please check if the patch attached to this mail:

http://marc.info/?l=linux-kernel&m=137276874025813&w=2

Fixes your problem?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-07-11 10:53:50

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <[email protected]>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422
>
> Signed-off-by: Paul Taysom <[email protected]>
Thanks for report. Rather than blindly reverting the change, I'd like to
understand why you see so huge regression. As the changelog in the patch
says, flusher thread should be doing async writeback equivalent to the
removed one because it gets woken via wakeup_flusher_threads(). But my
guess is that for some reason we end up doing all the writeback from
sync_inodes_one_sb(). I'll try to reproduce your results and investigate...

Honza

> ---
> fs/sync.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/sync.c b/fs/sync.c
> index 905f3f6..55c3316 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> sync_inodes_sb(sb);
> }
>
> +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> +{
> + if (!(sb->s_flags & MS_RDONLY))
> + writeback_inodes_sb(sb, WB_REASON_SYNC);
> +}
> +
> static void sync_fs_one_sb(struct super_block *sb, void *arg)
> {
> if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> int nowait = 0, wait = 1;
>
> wakeup_flusher_threads(0, WB_REASON_SYNC);
> + iterate_supers(writeback_inodes_one_sb, NULL);
> iterate_supers(sync_inodes_one_sb, NULL);
> iterate_supers(sync_fs_one_sb, &nowait);
> iterate_supers(sync_fs_one_sb, &wait);
> --
> 1.8.3
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-07-11 11:58:37

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Thu 11-07-13 12:53:46, Jan Kara wrote:
> On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> > The following commit introduced a 10x regression for
> > syncing inodes in ext4 with relatime enabled where just
> > the atime had been modified.
> >
> > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> > Author: Jan Kara <[email protected]>
> > Date: Tue Jul 3 16:45:34 2012 +0200
> > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> >
> > See also: http://www.kernelhub.org/?msg=93100&p=2
> >
> > Fixed by putting back in the call to writeback_inodes_sb.
> >
> > I'll attach the test in a reply to this e-mail.
> >
> > The test starts by creating 512 files, syncing, reading one byte
> > from each of those files, syncing, and then deleting each file
> > and syncing. The time to do each sync is printed. The process
> > is then repeated for 1024 files and then the next power of
> > two up to 262144 files.
> >
> > Note, when running the test, the slow down doesn't always happen
> > but most of the tests will show a slow down.
> >
> > In response to crbug.com/240422
> >
> > Signed-off-by: Paul Taysom <[email protected]>
> Thanks for report. Rather than blindly reverting the change, I'd like to
> understand why you see so huge regression. As the changelog in the patch
> says, flusher thread should be doing async writeback equivalent to the
> removed one because it gets woken via wakeup_flusher_threads(). But my
> guess is that for some reason we end up doing all the writeback from
> sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
Hum, so it must be something timing sensitive. I wasn't able to reproduce
the issue on my test machine in 4 runs of your test program. I was able to
reproduce it on my laptop on every second run of the test program but once
I've enabled some tracepoints, the issue disappeared and I didn't see it in
about 10 runs.

That being said I think that reverting my patch is just papering over the
problem. We will do the async pass over inodes twice instead of once
and thus the timing changes enough that you aren't able to observe the
problem.

I'm looking into this more...

Honza
> > ---
> > fs/sync.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/sync.c b/fs/sync.c
> > index 905f3f6..55c3316 100644
> > --- a/fs/sync.c
> > +++ b/fs/sync.c
> > @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> > sync_inodes_sb(sb);
> > }
> >
> > +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> > +{
> > + if (!(sb->s_flags & MS_RDONLY))
> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
> > +}
> > +
> > static void sync_fs_one_sb(struct super_block *sb, void *arg)
> > {
> > if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> > @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> > int nowait = 0, wait = 1;
> >
> > wakeup_flusher_threads(0, WB_REASON_SYNC);
> > + iterate_supers(writeback_inodes_one_sb, NULL);
> > iterate_supers(sync_inodes_one_sb, NULL);
> > iterate_supers(sync_fs_one_sb, &nowait);
> > iterate_supers(sync_fs_one_sb, &wait);
> > --
> > 1.8.3
> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-07-11 21:42:26

by Paul Taysom

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Thu, Jul 11, 2013 at 4:58 AM, Jan Kara <[email protected]> wrote:
> On Thu 11-07-13 12:53:46, Jan Kara wrote:
>> On Wed 10-07-13 16:12:36, Paul Taysom wrote:
>> > The following commit introduced a 10x regression for
>> > syncing inodes in ext4 with relatime enabled where just
>> > the atime had been modified.
>> >
>> > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
>> > Author: Jan Kara <[email protected]>
>> > Date: Tue Jul 3 16:45:34 2012 +0200
>> > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>> >
>> > See also: http://www.kernelhub.org/?msg=93100&p=2
>> >
>> > Fixed by putting back in the call to writeback_inodes_sb.
>> >
>> > I'll attach the test in a reply to this e-mail.
>> >
>> > The test starts by creating 512 files, syncing, reading one byte
>> > from each of those files, syncing, and then deleting each file
>> > and syncing. The time to do each sync is printed. The process
>> > is then repeated for 1024 files and then the next power of
>> > two up to 262144 files.
>> >
>> > Note, when running the test, the slow down doesn't always happen
>> > but most of the tests will show a slow down.
>> >
>> > In response to crbug.com/240422
>> >
>> > Signed-off-by: Paul Taysom <[email protected]>
>> Thanks for report. Rather than blindly reverting the change, I'd like to
>> understand why you see so huge regression. As the changelog in the patch
>> says, flusher thread should be doing async writeback equivalent to the
>> removed one because it gets woken via wakeup_flusher_threads(). But my
>> guess is that for some reason we end up doing all the writeback from
>> sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> the issue on my test machine in 4 runs of your test program. I was able to
> reproduce it on my laptop on every second run of the test program but once
> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> about 10 runs.
>
> That being said I think that reverting my patch is just papering over the
> problem. We will do the async pass over inodes twice instead of once
> and thus the timing changes enough that you aren't able to observe the
> problem.
>
> I'm looking into this more...
>
> Honza
>> > ---
>> > fs/sync.c | 7 +++++++
>> > 1 file changed, 7 insertions(+)
>> >
>> > diff --git a/fs/sync.c b/fs/sync.c
>> > index 905f3f6..55c3316 100644
>> > --- a/fs/sync.c
>> > +++ b/fs/sync.c
>> > @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
>> > sync_inodes_sb(sb);
>> > }
>> >
>> > +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
>> > +{
>> > + if (!(sb->s_flags & MS_RDONLY))
>> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
>> > +}
>> > +
>> > static void sync_fs_one_sb(struct super_block *sb, void *arg)
>> > {
>> > if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
>> > @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
>> > int nowait = 0, wait = 1;
>> >
>> > wakeup_flusher_threads(0, WB_REASON_SYNC);
>> > + iterate_supers(writeback_inodes_one_sb, NULL);
>> > iterate_supers(sync_inodes_one_sb, NULL);
>> > iterate_supers(sync_fs_one_sb, &nowait);
>> > iterate_supers(sync_fs_one_sb, &wait);
>> > --
>> > 1.8.3
>> >
>> --
>> Jan Kara <[email protected]>
>> SUSE Labs, CR
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

I've tried Dave Chinner's patch but it doesn't seem to help.

Looking at the references to WB_SYNC_NONE flag I found the interesting
comment in fs/ext4/inodes.c write_cache_pages_da:
...
lock_page(page);

/*
* If the page is no longer dirty, or its
* mapping no longer corresponds to inode we
* are writing (which means it has been
* truncated or invalidated), or the page is
* already under writeback and we are not
* doing a data integrity writeback, skip the page
*/
if (!PageDirty(page) ||
(PageWriteback(page) &&
(wbc->sync_mode == WB_SYNC_NONE)) ||
unlikely(page->mapping != mapping)) {
unlock_page(page);
continue;
}

wait_on_page_writeback(page);
BUG_ON(PageWriteback(page));
...

I'm wondering if one inode is getting written out then the next inode
in the same page waits for the writeback to finish.

writeback_inodes_sb_nt sets the sync_mode to WB_SYNC_NONE
sync_inodes_sb set the sync_mode to WB_SYNC_ALL.

2013-07-12 15:43:18

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Thu 11-07-13 13:58:32, Jan Kara wrote:
> On Thu 11-07-13 12:53:46, Jan Kara wrote:
> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> > > The following commit introduced a 10x regression for
> > > syncing inodes in ext4 with relatime enabled where just
> > > the atime had been modified.
> > >
> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> > > Author: Jan Kara <[email protected]>
> > > Date: Tue Jul 3 16:45:34 2012 +0200
> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> > >
> > > See also: http://www.kernelhub.org/?msg=93100&p=2
> > >
> > > Fixed by putting back in the call to writeback_inodes_sb.
> > >
> > > I'll attach the test in a reply to this e-mail.
> > >
> > > The test starts by creating 512 files, syncing, reading one byte
> > > from each of those files, syncing, and then deleting each file
> > > and syncing. The time to do each sync is printed. The process
> > > is then repeated for 1024 files and then the next power of
> > > two up to 262144 files.
> > >
> > > Note, when running the test, the slow down doesn't always happen
> > > but most of the tests will show a slow down.
> > >
> > > In response to crbug.com/240422
> > >
> > > Signed-off-by: Paul Taysom <[email protected]>
> > Thanks for report. Rather than blindly reverting the change, I'd like to
> > understand why you see so huge regression. As the changelog in the patch
> > says, flusher thread should be doing async writeback equivalent to the
> > removed one because it gets woken via wakeup_flusher_threads(). But my
> > guess is that for some reason we end up doing all the writeback from
> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> the issue on my test machine in 4 runs of your test program. I was able to
> reproduce it on my laptop on every second run of the test program but once
> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> about 10 runs.
>
> That being said I think that reverting my patch is just papering over the
> problem. We will do the async pass over inodes twice instead of once
> and thus the timing changes enough that you aren't able to observe the
> problem.
>
> I'm looking into this more...
So I finally understood what's going on. If the system has no dirty pages
at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
wb_writeback() will bail out immediately without doing anything and all the
writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
patch fixes the problem for me.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (2.67 kB)
0001-writeback-Fix-occasional-slow-sync-1.patch (1.33 kB)
Download all attachments

2013-07-12 16:59:02

by Paul Taysom

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

`On Fri, Jul 12, 2013 at 8:43 AM, Jan Kara <[email protected]> wrote:
> On Thu 11-07-13 13:58:32, Jan Kara wrote:
>> On Thu 11-07-13 12:53:46, Jan Kara wrote:
>> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
>> > > The following commit introduced a 10x regression for
>> > > syncing inodes in ext4 with relatime enabled where just
>> > > the atime had been modified.
>> > >
>> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
>> > > Author: Jan Kara <[email protected]>
>> > > Date: Tue Jul 3 16:45:34 2012 +0200
>> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>> > >
>> > > See also: http://www.kernelhub.org/?msg=93100&p=2
>> > >
>> > > Fixed by putting back in the call to writeback_inodes_sb.
>> > >
>> > > I'll attach the test in a reply to this e-mail.
>> > >
>> > > The test starts by creating 512 files, syncing, reading one byte
>> > > from each of those files, syncing, and then deleting each file
>> > > and syncing. The time to do each sync is printed. The process
>> > > is then repeated for 1024 files and then the next power of
>> > > two up to 262144 files.
>> > >
>> > > Note, when running the test, the slow down doesn't always happen
>> > > but most of the tests will show a slow down.
>> > >
>> > > In response to crbug.com/240422
>> > >
>> > > Signed-off-by: Paul Taysom <[email protected]>
>> > Thanks for report. Rather than blindly reverting the change, I'd like to
>> > understand why you see so huge regression. As the changelog in the patch
>> > says, flusher thread should be doing async writeback equivalent to the
>> > removed one because it gets woken via wakeup_flusher_threads(). But my
>> > guess is that for some reason we end up doing all the writeback from
>> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
>> Hum, so it must be something timing sensitive. I wasn't able to reproduce
>> the issue on my test machine in 4 runs of your test program. I was able to
>> reproduce it on my laptop on every second run of the test program but once
>> I've enabled some tracepoints, the issue disappeared and I didn't see it in
>> about 10 runs.
>>
>> That being said I think that reverting my patch is just papering over the
>> problem. We will do the async pass over inodes twice instead of once
>> and thus the timing changes enough that you aren't able to observe the
>> problem.
>>
>> I'm looking into this more...
> So I finally understood what's going on. If the system has no dirty pages
> at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
> wb_writeback() will bail out immediately without doing anything and all the
> writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
> patch fixes the problem for me.
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

Jan,
Your fix is a clear win! Not only did it fix the sync after read
problem but it made the sync after create faster too.
Thanks,
Paul

2013-07-15 09:46:36

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fs: sync: fixed performance regression

On Fri 12-07-13 09:59:00, Paul Taysom wrote:
> `On Fri, Jul 12, 2013 at 8:43 AM, Jan Kara <[email protected]> wrote:
> > On Thu 11-07-13 13:58:32, Jan Kara wrote:
> >> On Thu 11-07-13 12:53:46, Jan Kara wrote:
> >> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> >> > > The following commit introduced a 10x regression for
> >> > > syncing inodes in ext4 with relatime enabled where just
> >> > > the atime had been modified.
> >> > >
> >> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> >> > > Author: Jan Kara <[email protected]>
> >> > > Date: Tue Jul 3 16:45:34 2012 +0200
> >> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> >> > >
> >> > > See also: http://www.kernelhub.org/?msg=93100&p=2
> >> > >
> >> > > Fixed by putting back in the call to writeback_inodes_sb.
> >> > >
> >> > > I'll attach the test in a reply to this e-mail.
> >> > >
> >> > > The test starts by creating 512 files, syncing, reading one byte
> >> > > from each of those files, syncing, and then deleting each file
> >> > > and syncing. The time to do each sync is printed. The process
> >> > > is then repeated for 1024 files and then the next power of
> >> > > two up to 262144 files.
> >> > >
> >> > > Note, when running the test, the slow down doesn't always happen
> >> > > but most of the tests will show a slow down.
> >> > >
> >> > > In response to crbug.com/240422
> >> > >
> >> > > Signed-off-by: Paul Taysom <[email protected]>
> >> > Thanks for report. Rather than blindly reverting the change, I'd like to
> >> > understand why you see so huge regression. As the changelog in the patch
> >> > says, flusher thread should be doing async writeback equivalent to the
> >> > removed one because it gets woken via wakeup_flusher_threads(). But my
> >> > guess is that for some reason we end up doing all the writeback from
> >> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> >> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> >> the issue on my test machine in 4 runs of your test program. I was able to
> >> reproduce it on my laptop on every second run of the test program but once
> >> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> >> about 10 runs.
> >>
> >> That being said I think that reverting my patch is just papering over the
> >> problem. We will do the async pass over inodes twice instead of once
> >> and thus the timing changes enough that you aren't able to observe the
> >> problem.
> >>
> >> I'm looking into this more...
> > So I finally understood what's going on. If the system has no dirty pages
> > at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
> > wb_writeback() will bail out immediately without doing anything and all the
> > writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
> > patch fixes the problem for me.
> >
> > Honza
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
>
> Jan,
> Your fix is a clear win! Not only did it fix the sync after read
> problem but it made the sync after create faster too.
Thanks for testing! I've sent the patch to Al for inclusion.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR