2011-04-16 14:06:03

by Fengguang Wu

[permalink] [raw]
Subject: [PATCH 00/12] IO-less dirty throttling v7

Andrew,

This revision undergoes a number of simplifications, cleanups and fixes.
Independent patches are separated out. The core patches (07, 08) now have
easier to understand changelog. Detailed rationals can be found in patch 08.

In response to the complexity complaints, an introduction document is
written explaining the rationals, algorithm and visual case studies:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf

The full patchset is accessible in

git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git dirty-throttling-v7

Questions, reviews and independent tests will be highly appreciated.

supporting functionalities

[PATCH 01/12] writeback: account per-bdi accumulated written pages
[PATCH 02/12] writeback: account per-bdi accumulated dirtied pages
[PATCH 03/12] writeback: bdi write bandwidth estimation
[PATCH 04/12] writeback: smoothed global/bdi dirty pages
[PATCH 05/12] writeback: smoothed dirty threshold and limit
[PATCH 06/12] writeback: enforce 1/4 gap between the dirty/background thresholds

core changes

[PATCH 07/12] writeback: base throttle bandwidth and position ratio
[PATCH 08/12] writeback: IO-less balance_dirty_pages()

tracing

[PATCH 09/12] writeback: show bdi write bandwidth in debugfs
[PATCH 10/12] writeback: trace dirty_ratelimit
[PATCH 11/12] writeback: trace balance_dirty_pages
[PATCH 12/12] writeback: trace global_dirty_state

fs/fs-writeback.c | 3
include/linux/backing-dev.h | 23
include/linux/sched.h | 8
include/linux/writeback.h | 49 +
include/trace/events/writeback.h | 179 +++++
mm/backing-dev.c | 51 +
mm/memory_hotplug.c | 3
mm/page-writeback.c | 980 +++++++++++++++++++++++------
8 files changed, 1085 insertions(+), 211 deletions(-)

Thanks,
Fengguang


2011-04-16 16:28:04

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sat, Apr 16, 2011 at 3:25 PM, Wu Fengguang <[email protected]> wrote:
> Andrew,
>
> This revision undergoes a number of simplifications, cleanups and fixes.
> Independent patches are separated out. The core patches (07, 08) now have
> easier to understand changelog. Detailed rationals can be found in patch 08.
>
> In response to the complexity complaints, an introduction document is
> written explaining the rationals, algorithm and visual case studies:
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
>
> The full patchset is accessible in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git dirty-throttling-v7
>
> Questions, reviews and independent tests will be highly appreciated.
>
> supporting functionalities
>
>        [PATCH 01/12] writeback: account per-bdi accumulated written pages
>        [PATCH 02/12] writeback: account per-bdi accumulated dirtied pages
>        [PATCH 03/12] writeback: bdi write bandwidth estimation
>        [PATCH 04/12] writeback: smoothed global/bdi dirty pages
>        [PATCH 05/12] writeback: smoothed dirty threshold and limit
>        [PATCH 06/12] writeback: enforce 1/4 gap between the dirty/background thresholds
>
> core changes
>
>        [PATCH 07/12] writeback: base throttle bandwidth and position ratio
>        [PATCH 08/12] writeback: IO-less balance_dirty_pages()
>
> tracing
>
>        [PATCH 09/12] writeback: show bdi write bandwidth in debugfs
>        [PATCH 10/12] writeback: trace dirty_ratelimit
>        [PATCH 11/12] writeback: trace balance_dirty_pages
>        [PATCH 12/12] writeback: trace global_dirty_state
>
>  fs/fs-writeback.c                |    3
>  include/linux/backing-dev.h      |   23
>  include/linux/sched.h            |    8
>  include/linux/writeback.h        |   49 +
>  include/trace/events/writeback.h |  179 +++++
>  mm/backing-dev.c                 |   51 +
>  mm/memory_hotplug.c              |    3
>  mm/page-writeback.c              |  980 +++++++++++++++++++++++------
>  8 files changed, 1085 insertions(+), 211 deletions(-)
>
> Thanks,
> Fengguang
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

I pulled your tree into linux-next (next-20110415) on an i386 Debian host.

My build breaks here:
...
MODPOST vmlinux.o
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
mm/built-in.o: In function `bdi_position_ratio':
page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
mm/built-in.o: In function `calc_period_shift.part.10':
page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
make[4]: *** [.tmp_vmlinux1] Error

BTW, which kernel-config options have to be set besides
CONFIG_BLK_DEV_THROTTLING=y?

- Sedat -


Attachments:
config_writeback-dirty-throttling-v7.txt (85.97 kB)

2011-04-17 01:45:03

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

Hi Sedat,

On Sun, Apr 17, 2011 at 12:27:58AM +0800, Sedat Dilek wrote:

> I pulled your tree into linux-next (next-20110415) on an i386 Debian host.
>
> My build breaks here:
> ...
> MODPOST vmlinux.o
> GEN .version
> CHK include/generated/compile.h
> UPD include/generated/compile.h
> CC init/version.o
> LD init/built-in.o
> LD .tmp_vmlinux1
> mm/built-in.o: In function `bdi_position_ratio':
> page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'

Yes it can be fixed by the attached patch.

> mm/built-in.o: In function `calc_period_shift.part.10':
> page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'

I cannot reproduce this error. In the git tree, calc_period_shift() is
actually quite simple:

static int calc_period_shift(void)
{
return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
}

where avg_write_bandwidth is of type "unsigned long".

> make[4]: *** [.tmp_vmlinux1] Error
>
> BTW, which kernel-config options have to be set besides
> CONFIG_BLK_DEV_THROTTLING=y?

No. I used your kconfig on 2.6.39-rc3 and it compiles OK for i386.

I've pushed two patches into the git tree fixing the compile errors.
Thank you for trying it out and report!

Thanks,
Fengguang


Attachments:
(No filename) (1.31 kB)
bdi_position_ratio-__udivdi3-fix (1.07 kB)
Download all attachments

2011-04-17 04:10:20

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sun, Apr 17, 2011 at 11:18:43AM +0800, Sedat Dilek wrote:
> On Sun, Apr 17, 2011 at 3:44 AM, Wu Fengguang <[email protected]> wrote:
> > Hi Sedat,
> >
> > On Sun, Apr 17, 2011 at 12:27:58AM +0800, Sedat Dilek wrote:
> >
> >> I pulled your tree into linux-next (next-20110415) on an i386 Debian host.
> >>
> >> My build breaks here:
> >> ...
> >>   MODPOST vmlinux.o
> >>   GEN     .version
> >>   CHK     include/generated/compile.h
> >>   UPD     include/generated/compile.h
> >>   CC      init/version.o
> >>   LD      init/built-in.o
> >>   LD      .tmp_vmlinux1
> >> mm/built-in.o: In function `bdi_position_ratio':
> >> page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
> >
> > Yes it can be fixed by the attached patch.
> >
> >> mm/built-in.o: In function `calc_period_shift.part.10':
> >> page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
> >
> > I cannot reproduce this error. In the git tree, calc_period_shift() is
> > actually quite simple:
> >
> > static int calc_period_shift(void)
> > {
> >        return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
> > }
> >
> > where avg_write_bandwidth is of type "unsigned long".
> >
> >> make[4]: *** [.tmp_vmlinux1] Error
> >>
> >> BTW, which kernel-config options have to be set besides
> >> CONFIG_BLK_DEV_THROTTLING=y?
> >
> > No. I used your kconfig on 2.6.39-rc3 and it compiles OK for i386.
> >
> > I've pushed two patches into the git tree fixing the compile errors.
> > Thank you for trying it out and report!
> >
> > Thanks,
> > Fengguang
> >
>
> Thanks for your patch.
>
> The 1st part of the build-error is gone, but 2nd part still remains:
>
> LD .tmp_vmlinux1
> mm/built-in.o: In function `calc_period_shift.part.10':
> page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
> make[4]: *** [.tmp_vmlinux1] Error 1
>
> I have attached some disasm-ed files.

OK. I tried next-20110415 and your kconfig and still got no error.

Please revert the last commit. It's not necessary anyway.

commit 84a9890ddef487d9c6d70934c0a2addc65923bcf
Author: Wu Fengguang <[email protected]>
Date: Sat Apr 16 18:38:41 2011 -0600

writeback: scale dirty proportions period with writeout bandwidth

CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>

> Unfortunately, I don't see any new commits in your GIT tree.

Yeah I cannot see it in the web interface, too:

http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=shortlog;h=refs/heads/dirty-throttling-v7

But they are in the dirty-throttling-v7 branch at kernel.org:

commit d0e30163e390d87387ec13e3b1c2168238c26793
Author: Wu Fengguang <[email protected]>
Date: Sun Apr 17 11:59:12 2011 +0800

Revert "writeback: scale dirty proportions period with writeout bandwidth"

This reverts commit 84a9890ddef487d9c6d70934c0a2addc65923bcf.

[email protected]:

LD .tmp_vmlinux1
mm/built-in.o: In function `calc_period_shift.part.10':
page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
make[4]: *** [.tmp_vmlinux1] Error 1

commit fc5c8b04119a5bcc46865e66eec3e6133ecb56e9
Author: Wu Fengguang <[email protected]>
Date: Sun Apr 17 09:22:41 2011 -0600

writeback: quick CONFIG_BLK_DEV_THROTTLING=n compile fix

Reported-by: Sedat Dilek <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>

commit c4a7e3f48dcfae71d0e3d2c55dcc381b3def1919
Author: Wu Fengguang <[email protected]>
Date: Sun Apr 17 09:04:44 2011 -0600

writeback: i386 compile fix

mm/built-in.o: In function `bdi_position_ratio':
page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
mm/built-in.o: In function `calc_period_shift.part.10':
page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
make[4]: *** [.tmp_vmlinux1] Error

Reported-by: Sedat Dilek <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>


Thanks,
Fengguang

2011-04-17 04:47:05

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sun, Apr 17, 2011 at 6:10 AM, Wu Fengguang <[email protected]> wrote:
> On Sun, Apr 17, 2011 at 11:18:43AM +0800, Sedat Dilek wrote:
>> On Sun, Apr 17, 2011 at 3:44 AM, Wu Fengguang <[email protected]> wrote:
>> > Hi Sedat,
>> >
>> > On Sun, Apr 17, 2011 at 12:27:58AM +0800, Sedat Dilek wrote:
>> >
>> >> I pulled your tree into linux-next (next-20110415) on an i386 Debian host.
>> >>
>> >> My build breaks here:
>> >> ...
>> >>   MODPOST vmlinux.o
>> >>   GEN     .version
>> >>   CHK     include/generated/compile.h
>> >>   UPD     include/generated/compile.h
>> >>   CC      init/version.o
>> >>   LD      init/built-in.o
>> >>   LD      .tmp_vmlinux1
>> >> mm/built-in.o: In function `bdi_position_ratio':
>> >> page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
>> >
>> > Yes it can be fixed by the attached patch.
>> >
>> >> mm/built-in.o: In function `calc_period_shift.part.10':
>> >> page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
>> >
>> > I cannot reproduce this error. In the git tree, calc_period_shift() is
>> > actually quite simple:
>> >
>> > static int calc_period_shift(void)
>> > {
>> >        return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
>> > }
>> >
>> > where avg_write_bandwidth is of type "unsigned long".
>> >
>> >> make[4]: *** [.tmp_vmlinux1] Error
>> >>
>> >> BTW, which kernel-config options have to be set besides
>> >> CONFIG_BLK_DEV_THROTTLING=y?
>> >
>> > No. I used your kconfig on 2.6.39-rc3 and it compiles OK for i386.
>> >
>> > I've pushed two patches into the git tree fixing the compile errors.
>> > Thank you for trying it out and report!
>> >
>> > Thanks,
>> > Fengguang
>> >
>>
>> Thanks for your patch.
>>
>> The 1st part of the build-error is gone, but 2nd part still remains:
>>
>>   LD      .tmp_vmlinux1
>> mm/built-in.o: In function `calc_period_shift.part.10':
>> page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
>> make[4]: *** [.tmp_vmlinux1] Error 1
>>
>> I have attached some disasm-ed files.
>
> OK. I tried next-20110415 and your kconfig and still got no error.
>
> Please revert the last commit. It's not necessary anyway.
>
> commit 84a9890ddef487d9c6d70934c0a2addc65923bcf
> Author: Wu Fengguang <[email protected]>
> Date:   Sat Apr 16 18:38:41 2011 -0600
>
>    writeback: scale dirty proportions period with writeout bandwidth
>
>    CC: Peter Zijlstra <[email protected]>
>    Signed-off-by: Wu Fengguang <[email protected]>
>
>> Unfortunately, I don't see any new commits in your GIT tree.
>
> Yeah I cannot see it in the web interface, too:
>
> http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=shortlog;h=refs/heads/dirty-throttling-v7
>
> But they are in the dirty-throttling-v7 branch at kernel.org:
>
> commit d0e30163e390d87387ec13e3b1c2168238c26793
> Author: Wu Fengguang <[email protected]>
> Date:   Sun Apr 17 11:59:12 2011 +0800
>
>    Revert "writeback: scale dirty proportions period with writeout bandwidth"
>
>    This reverts commit 84a9890ddef487d9c6d70934c0a2addc65923bcf.
>
>    [email protected]:
>
>        LD      .tmp_vmlinux1
>      mm/built-in.o: In function `calc_period_shift.part.10':
>      page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
>      make[4]: *** [.tmp_vmlinux1] Error 1
>
> commit fc5c8b04119a5bcc46865e66eec3e6133ecb56e9
> Author: Wu Fengguang <[email protected]>
> Date:   Sun Apr 17 09:22:41 2011 -0600
>
>    writeback: quick CONFIG_BLK_DEV_THROTTLING=n compile fix
>
>    Reported-by: Sedat Dilek <[email protected]>
>    Signed-off-by: Wu Fengguang <[email protected]>
>
> commit c4a7e3f48dcfae71d0e3d2c55dcc381b3def1919
> Author: Wu Fengguang <[email protected]>
> Date:   Sun Apr 17 09:04:44 2011 -0600
>
>    writeback: i386 compile fix
>
>    mm/built-in.o: In function `bdi_position_ratio':
>    page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
>    mm/built-in.o: In function `calc_period_shift.part.10':
>    page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
>    make[4]: *** [.tmp_vmlinux1] Error
>
>    Reported-by: Sedat Dilek <[email protected]>
>    Signed-off-by: Wu Fengguang <[email protected]>
>
>
> Thanks,
> Fengguang
>

The 2nd part disappears here, when I switch from gcc-4.6 to gcc-4.5.

- Sedat -

2011-04-17 06:46:11

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sun, Apr 17, 2011 at 6:46 AM, Sedat Dilek <[email protected]> wrote:
> On Sun, Apr 17, 2011 at 6:10 AM, Wu Fengguang <[email protected]> wrote:
>> On Sun, Apr 17, 2011 at 11:18:43AM +0800, Sedat Dilek wrote:
>>> On Sun, Apr 17, 2011 at 3:44 AM, Wu Fengguang <[email protected]> wrote:
>>> > Hi Sedat,
>>> >
>>> > On Sun, Apr 17, 2011 at 12:27:58AM +0800, Sedat Dilek wrote:
>>> >
>>> >> I pulled your tree into linux-next (next-20110415) on an i386 Debian host.
>>> >>
>>> >> My build breaks here:
>>> >> ...
>>> >>   MODPOST vmlinux.o
>>> >>   GEN     .version
>>> >>   CHK     include/generated/compile.h
>>> >>   UPD     include/generated/compile.h
>>> >>   CC      init/version.o
>>> >>   LD      init/built-in.o
>>> >>   LD      .tmp_vmlinux1
>>> >> mm/built-in.o: In function `bdi_position_ratio':
>>> >> page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
>>> >
>>> > Yes it can be fixed by the attached patch.
>>> >
>>> >> mm/built-in.o: In function `calc_period_shift.part.10':
>>> >> page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
>>> >
>>> > I cannot reproduce this error. In the git tree, calc_period_shift() is
>>> > actually quite simple:
>>> >
>>> > static int calc_period_shift(void)
>>> > {
>>> >        return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
>>> > }
>>> >
>>> > where avg_write_bandwidth is of type "unsigned long".
>>> >
>>> >> make[4]: *** [.tmp_vmlinux1] Error
>>> >>
>>> >> BTW, which kernel-config options have to be set besides
>>> >> CONFIG_BLK_DEV_THROTTLING=y?
>>> >
>>> > No. I used your kconfig on 2.6.39-rc3 and it compiles OK for i386.
>>> >
>>> > I've pushed two patches into the git tree fixing the compile errors.
>>> > Thank you for trying it out and report!
>>> >
>>> > Thanks,
>>> > Fengguang
>>> >
>>>
>>> Thanks for your patch.
>>>
>>> The 1st part of the build-error is gone, but 2nd part still remains:
>>>
>>>   LD      .tmp_vmlinux1
>>> mm/built-in.o: In function `calc_period_shift.part.10':
>>> page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
>>> make[4]: *** [.tmp_vmlinux1] Error 1
>>>
>>> I have attached some disasm-ed files.
>>
>> OK. I tried next-20110415 and your kconfig and still got no error.
>>
>> Please revert the last commit. It's not necessary anyway.
>>
>> commit 84a9890ddef487d9c6d70934c0a2addc65923bcf
>> Author: Wu Fengguang <[email protected]>
>> Date:   Sat Apr 16 18:38:41 2011 -0600
>>
>>    writeback: scale dirty proportions period with writeout bandwidth
>>
>>    CC: Peter Zijlstra <[email protected]>
>>    Signed-off-by: Wu Fengguang <[email protected]>
>>
>>> Unfortunately, I don't see any new commits in your GIT tree.
>>
>> Yeah I cannot see it in the web interface, too:
>>
>> http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=shortlog;h=refs/heads/dirty-throttling-v7
>>
>> But they are in the dirty-throttling-v7 branch at kernel.org:
>>
>> commit d0e30163e390d87387ec13e3b1c2168238c26793
>> Author: Wu Fengguang <[email protected]>
>> Date:   Sun Apr 17 11:59:12 2011 +0800
>>
>>    Revert "writeback: scale dirty proportions period with writeout bandwidth"
>>
>>    This reverts commit 84a9890ddef487d9c6d70934c0a2addc65923bcf.
>>
>>    [email protected]:
>>
>>        LD      .tmp_vmlinux1
>>      mm/built-in.o: In function `calc_period_shift.part.10':
>>      page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
>>      make[4]: *** [.tmp_vmlinux1] Error 1
>>
>> commit fc5c8b04119a5bcc46865e66eec3e6133ecb56e9
>> Author: Wu Fengguang <[email protected]>
>> Date:   Sun Apr 17 09:22:41 2011 -0600
>>
>>    writeback: quick CONFIG_BLK_DEV_THROTTLING=n compile fix
>>
>>    Reported-by: Sedat Dilek <[email protected]>
>>    Signed-off-by: Wu Fengguang <[email protected]>
>>
>> commit c4a7e3f48dcfae71d0e3d2c55dcc381b3def1919
>> Author: Wu Fengguang <[email protected]>
>> Date:   Sun Apr 17 09:04:44 2011 -0600
>>
>>    writeback: i386 compile fix
>>
>>    mm/built-in.o: In function `bdi_position_ratio':
>>    page-writeback.c:(.text+0x5c83): undefined reference to `__udivdi3'
>>    mm/built-in.o: In function `calc_period_shift.part.10':
>>    page-writeback.c:(.text+0x6446): undefined reference to `____ilog2_NaN'
>>    make[4]: *** [.tmp_vmlinux1] Error
>>
>>    Reported-by: Sedat Dilek <[email protected]>
>>    Signed-off-by: Wu Fengguang <[email protected]>
>>
>>
>> Thanks,
>> Fengguang
>>
>
> The 2nd part disappears here, when I switch from gcc-4.6 to gcc-4.5.
>
> - Sedat -
>

Just FYI:

Build with gcc-4.6 is OK, now.

(+) OK writeback-dirty-throttling-v7-fix/writeback-i386-compile-fix.patch
(+) OK writeback-dirty-throttling-v7-fix/0001-Revert-writeback-scale-dirty-proportions-period-with.patch

In case of the ilog2 error I have g00gled a bit and found gcc-bug
#36359 and looked also at include/linux/log2.h [2].
Not sure if this is a gcc-4.6 bug and I shall open a ticket (or a
problem in your code?).
I have tried the testcase from [3], it gives same output for nm.
[4] I could not follow.

- Sedat -

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36359
[2] http://git.us.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=blob;f=include/linux/log2.h#l18
[3] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36359#c12
[4] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36359#c17


Attachments:
testcase_gcc-bug-36359.tar.xz (948.00 B)
testcase_gcc-bug-36359.tar.xz.sha256sum (96.00 B)
Download all attachments

2011-04-17 07:37:48

by Marco Stornelli

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

Il 16/04/2011 15:25, Wu Fengguang ha scritto:
> Andrew,
>
> This revision undergoes a number of simplifications, cleanups and fixes.
> Independent patches are separated out. The core patches (07, 08) now have
> easier to understand changelog. Detailed rationals can be found in patch 08.
>
> In response to the complexity complaints, an introduction document is
> written explaining the rationals, algorithm and visual case studies:
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf

It'd be great if you wrote a summary in the kernel documentation.

Marco

2011-04-17 09:31:01

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sun, Apr 17, 2011 at 03:31:54PM +0800, Marco Stornelli wrote:
> Il 16/04/2011 15:25, Wu Fengguang ha scritto:
> > Andrew,
> >
> > This revision undergoes a number of simplifications, cleanups and fixes.
> > Independent patches are separated out. The core patches (07, 08) now have
> > easier to understand changelog. Detailed rationals can be found in patch 08.
> >
> > In response to the complexity complaints, an introduction document is
> > written explaining the rationals, algorithm and visual case studies:
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
>
> It'd be great if you wrote a summary in the kernel documentation.

Perhaps not in this stage. That will only frighten people away I'm
afraid. The main concerns now are "why the complexities?". People at
this time perhaps won't bother looking into any lengthy documents at
all.

The slides with both description text and graphs should be much easier
for the readers to establish good feelings and understandings, as well
as trust. Seeing is believing, when you see 80ms vs. 30s pause times
in the bumpy NFS workload (pages 29, 30), fast rampup when suddenly
starting 10 or 100 dd tasks (pages 38, 32), and 5ms pause time in
stable workload (page 20), don't you feel the graphs much more
striking than boring texts? :)

That said, the changelog in patches 07 and 08 do offer some text based
introductions, if you are interested in reading more.

Thanks,
Fengguang

2011-04-17 17:50:14

by Marco Stornelli

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

Il 17/04/2011 11:30, Wu Fengguang ha scritto:
> On Sun, Apr 17, 2011 at 03:31:54PM +0800, Marco Stornelli wrote:
>> Il 16/04/2011 15:25, Wu Fengguang ha scritto:
>>> Andrew,
>>>
>>> This revision undergoes a number of simplifications, cleanups and fixes.
>>> Independent patches are separated out. The core patches (07, 08) now have
>>> easier to understand changelog. Detailed rationals can be found in patch 08.
>>>
>>> In response to the complexity complaints, an introduction document is
>>> written explaining the rationals, algorithm and visual case studies:
>>>
>>> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
>>
>> It'd be great if you wrote a summary in the kernel documentation.
>
> Perhaps not in this stage. That will only frighten people away I'm
> afraid. The main concerns now are "why the complexities?". People at
> this time perhaps won't bother looking into any lengthy documents at
> all.
>

For the moment ok if you think we are in a not-ready-for-mainline yet.
But for the final version the documentation would be welcome, maybe with
the pdf as reference. The documentation is always the last thing but
it's important! :)

Marco

2011-04-17 23:32:04

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Mon, Apr 18, 2011 at 01:44:22AM +0800, Marco Stornelli wrote:
> Il 17/04/2011 11:30, Wu Fengguang ha scritto:
> > On Sun, Apr 17, 2011 at 03:31:54PM +0800, Marco Stornelli wrote:
> >> Il 16/04/2011 15:25, Wu Fengguang ha scritto:
> >>> Andrew,
> >>>
> >>> This revision undergoes a number of simplifications, cleanups and fixes.
> >>> Independent patches are separated out. The core patches (07, 08) now have
> >>> easier to understand changelog. Detailed rationals can be found in patch 08.
> >>>
> >>> In response to the complexity complaints, an introduction document is
> >>> written explaining the rationals, algorithm and visual case studies:
> >>>
> >>> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
> >>
> >> It'd be great if you wrote a summary in the kernel documentation.
> >
> > Perhaps not in this stage. That will only frighten people away I'm
> > afraid. The main concerns now are "why the complexities?". People at
> > this time perhaps won't bother looking into any lengthy documents at
> > all.
> >
>
> For the moment ok if you think we are in a not-ready-for-mainline yet.
> But for the final version the documentation would be welcome, maybe with
> the pdf as reference. The documentation is always the last thing but
> it's important! :)

No problem. I hope it still get the chance to get upstreamed :)

Thanks,
Fengguang

2011-04-18 00:13:45

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

Hi Sedat,

> Please revert the last commit. It's not necessary anyway.
>
> commit 84a9890ddef487d9c6d70934c0a2addc65923bcf
> Author: Wu Fengguang <[email protected]>
> Date: Sat Apr 16 18:38:41 2011 -0600
>
> writeback: scale dirty proportions period with writeout bandwidth
>
> CC: Peter Zijlstra <[email protected]>
> Signed-off-by: Wu Fengguang <[email protected]>

Please do revert that commit, because I found a sleep-inside-spinlock
bug with it. Here is the fixed one (but you don't have to track this
optional patch).

Thanks,
Fengguang
---
Subject: writeback: scale dirty proportions period with writeout bandwidth
Date: Sat Apr 16 18:38:41 CST 2011

CC: Peter Zijlstra <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
mm/page-writeback.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

--- linux-next.orig/mm/page-writeback.c 2011-04-17 20:52:13.000000000 +0800
+++ linux-next/mm/page-writeback.c 2011-04-18 07:57:01.000000000 +0800
@@ -121,20 +121,13 @@ static struct prop_descriptor vm_complet
static struct prop_descriptor vm_dirties;

/*
- * couple the period to the dirty_ratio:
+ * couple the period to global write throughput:
*
- * period/2 ~ roundup_pow_of_two(dirty limit)
+ * period/2 ~ roundup_pow_of_two(write IO throughput)
*/
static int calc_period_shift(void)
{
- unsigned long dirty_total;
-
- if (vm_dirty_bytes)
- dirty_total = vm_dirty_bytes / PAGE_SIZE;
- else
- dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
- 100;
- return 2 + ilog2(dirty_total - 1);
+ return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
}

/*
@@ -143,6 +136,13 @@ static int calc_period_shift(void)
static void update_completion_period(void)
{
int shift = calc_period_shift();
+
+ if (shift > PROP_MAX_SHIFT)
+ shift = PROP_MAX_SHIFT;
+
+ if (abs(shift - vm_completions.pg[0].shift) <= 1)
+ return;
+
prop_change_shift(&vm_completions, shift);
prop_change_shift(&vm_dirties, shift);
}
@@ -180,7 +180,6 @@ int dirty_ratio_handler(struct ctl_table

ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
- update_completion_period();
vm_dirty_bytes = 0;
}
return ret;
@@ -196,7 +195,6 @@ int dirty_bytes_handler(struct ctl_table

ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
- update_completion_period();
vm_dirty_ratio = 0;
}
return ret;
@@ -1044,6 +1042,8 @@ snapshot:
bdi->bw_time_stamp = now;
unlock:
spin_unlock(&dirty_lock);
+ if (gbdi->bw_time_stamp == now)
+ update_completion_period();
}

static unsigned long max_pause(struct backing_dev_info *bdi,

2011-04-18 06:57:48

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Mon, Apr 18, 2011 at 2:13 AM, Wu Fengguang <[email protected]> wrote:
> Hi Sedat,
>
>> Please revert the last commit. It's not necessary anyway.
>>
>> commit 84a9890ddef487d9c6d70934c0a2addc65923bcf
>> Author: Wu Fengguang <[email protected]>
>> Date:   Sat Apr 16 18:38:41 2011 -0600
>>
>>     writeback: scale dirty proportions period with writeout bandwidth
>>
>>     CC: Peter Zijlstra <[email protected]>
>>     Signed-off-by: Wu Fengguang <[email protected]>
>
> Please do revert that commit, because I found a sleep-inside-spinlock
> bug with it. Here is the fixed one (but you don't have to track this
> optional patch).
>
> Thanks,
> Fengguang
> ---
> Subject: writeback: scale dirty proportions period with writeout bandwidth
> Date: Sat Apr 16 18:38:41 CST 2011
>
> CC: Peter Zijlstra <[email protected]>
> Signed-off-by: Wu Fengguang <[email protected]>
> ---
>  mm/page-writeback.c |   24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> --- linux-next.orig/mm/page-writeback.c 2011-04-17 20:52:13.000000000 +0800
> +++ linux-next/mm/page-writeback.c      2011-04-18 07:57:01.000000000 +0800
> @@ -121,20 +121,13 @@ static struct prop_descriptor vm_complet
>  static struct prop_descriptor vm_dirties;
>
>  /*
> - * couple the period to the dirty_ratio:
> + * couple the period to global write throughput:
>  *
> - *   period/2 ~ roundup_pow_of_two(dirty limit)
> + *   period/2 ~ roundup_pow_of_two(write IO throughput)
>  */
>  static int calc_period_shift(void)
>  {
> -       unsigned long dirty_total;
> -
> -       if (vm_dirty_bytes)
> -               dirty_total = vm_dirty_bytes / PAGE_SIZE;
> -       else
> -               dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> -                               100;
> -       return 2 + ilog2(dirty_total - 1);
> +       return 2 + ilog2(default_backing_dev_info.avg_write_bandwidth);
>  }
>
>  /*
> @@ -143,6 +136,13 @@ static int calc_period_shift(void)
>  static void update_completion_period(void)
>  {
>        int shift = calc_period_shift();
> +
> +       if (shift > PROP_MAX_SHIFT)
> +               shift = PROP_MAX_SHIFT;
> +
> +       if (abs(shift - vm_completions.pg[0].shift) <= 1)
> +               return;
> +
>        prop_change_shift(&vm_completions, shift);
>        prop_change_shift(&vm_dirties, shift);
>  }
> @@ -180,7 +180,6 @@ int dirty_ratio_handler(struct ctl_table
>
>        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>        if (ret == 0 && write && vm_dirty_ratio != old_ratio) {
> -               update_completion_period();
>                vm_dirty_bytes = 0;
>        }
>        return ret;
> @@ -196,7 +195,6 @@ int dirty_bytes_handler(struct ctl_table
>
>        ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
>        if (ret == 0 && write && vm_dirty_bytes != old_bytes) {
> -               update_completion_period();
>                vm_dirty_ratio = 0;
>        }
>        return ret;
> @@ -1044,6 +1042,8 @@ snapshot:
>        bdi->bw_time_stamp = now;
>  unlock:
>        spin_unlock(&dirty_lock);
> +       if (gbdi->bw_time_stamp == now)
> +               update_completion_period();
>  }
>
>  static unsigned long max_pause(struct backing_dev_info *bdi,
>

Unfortunately, this "v2" patch still breaks with gcc-4.6 here:

LD .tmp_vmlinux1
mm/built-in.o: In function `calc_period_shift.part.10':
page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
make[4]: *** [.tmp_vmlinux1] Error 1

My patchset against next-20110415 looks like this:

(+) OK writeback-dirty-throttling-v7/writeback-dirty-throttling-v7.patch
(+) OK writeback-dirty-throttling-post-v7/0001-writeback-i386-compile-fix.patch
(+) OK writeback-dirty-throttling-post-v7/0002-writeback-quick-CONFIG_BLK_DEV_THROTTLING-n-compile-.patch
(+) OK writeback-dirty-throttling-post-v7/0003-Revert-writeback-scale-dirty-proportions-period-with.patch
(+) OK writeback-dirty-throttling-v7-fix/writeback-scale-dirty-proportions-period-with-writeout-bandwidth-v2.patch

Attached are the disasm of mm/page-writeback.o (v2, gcc-4.6) and the
disasm of yesterday's experiments with gcc-4.5.

[ gcc-4.5 ]

00006574 <calc_period_shift>:
6574: a1 90 00 00 00 mov 0x90,%eax 6575:
R_386_32 default_backing_dev_info
6579: 55 push %ebp
657a: 89 e5 mov %esp,%ebp
657c: e8 02 f8 ff ff call 5d83 <__ilog2_u32>
6581: 5d pop %ebp
6582: 83 c0 02 add $0x2,%eax
6585: c3 ret

[ gcc-4.6 ]

000008c9 <calc_period_shift.part.10>:
8c9: 8b 15 90 00 00 00 mov 0x90,%edx 8cb:
R_386_32 default_backing_dev_info
8cf: 55 push %ebp
8d0: 89 e5 mov %esp,%ebp
8d2: 85 d2 test %edx,%edx
8d4: 0f 88 46 01 00 00 js a20
<calc_period_shift.part.10+0x157>
8da: f7 c2 00 00 00 40 test $0x40000000,%edx
8e0: b8 20 00 00 00 mov $0x20,%eax
8e5: 0f 85 3a 01 00 00 jne a25
<calc_period_shift.part.10+0x15c>
8eb: f7 c2 00 00 00 20 test $0x20000000,%edx
8f1: b0 1f mov $0x1f,%al
8f3: 0f 85 2c 01 00 00 jne a25
<calc_period_shift.part.10+0x15c>
8f9: f7 c2 00 00 00 10 test $0x10000000,%edx
8ff: b0 1e mov $0x1e,%al
901: 0f 85 1e 01 00 00 jne a25
<calc_period_shift.part.10+0x15c>
907: f7 c2 00 00 00 08 test $0x8000000,%edx
90d: b0 1d mov $0x1d,%al
90f: 0f 85 10 01 00 00 jne a25
<calc_period_shift.part.10+0x15c>
915: f7 c2 00 00 00 04 test $0x4000000,%edx
91b: b0 1c mov $0x1c,%al
91d: 0f 85 02 01 00 00 jne a25
<calc_period_shift.part.10+0x15c>
923: f7 c2 00 00 00 02 test $0x2000000,%edx
929: b0 1b mov $0x1b,%al
92b: 0f 85 f4 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
931: f7 c2 00 00 00 01 test $0x1000000,%edx
937: b0 1a mov $0x1a,%al
939: 0f 85 e6 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
93f: f7 c2 00 00 80 00 test $0x800000,%edx
945: b0 19 mov $0x19,%al
947: 0f 85 d8 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
94d: f7 c2 00 00 40 00 test $0x400000,%edx
953: b0 18 mov $0x18,%al
955: 0f 85 ca 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
95b: f7 c2 00 00 20 00 test $0x200000,%edx
961: b0 17 mov $0x17,%al
963: 0f 85 bc 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
969: f7 c2 00 00 10 00 test $0x100000,%edx
96f: b0 16 mov $0x16,%al
971: 0f 85 ae 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
977: f7 c2 00 00 08 00 test $0x80000,%edx
97d: b0 15 mov $0x15,%al
97f: 0f 85 a0 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
985: f7 c2 00 00 04 00 test $0x40000,%edx
98b: b0 14 mov $0x14,%al
98d: 0f 85 92 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
993: f7 c2 00 00 02 00 test $0x20000,%edx
999: b0 13 mov $0x13,%al
99b: 0f 85 84 00 00 00 jne a25
<calc_period_shift.part.10+0x15c>
9a1: f7 c2 00 00 01 00 test $0x10000,%edx
9a7: b0 12 mov $0x12,%al
9a9: 75 7a jne a25
<calc_period_shift.part.10+0x15c>
9ab: f6 c6 80 test $0x80,%dh
9ae: b0 11 mov $0x11,%al
9b0: 75 73 jne a25
<calc_period_shift.part.10+0x15c>
9b2: f6 c6 40 test $0x40,%dh
9b5: b0 10 mov $0x10,%al
9b7: 75 6c jne a25
<calc_period_shift.part.10+0x15c>
9b9: f6 c6 20 test $0x20,%dh
9bc: b0 0f mov $0xf,%al
9be: 75 65 jne a25
<calc_period_shift.part.10+0x15c>
9c0: f6 c6 10 test $0x10,%dh
9c3: b0 0e mov $0xe,%al
9c5: 75 5e jne a25
<calc_period_shift.part.10+0x15c>
9c7: f6 c6 08 test $0x8,%dh
9ca: b0 0d mov $0xd,%al
9cc: 75 57 jne a25
<calc_period_shift.part.10+0x15c>
9ce: f6 c6 04 test $0x4,%dh
9d1: b0 0c mov $0xc,%al
9d3: 75 50 jne a25
<calc_period_shift.part.10+0x15c>
9d5: f6 c6 02 test $0x2,%dh
9d8: b0 0b mov $0xb,%al
9da: 75 49 jne a25
<calc_period_shift.part.10+0x15c>
9dc: f6 c6 01 test $0x1,%dh
9df: b0 0a mov $0xa,%al
9e1: 75 42 jne a25
<calc_period_shift.part.10+0x15c>
9e3: f6 c2 80 test $0x80,%dl
9e6: b0 09 mov $0x9,%al
9e8: 75 3b jne a25
<calc_period_shift.part.10+0x15c>
9ea: f6 c2 40 test $0x40,%dl
9ed: b0 08 mov $0x8,%al
9ef: 75 34 jne a25
<calc_period_shift.part.10+0x15c>
9f1: f6 c2 20 test $0x20,%dl
9f4: b0 07 mov $0x7,%al
9f6: 75 2d jne a25
<calc_period_shift.part.10+0x15c>
9f8: f6 c2 10 test $0x10,%dl
9fb: b0 06 mov $0x6,%al
9fd: 75 26 jne a25
<calc_period_shift.part.10+0x15c>
9ff: f6 c2 08 test $0x8,%dl
a02: b0 05 mov $0x5,%al
a04: 75 1f jne a25
<calc_period_shift.part.10+0x15c>
a06: f6 c2 04 test $0x4,%dl
a09: b0 04 mov $0x4,%al
a0b: 75 18 jne a25
<calc_period_shift.part.10+0x15c>
a0d: f6 c2 02 test $0x2,%dl
a10: b0 03 mov $0x3,%al
a12: 75 11 jne a25
<calc_period_shift.part.10+0x15c>
a14: 80 e2 01 and $0x1,%dl
a17: b0 02 mov $0x2,%al
a19: 75 0a jne a25
<calc_period_shift.part.10+0x15c>
a1b: e8 fc ff ff ff call a1c
<calc_period_shift.part.10+0x153> a1c: R_386_PC32 ____ilog2_NaN
a20: b8 21 00 00 00 mov $0x21,%eax
a25: 5d pop %ebp
a26: c3 ret

00000a27 <calc_period_shift>:
a27: 55 push %ebp
a28: 83 ca ff or $0xffffffff,%edx
a2b: 89 e5 mov %esp,%ebp
a2d: 0f bd 05 90 00 00 00 bsr 0x90,%eax a30:
R_386_32 default_backing_dev_info
a34: 0f 44 c2 cmove %edx,%eax
a37: 5d pop %ebp
a38: 83 c0 02 add $0x2,%eax
a3b: c3 ret

- EOT -

- Sedat -


Attachments:
mm_page-writeback.o_v2.disasm.xz (19.80 kB)
gcc-4.5_mm_page-writeback.o.disasm.xz (19.16 kB)
Download all attachments

2011-04-18 08:18:22

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

> Unfortunately, this "v2" patch still breaks with gcc-4.6 here:
>
> LD .tmp_vmlinux1
> mm/built-in.o: In function `calc_period_shift.part.10':
> page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
> make[4]: *** [.tmp_vmlinux1] Error 1
>
> My patchset against next-20110415 looks like this:
>
> (+) OK writeback-dirty-throttling-v7/writeback-dirty-throttling-v7.patch
> (+) OK writeback-dirty-throttling-post-v7/0001-writeback-i386-compile-fix.patch
> (+) OK writeback-dirty-throttling-post-v7/0002-writeback-quick-CONFIG_BLK_DEV_THROTTLING-n-compile-.patch
> (+) OK writeback-dirty-throttling-post-v7/0003-Revert-writeback-scale-dirty-proportions-period-with.patch
> (+) OK writeback-dirty-throttling-v7-fix/writeback-scale-dirty-proportions-period-with-writeout-bandwidth-v2.patch
>
> Attached are the disasm of mm/page-writeback.o (v2, gcc-4.6) and the
> disasm of yesterday's experiments with gcc-4.5.
>
> [ gcc-4.5 ]
>
> 00006574 <calc_period_shift>:
> 6574: a1 90 00 00 00 mov 0x90,%eax 6575:
> R_386_32 default_backing_dev_info
> 6579: 55 push %ebp
> 657a: 89 e5 mov %esp,%ebp
> 657c: e8 02 f8 ff ff call 5d83 <__ilog2_u32>
> 6581: 5d pop %ebp
> 6582: 83 c0 02 add $0x2,%eax
> 6585: c3 ret

gcc-4.5 is generating the right code, while the below silly macro
expansion seems like __builtin_constant_p(n) wrongly evaluating to true.
I wonder if the attached patch will workaround the gcc's bug. It adds a
local variable in calc_period_shift() for passing to ilog2().

Thanks,
Fengguang

> [ gcc-4.6 ]
>
> 000008c9 <calc_period_shift.part.10>:
> 8c9: 8b 15 90 00 00 00 mov 0x90,%edx 8cb:
> R_386_32 default_backing_dev_info
> 8cf: 55 push %ebp
> 8d0: 89 e5 mov %esp,%ebp
> 8d2: 85 d2 test %edx,%edx
> 8d4: 0f 88 46 01 00 00 js a20
> <calc_period_shift.part.10+0x157>
> 8da: f7 c2 00 00 00 40 test $0x40000000,%edx
> 8e0: b8 20 00 00 00 mov $0x20,%eax
> 8e5: 0f 85 3a 01 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 8eb: f7 c2 00 00 00 20 test $0x20000000,%edx
> 8f1: b0 1f mov $0x1f,%al
> 8f3: 0f 85 2c 01 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 8f9: f7 c2 00 00 00 10 test $0x10000000,%edx
> 8ff: b0 1e mov $0x1e,%al
> 901: 0f 85 1e 01 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 907: f7 c2 00 00 00 08 test $0x8000000,%edx
> 90d: b0 1d mov $0x1d,%al
> 90f: 0f 85 10 01 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 915: f7 c2 00 00 00 04 test $0x4000000,%edx
> 91b: b0 1c mov $0x1c,%al
> 91d: 0f 85 02 01 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 923: f7 c2 00 00 00 02 test $0x2000000,%edx
> 929: b0 1b mov $0x1b,%al
> 92b: 0f 85 f4 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 931: f7 c2 00 00 00 01 test $0x1000000,%edx
> 937: b0 1a mov $0x1a,%al
> 939: 0f 85 e6 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 93f: f7 c2 00 00 80 00 test $0x800000,%edx
> 945: b0 19 mov $0x19,%al
> 947: 0f 85 d8 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 94d: f7 c2 00 00 40 00 test $0x400000,%edx
> 953: b0 18 mov $0x18,%al
> 955: 0f 85 ca 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 95b: f7 c2 00 00 20 00 test $0x200000,%edx
> 961: b0 17 mov $0x17,%al
> 963: 0f 85 bc 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 969: f7 c2 00 00 10 00 test $0x100000,%edx
> 96f: b0 16 mov $0x16,%al
> 971: 0f 85 ae 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 977: f7 c2 00 00 08 00 test $0x80000,%edx
> 97d: b0 15 mov $0x15,%al
> 97f: 0f 85 a0 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 985: f7 c2 00 00 04 00 test $0x40000,%edx
> 98b: b0 14 mov $0x14,%al
> 98d: 0f 85 92 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 993: f7 c2 00 00 02 00 test $0x20000,%edx
> 999: b0 13 mov $0x13,%al
> 99b: 0f 85 84 00 00 00 jne a25
> <calc_period_shift.part.10+0x15c>
> 9a1: f7 c2 00 00 01 00 test $0x10000,%edx
> 9a7: b0 12 mov $0x12,%al
> 9a9: 75 7a jne a25
> <calc_period_shift.part.10+0x15c>
> 9ab: f6 c6 80 test $0x80,%dh
> 9ae: b0 11 mov $0x11,%al
> 9b0: 75 73 jne a25
> <calc_period_shift.part.10+0x15c>
> 9b2: f6 c6 40 test $0x40,%dh
> 9b5: b0 10 mov $0x10,%al
> 9b7: 75 6c jne a25
> <calc_period_shift.part.10+0x15c>
> 9b9: f6 c6 20 test $0x20,%dh
> 9bc: b0 0f mov $0xf,%al
> 9be: 75 65 jne a25
> <calc_period_shift.part.10+0x15c>
> 9c0: f6 c6 10 test $0x10,%dh
> 9c3: b0 0e mov $0xe,%al
> 9c5: 75 5e jne a25
> <calc_period_shift.part.10+0x15c>
> 9c7: f6 c6 08 test $0x8,%dh
> 9ca: b0 0d mov $0xd,%al
> 9cc: 75 57 jne a25
> <calc_period_shift.part.10+0x15c>
> 9ce: f6 c6 04 test $0x4,%dh
> 9d1: b0 0c mov $0xc,%al
> 9d3: 75 50 jne a25
> <calc_period_shift.part.10+0x15c>
> 9d5: f6 c6 02 test $0x2,%dh
> 9d8: b0 0b mov $0xb,%al
> 9da: 75 49 jne a25
> <calc_period_shift.part.10+0x15c>
> 9dc: f6 c6 01 test $0x1,%dh
> 9df: b0 0a mov $0xa,%al
> 9e1: 75 42 jne a25
> <calc_period_shift.part.10+0x15c>
> 9e3: f6 c2 80 test $0x80,%dl
> 9e6: b0 09 mov $0x9,%al
> 9e8: 75 3b jne a25
> <calc_period_shift.part.10+0x15c>
> 9ea: f6 c2 40 test $0x40,%dl
> 9ed: b0 08 mov $0x8,%al
> 9ef: 75 34 jne a25
> <calc_period_shift.part.10+0x15c>
> 9f1: f6 c2 20 test $0x20,%dl
> 9f4: b0 07 mov $0x7,%al
> 9f6: 75 2d jne a25
> <calc_period_shift.part.10+0x15c>
> 9f8: f6 c2 10 test $0x10,%dl
> 9fb: b0 06 mov $0x6,%al
> 9fd: 75 26 jne a25
> <calc_period_shift.part.10+0x15c>
> 9ff: f6 c2 08 test $0x8,%dl
> a02: b0 05 mov $0x5,%al
> a04: 75 1f jne a25
> <calc_period_shift.part.10+0x15c>
> a06: f6 c2 04 test $0x4,%dl
> a09: b0 04 mov $0x4,%al
> a0b: 75 18 jne a25
> <calc_period_shift.part.10+0x15c>
> a0d: f6 c2 02 test $0x2,%dl
> a10: b0 03 mov $0x3,%al
> a12: 75 11 jne a25
> <calc_period_shift.part.10+0x15c>
> a14: 80 e2 01 and $0x1,%dl
> a17: b0 02 mov $0x2,%al
> a19: 75 0a jne a25
> <calc_period_shift.part.10+0x15c>
> a1b: e8 fc ff ff ff call a1c
> <calc_period_shift.part.10+0x153> a1c: R_386_PC32 ____ilog2_NaN
> a20: b8 21 00 00 00 mov $0x21,%eax
> a25: 5d pop %ebp
> a26: c3 ret
>
> 00000a27 <calc_period_shift>:
> a27: 55 push %ebp
> a28: 83 ca ff or $0xffffffff,%edx
> a2b: 89 e5 mov %esp,%ebp
> a2d: 0f bd 05 90 00 00 00 bsr 0x90,%eax a30:
> R_386_32 default_backing_dev_info
> a34: 0f 44 c2 cmove %edx,%eax
> a37: 5d pop %ebp
> a38: 83 c0 02 add $0x2,%eax
> a3b: c3 ret
>
> - EOT -
>
> - Sedat -




Attachments:
(No filename) (9.29 kB)
writeback-adaptive-proportions-shift.patch (2.23 kB)
Download all attachments

2011-04-18 10:22:36

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Mon, Apr 18, 2011 at 10:18 AM, Wu Fengguang <[email protected]> wrote:
>> Unfortunately, this "v2" patch still breaks with gcc-4.6 here:
>>
>>   LD      .tmp_vmlinux1
>> mm/built-in.o: In function `calc_period_shift.part.10':
>> page-writeback.c:(.text+0x6458): undefined reference to `____ilog2_NaN'
>> make[4]: *** [.tmp_vmlinux1] Error 1
>>
>> My patchset against next-20110415 looks like this:
>>
>>   (+) OK   writeback-dirty-throttling-v7/writeback-dirty-throttling-v7.patch
>>   (+) OK   writeback-dirty-throttling-post-v7/0001-writeback-i386-compile-fix.patch
>>   (+) OK   writeback-dirty-throttling-post-v7/0002-writeback-quick-CONFIG_BLK_DEV_THROTTLING-n-compile-.patch
>>   (+) OK   writeback-dirty-throttling-post-v7/0003-Revert-writeback-scale-dirty-proportions-period-with.patch
>>   (+) OK   writeback-dirty-throttling-v7-fix/writeback-scale-dirty-proportions-period-with-writeout-bandwidth-v2.patch
>>
>> Attached are the disasm of mm/page-writeback.o (v2, gcc-4.6) and the
>> disasm of yesterday's experiments with gcc-4.5.
>>
>> [ gcc-4.5 ]
>>
>> 00006574 <calc_period_shift>:
>>     6574:       a1 90 00 00 00          mov    0x90,%eax        6575:
>> R_386_32  default_backing_dev_info
>>     6579:       55                      push   %ebp
>>     657a:       89 e5                   mov    %esp,%ebp
>>     657c:       e8 02 f8 ff ff          call   5d83 <__ilog2_u32>
>>     6581:       5d                      pop    %ebp
>>     6582:       83 c0 02                add    $0x2,%eax
>>     6585:       c3                      ret
>
> gcc-4.5 is generating the right code, while the below silly macro
> expansion seems like __builtin_constant_p(n) wrongly evaluating to true.
> I wonder if the attached patch will workaround the gcc's bug. It adds a
> local variable in calc_period_shift() for passing to ilog2().
>
> Thanks,
> Fengguang
>

Thanks for the patch, that "gcc-4.6 workaround" works here.
Feel free to add:

Tested-by: Sedat Dilek <[email protected]>.

- Sedat -

P.S.: Attached disasm v3 with gcc-4.6

>> [ gcc-4.6 ]
>>
>> 000008c9 <calc_period_shift.part.10>:
>>      8c9:       8b 15 90 00 00 00       mov    0x90,%edx        8cb:
>> R_386_32   default_backing_dev_info
>>      8cf:       55                      push   %ebp
>>      8d0:       89 e5                   mov    %esp,%ebp
>>      8d2:       85 d2                   test   %edx,%edx
>>      8d4:       0f 88 46 01 00 00       js     a20
>> <calc_period_shift.part.10+0x157>
>>      8da:       f7 c2 00 00 00 40       test   $0x40000000,%edx
>>      8e0:       b8 20 00 00 00          mov    $0x20,%eax
>>      8e5:       0f 85 3a 01 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      8eb:       f7 c2 00 00 00 20       test   $0x20000000,%edx
>>      8f1:       b0 1f                   mov    $0x1f,%al
>>      8f3:       0f 85 2c 01 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      8f9:       f7 c2 00 00 00 10       test   $0x10000000,%edx
>>      8ff:       b0 1e                   mov    $0x1e,%al
>>      901:       0f 85 1e 01 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      907:       f7 c2 00 00 00 08       test   $0x8000000,%edx
>>      90d:       b0 1d                   mov    $0x1d,%al
>>      90f:       0f 85 10 01 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      915:       f7 c2 00 00 00 04       test   $0x4000000,%edx
>>      91b:       b0 1c                   mov    $0x1c,%al
>>      91d:       0f 85 02 01 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      923:       f7 c2 00 00 00 02       test   $0x2000000,%edx
>>      929:       b0 1b                   mov    $0x1b,%al
>>      92b:       0f 85 f4 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      931:       f7 c2 00 00 00 01       test   $0x1000000,%edx
>>      937:       b0 1a                   mov    $0x1a,%al
>>      939:       0f 85 e6 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      93f:       f7 c2 00 00 80 00       test   $0x800000,%edx
>>      945:       b0 19                   mov    $0x19,%al
>>      947:       0f 85 d8 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      94d:       f7 c2 00 00 40 00       test   $0x400000,%edx
>>      953:       b0 18                   mov    $0x18,%al
>>      955:       0f 85 ca 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      95b:       f7 c2 00 00 20 00       test   $0x200000,%edx
>>      961:       b0 17                   mov    $0x17,%al
>>      963:       0f 85 bc 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      969:       f7 c2 00 00 10 00       test   $0x100000,%edx
>>      96f:       b0 16                   mov    $0x16,%al
>>      971:       0f 85 ae 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      977:       f7 c2 00 00 08 00       test   $0x80000,%edx
>>      97d:       b0 15                   mov    $0x15,%al
>>      97f:       0f 85 a0 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      985:       f7 c2 00 00 04 00       test   $0x40000,%edx
>>      98b:       b0 14                   mov    $0x14,%al
>>      98d:       0f 85 92 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      993:       f7 c2 00 00 02 00       test   $0x20000,%edx
>>      999:       b0 13                   mov    $0x13,%al
>>      99b:       0f 85 84 00 00 00       jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9a1:       f7 c2 00 00 01 00       test   $0x10000,%edx
>>      9a7:       b0 12                   mov    $0x12,%al
>>      9a9:       75 7a                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9ab:       f6 c6 80                test   $0x80,%dh
>>      9ae:       b0 11                   mov    $0x11,%al
>>      9b0:       75 73                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9b2:       f6 c6 40                test   $0x40,%dh
>>      9b5:       b0 10                   mov    $0x10,%al
>>      9b7:       75 6c                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9b9:       f6 c6 20                test   $0x20,%dh
>>      9bc:       b0 0f                   mov    $0xf,%al
>>      9be:       75 65                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9c0:       f6 c6 10                test   $0x10,%dh
>>      9c3:       b0 0e                   mov    $0xe,%al
>>      9c5:       75 5e                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9c7:       f6 c6 08                test   $0x8,%dh
>>      9ca:       b0 0d                   mov    $0xd,%al
>>      9cc:       75 57                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9ce:       f6 c6 04                test   $0x4,%dh
>>      9d1:       b0 0c                   mov    $0xc,%al
>>      9d3:       75 50                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9d5:       f6 c6 02                test   $0x2,%dh
>>      9d8:       b0 0b                   mov    $0xb,%al
>>      9da:       75 49                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9dc:       f6 c6 01                test   $0x1,%dh
>>      9df:       b0 0a                   mov    $0xa,%al
>>      9e1:       75 42                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9e3:       f6 c2 80                test   $0x80,%dl
>>      9e6:       b0 09                   mov    $0x9,%al
>>      9e8:       75 3b                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9ea:       f6 c2 40                test   $0x40,%dl
>>      9ed:       b0 08                   mov    $0x8,%al
>>      9ef:       75 34                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9f1:       f6 c2 20                test   $0x20,%dl
>>      9f4:       b0 07                   mov    $0x7,%al
>>      9f6:       75 2d                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9f8:       f6 c2 10                test   $0x10,%dl
>>      9fb:       b0 06                   mov    $0x6,%al
>>      9fd:       75 26                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      9ff:       f6 c2 08                test   $0x8,%dl
>>      a02:       b0 05                   mov    $0x5,%al
>>      a04:       75 1f                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      a06:       f6 c2 04                test   $0x4,%dl
>>      a09:       b0 04                   mov    $0x4,%al
>>      a0b:       75 18                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      a0d:       f6 c2 02                test   $0x2,%dl
>>      a10:       b0 03                   mov    $0x3,%al
>>      a12:       75 11                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      a14:       80 e2 01                and    $0x1,%dl
>>      a17:       b0 02                   mov    $0x2,%al
>>      a19:       75 0a                   jne    a25
>> <calc_period_shift.part.10+0x15c>
>>      a1b:       e8 fc ff ff ff          call   a1c
>> <calc_period_shift.part.10+0x153>    a1c: R_386_PC32 ____ilog2_NaN
>>      a20:       b8 21 00 00 00          mov    $0x21,%eax
>>      a25:       5d                      pop    %ebp
>>      a26:       c3                      ret
>>
>> 00000a27 <calc_period_shift>:
>>      a27:       55                      push   %ebp
>>      a28:       83 ca ff                or     $0xffffffff,%edx
>>      a2b:       89 e5                   mov    %esp,%ebp
>>      a2d:       0f bd 05 90 00 00 00    bsr    0x90,%eax        a30:
>> R_386_32   default_backing_dev_info
>>      a34:       0f 44 c2                cmove  %edx,%eax
>>      a37:       5d                      pop    %ebp
>>      a38:       83 c0 02                add    $0x2,%eax
>>      a3b:       c3                      ret
>>
>> - EOT -
>>
>> - Sedat -
>
>
>
>


Attachments:
mm_page-writeback.o_v3.disasm.xz (19.25 kB)

2011-04-26 17:21:44

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

On Sat, Apr 16, 2011 at 09:25:46PM +0800, Wu Fengguang wrote:
> Andrew,
>
> This revision undergoes a number of simplifications, cleanups and fixes.
> Independent patches are separated out. The core patches (07, 08) now have
> easier to understand changelog. Detailed rationals can be found in patch 08.
>
> In response to the complexity complaints, an introduction document is
> written explaining the rationals, algorithm and visual case studies:
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
>

Hi Fenguang,

I went quickly browsed through above document and am trying to understand
the meaning of following lines and see how does it fit into the framework
of existing IO conroller.

- task IO controller endogenous
- cgroup IO controller well integrated
- proportional IO controller endogenous

You had sent me a link where you had prepared a patch to control the
async IO completely. So because this code is all about measuring the
bdi writeback rate and then coming up task ratelimit accoridingly, it
will never know about other IO going on in the cgroup. READS and direct
IO.

So IIUC, to make use of above logic for cgroup throttling, one shall have
to come up with explicity notion of async bandwidth per cgroup which does
not control other writes. Currently we have following when it comes to
throttling.

blkio.throttle_read_bps
blkio.throttle_write_bps

The intention is to be able to control the WRITE bandwidth of cgroup and
it could be any kind of WRITE (be it buffered WRITE or direct WRITES).
Currently we control only direct WRITES and question of how to also
control buffered writes is still on the table.

Because your patch does not know about other WRITES happening in the
system, one needs to create a way so that buffered WRITES and direct
WRITES can be accounted together against a group and throttled
accordingly.

What does "proportional IO controller endogenous" mean? Currently we do
all proportional IO division in CFQ. So are you proposing that for
buffered WRITES we come up with a different policy altogether in writeback
layer or somehow it is integrating with CFQ mechanism?

Thanks
Vivek

2011-04-28 14:27:56

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7

Hi Vivek,

On Wed, Apr 27, 2011 at 01:19:54AM +0800, Vivek Goyal wrote:
> On Sat, Apr 16, 2011 at 09:25:46PM +0800, Wu Fengguang wrote:
> > Andrew,
> >
> > This revision undergoes a number of simplifications, cleanups and fixes.
> > Independent patches are separated out. The core patches (07, 08) now have
> > easier to understand changelog. Detailed rationals can be found in patch 08.
> >
> > In response to the complexity complaints, an introduction document is
> > written explaining the rationals, algorithm and visual case studies:
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
> >
>
> Hi Fenguang,
>
> I went quickly browsed through above document and am trying to understand
> the meaning of following lines and see how does it fit into the framework
> of existing IO conroller.

Thanks for taking the look! Regarding this diff:

http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=blobdiff;f=mm/page-writeback.c;h=0b579e7fd338fd1f59cc36bf15fda06ff6260634;hp=34dff9f0d28d0f4f0794eb41187f71b4ade6b8a2;hb=1a58ad99ce1f6a9df6618a4b92fa4859cc3e7e90;hpb=5b6fcb3125ea52ff04a2fad27a51307842deb1a0

> - task IO controller endogenous

Normally the bandwidth the current task to be throttled at (referred
to as task_bw below) is runtime calculated, however if there is an
interface (the patch reuses current->signal->rlim[RLIMIT_RSS].rlim_cur),
then it can just use that bandwidth to throttle the current task. No
extra code is needed. In this sense, it has the endogenous capability
to do per-task async write IO controller.

> - proportional IO controller endogenous

Sorry, "priority" could be more accurate than "proportional".
When task_bw is calculated in the normal way, you may further do

task_bw *= 2;

to grant it doubled bandwidth than the other tasks. Or do

task_bw *= current->async_write_priority;

to give it whatever configurable async write priority. When you do
this, the base bandwidth is smart enough to adapt to the new balance
point. In this sense, exact priority control is also endogenous.

> - cgroup IO controller well integrated

The async write cgroup IO controller is implemented in the same way as
the "global IO controller", in that it's also based on the "base
bandwidth" concept and is calculated with the same algorithm.

> You had sent me a link where you had prepared a patch to control the
> async IO completely. So because this code is all about measuring the
> bdi writeback rate and then coming up task ratelimit accoridingly, it
> will never know about other IO going on in the cgroup. READS and direct
> IO.

Right.

> So IIUC, to make use of above logic for cgroup throttling, one shall have
> to come up with explicity notion of async bandwidth per cgroup which does
> not control other writes. Currently we have following when it comes to
> throttling.
>
> blkio.throttle_read_bps
> blkio.throttle_write_bps
>
> The intention is to be able to control the WRITE bandwidth of cgroup and
> it could be any kind of WRITE (be it buffered WRITE or direct WRITES).
> Currently we control only direct WRITES and question of how to also
> control buffered writes is still on the table.
>
> Because your patch does not know about other WRITES happening in the
> system, one needs to create a way so that buffered WRITES and direct
> WRITES can be accounted together against a group and throttled
> accordingly.

Basically it is now possible to also send DIRECT writes to the new
balance_dirty_pages(), because it's RATE based rather than THRESHOLD
based. The DIRECT writes have nothing to do with dirty THRESHOLD, so
the legacy balance_dirty_pages() was not able to handle them at all.

Then there is the danger that DIRECT writes be double throttled --
explicitly in balance_dirty_pages() and implicitly in
get_request_wait(). But as long as the latter do not sleep for too
long time (< 500ms for now), it will be compensated in
balance_dirty_pages() (aka. think time compensation).

Or even safer, we may let DIRECT writes enter balance_dirty_pages()
only if it's to be cgroup throttled. The cgroup IO controller can be
enhanced to do "leak" control that can effectively account for all
get_request_wait() latencies.

> What does "proportional IO controller endogenous" mean? Currently we do
> all proportional IO division in CFQ. So are you proposing that for
> buffered WRITES we come up with a different policy altogether in writeback
> layer or somehow it is integrating with CFQ mechanism?

See above. It's not related to CFQ and totally within the scope of
(async) writes.

Thanks,
Fengguang