LinuxLists.cc - Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

2009-09-01 07:00:04

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Hi,

> > Hi Rik,
> >
> > Thanks for reviewing the patches. I wanted to have better understanding of
> > where all does it help to associate a bio to the group of process who
> > created/owned the page. Hence few thoughts.
> >
> > When a bio is submitted to IO scheduler, it needs to determine the group
> > bio belongs to and group which should be charged to. There seem to be two
> > methods.
> >
> > - Attribute the bio to cgroup submitting process belongs to.
> > - For async requests, track the original owner hence cgroup of the page
> >   and charge that group for the bio.
> >
> > One can think of pros/cons of both the approaches.
> >
> > - The primary use case of tracking async context seems to be that if a
> >   process T1 in group G1 mmaps a big file and then another process T2 in
> >   group G2, asks for memory and triggers reclaim and generates writes of
> >   the file pages mapped by T1, then these writes should not be charged to
> >   T2, hence blkio_cgroup pages.
> >
> >   But the flip side of this might be that group G2 is a low weight group
> >   and probably too busy also right now, which will delay the write out
> >   and possibly T2 will wait longer for memory to be allocated.

In order to avoid this wait, dm-ioband issues IO which has a page with
PG_Reclaim as early as possible.

> > - At one point of time Andrew mentioned that buffered writes are generally a
> >   big problem and one needs to map these to owner's group. Though I am not
> >   very sure what specific problem he was referring to. Can we attribute
> >   buffered writes to pdflush threads and move all pdflush threads in a
> >   cgroup to limit system wide write out activity?

I think that buffered writes also should be controlled per cgroup as
well as synchronous writes.

> > - Somebody also gave an example where there is a memory hogging process and
> >   possibly pushes out some processes to swap. It does not sound fair to
> >   charge those processes for that swap writeout. These processes never
> >   requested swap IO.

I think that swap writeouts should be charged to the memory hogging
process, because the process consumes more resources and it should get
a penalty.

> > - If there are multiple buffered writers in the system, then those writers
> >   can also be forced to writeout some pages to disk before they are
> >   allowed to dirty more pages. As per the page cache design, any writer
> >   can pick any inode and start writing out pages. So it can happen that a
> >   higher weight group task is writing out pages dirtied by a lower weight group
> >   task. If async bio is mapped to owner's group, it might happen that
> >   higher weight group task might be made to sleep on lower weight group
> >   task because request descriptors are all consumed up.

As mentioned above, in dm-ioband, the bio is charged to the page owner
and issued immediately.

> > It looks like there does not seem to be a clean way which covers all the
> > cases without issues. I am just trying to think, what is a simple way
> > which covers most of the cases. Can we just stick to using submitting task
> > context to determine a bio's group (as cfq does). Which can result in
> > following.
> >
> > - Less code and reduced complexity.
> >
> > - Buffered writes will be charged to pdflush and its group. If one wish to
> >   limit buffered write activity for pdflush, one can move all the pdflush
> >   threads into a group and assign desired weight. Writes submitted in
> >   process context will continue to be charged to that process irrespective
> >   of the fact who dirtied that page.
>
> What if we wanted to control buffered write activity per group? If a
> group keeps dirtying pages, we wouldn't want it to dominate the disk
> IO capacity at the expense of other cgroups (by dominating the writes
> sent down by pdflush).

Yes, I think that is true.

> > - swap activity will be charged to kswapd and its group. If swap writes
> >   are coming from process context, it gets charged to process and its
> >   group.
> >
> > - If one is worried about the case of one process being charged for write
> >   out of file mapped by another process during reclaim, then we can
> >   probably make use of memory controller and mount memory controller and
> >   io controller together on same hierarchy. I am told that with memory
> >   controller, group's memory will be reclaimed by the process requesting
> >   more memory. If that's the case, then IO will automatically be charged
> >   to right group if we use submitting task context.
> >
> > I just wanted to bring this point forward for more discussions to know
> > what is the right thing to do? Use bio tracking or not.

Thanks for bringing it forward.

Thanks,
Ryo Tsuruta

2009-09-01 14:14:18

by Vivek Goyal

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Tue, Sep 01, 2009 at 04:00:04PM +0900, Ryo Tsuruta wrote:
> Hi,
>
> > > Hi Rik,
> > >
> > > Thanks for reviewing the patches. I wanted to have better understanding of
> > > where all does it help to associate a bio to the group of process who
> > > created/owned the page. Hence few thoughts.
> > >
> > > When a bio is submitted to IO scheduler, it needs to determine the group
> > > bio belongs to and group which should be charged to. There seem to be two
> > > methods.
> > >
> > > - Attribute the bio to cgroup submitting process belongs to.
> > > - For async requests, track the original owner hence cgroup of the page
> > >   and charge that group for the bio.
> > >
> > > One can think of pros/cons of both the approaches.
> > >
> > > - The primary use case of tracking async context seems to be that if a
> > >   process T1 in group G1 mmaps a big file and then another process T2 in
> > >   group G2, asks for memory and triggers reclaim and generates writes of
> > >   the file pages mapped by T1, then these writes should not be charged to
> > >   T2, hence blkio_cgroup pages.
> > >
> > >   But the flip side of this might be that group G2 is a low weight group
> > >   and probably too busy also right now, which will delay the write out
> > >   and possibly T2 will wait longer for memory to be allocated.
>
> In order to avoid this wait, dm-ioband issues IO which has a page with
> PG_Reclaim as early as possible.
>

So in the above case IO is still charged to G2, but you keep track of whether
the page is PG_Reclaim and, if so, release this bio before the other bios
queued up in the group?

> > > - At one point of time Andrew mentioned that buffered writes are generally a
> > >   big problem and one needs to map these to owner's group. Though I am not
> > >   very sure what specific problem he was referring to. Can we attribute
> > >   buffered writes to pdflush threads and move all pdflush threads in a
> > >   cgroup to limit system wide write out activity?
>
> I think that buffered writes also should be controlled per cgroup as
> well as synchronous writes.
>

But it is hard to achieve fairness for buffered writes because we don't
create completely parallel IO paths, and a higher weight process does not
necessarily dispatch more buffered writes to the IO scheduler (due to the
page cache buffered write logic).

So in some cases we might see buffered write fairness and in other cases
not. For example, run two dd processes in two groups doing buffered writes
and it is hard to achieve fairness between these.

That's why the idea that if we can't ensure Buffered write vs Buffered
write fairness in all the cases, then does it make sense to attribute
buffered writes to pdflush and put pdflush threads into a separate group
to limit system wide write out activity.
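
To illustrate the point about buffered writes, here is a minimal Python sketch, not kernel code. It assumes a toy weighted scheduler (the hypothetical `dispatch` function) that, for every dispatch slot, serves the backlogged group with the lowest served/weight ratio. The group names, weights and queue depths are made-up numbers. What it shows: the scheduler can only be fair among bios that have actually reached it, so if the page cache writeback logic hands it few bios for the higher weight group, that group cannot receive its weighted share.

```python
def dispatch(weights, queued, slots):
    """Toy weighted fair dispatch (illustrative model only).

    weights: group name -> weight
    queued:  group name -> number of bios the writeback path has submitted
    slots:   number of bios the scheduler may dispatch in this window
    Returns group name -> number of bios dispatched.
    """
    served = {group: 0 for group in weights}
    backlog = dict(queued)
    for _ in range(slots):
        # Only groups that still have bios queued can be served.
        ready = [group for group in weights if backlog[group] > 0]
        if not ready:
            break
        # Pick the group that is furthest behind its weighted share;
        # ties are broken by name to keep the result deterministic.
        chosen = min(ready, key=lambda g: (served[g] / weights[g], g))
        served[chosen] += 1
        backlog[chosen] -= 1
    return served


weights = {"G1": 2, "G2": 1}

# Both groups keep the scheduler backlogged: the 2:1 weights are honoured.
both_busy = dispatch(weights, {"G1": 100, "G2": 100}, slots=60)

# Writeback happened to submit mostly G2's pages: G1 runs dry early and
# G2 ends up with most of the window despite its lower weight.
skewed = dispatch(weights, {"G1": 10, "G2": 90}, slots=60)
```

In this model the second call gives G1 only the 10 bios it had queued, which is the "not necessarily higher weight process dispatches more buffered writes" case described above.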

> > > - Somebody also gave an example where there is a memory hogging process and
> > >   possibly pushes out some processes to swap. It does not sound fair to
> > >   charge those processes for that swap writeout. These processes never
> > >   requested swap IO.
>
> I think that swap writeouts should be charged to the memory hogging
> process, because the process consumes more resources and it should get
> a penalty.
>

A process requesting memory gets IO penalty? IMHO, swapping is a kernel
mechanism and kernel's way of providing extended RAM. If we want to solve
the issue of memory hogging by a process then right way to solve is to use
memory controller and not by charging the process for IO activity.
Instead, probably a more suitable way is to charge swap activity to root
group (where by default all the kernel related activity goes).

> > > - If there are multiple buffered writers in the system, then those writers
> > >   can also be forced to writeout some pages to disk before they are
> > >   allowed to dirty more pages. As per the page cache design, any writer
> > >   can pick any inode and start writing out pages. So it can happen that a
> > >   higher weight group task is writing out pages dirtied by a lower weight group
> > >   task. If async bio is mapped to owner's group, it might happen that
> > >   higher weight group task might be made to sleep on lower weight group
> > >   task because request descriptors are all consumed up.
>
> As mentioned above, in dm-ioband, the bio is charged to the page owner
> and issued immediately.

But you are doing it only for selected pages and not for all buffered
writes?

>
> > > It looks like there does not seem to be a clean way which covers all the
> > > cases without issues. I am just trying to think, what is a simple way
> > > which covers most of the cases. Can we just stick to using submitting task
> > > context to determine a bio's group (as cfq does). Which can result in
> > > following.
> > >
> > > - Less code and reduced complexity.
> > >
> > > - Buffered writes will be charged to pdflush and its group. If one wish to
> > >   limit buffered write activity for pdflush, one can move all the pdflush
> > >   threads into a group and assign desired weight. Writes submitted in
> > >   process context will continue to be charged to that process irrespective
> > >   of the fact who dirtied that page.
> >
> > What if we wanted to control buffered write activity per group? If a
> > group keeps dirtying pages, we wouldn't want it to dominate the disk
> > IO capacity at the expense of other cgroups (by dominating the writes
> > sent down by pdflush).
>
> Yes, I think that is true.
>

But anyway we are not able to guarantee this isolation in all the cases.
Again I go back to example of two dd threads doing buffered writes in two
groups.

I don't mind keeping it. Just wanted to make sure that we agree and
understand that keeping it does not mean that we get buffered write vs
buffered write isolation/fairness in all the cases.

> > > - swap activity will be charged to kswapd and its group. If swap writes
> > >   are coming from process context, it gets charged to process and its
> > >   group.
> > >
> > > - If one is worried about the case of one process being charged for write
> > >   out of file mapped by another process during reclaim, then we can
> > >   probably make use of memory controller and mount memory controller and
> > >   io controller together on same hierarchy. I am told that with memory
> > >   controller, group's memory will be reclaimed by the process requesting
> > >   more memory. If that's the case, then IO will automatically be charged
> > >   to right group if we use submitting task context.
> > >
> > > I just wanted to bring this point forward for more discussions to know
> > > what is the right thing to do? Use bio tracking or not.
>
> Thanks for bringing it forward.
>

Thanks
Vivek

2009-09-01 14:54:41

by Rik van Riel

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Vivek Goyal wrote:
> On Tue, Sep 01, 2009 at 04:00:04PM +0900, Ryo Tsuruta wrote:

>> I think that swap writeouts should be charged to the memory hogging
>> process, because the process consumes more resources and it should get
>> a penalty.
>
> A process requesting memory gets IO penalty?

There is no easy answer here.

On the one hand, you want to charge the process that uses
the resources.

On the other hand, if a lower resource use / higher priority
process tries to free up some of those resources, it should
not have its IO requests penalized (and get slowed down)
because of something the first process did...

--
All rights reversed.

2009-09-01 18:02:30

by Nauman Rafique

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Tue, Sep 1, 2009 at 7:11 AM, Vivek Goyal wrote:
> On Tue, Sep 01, 2009 at 04:00:04PM +0900, Ryo Tsuruta wrote:
>> Hi,
>>
>> > > Hi Rik,
>> > >
>> > > Thanks for reviewing the patches. I wanted to have better understanding of
>> > > where all does it help to associate a bio to the group of process who
>> > > created/owned the page. Hence few thoughts.
>> > >
>> > > When a bio is submitted to IO scheduler, it needs to determine the group
>> > > bio belongs to and group which should be charged to. There seem to be two
>> > > methods.
>> > >
>> > > - Attribute the bio to cgroup submitting process belongs to.
>> > > - For async requests, track the original owner hence cgroup of the page
>> > >   and charge that group for the bio.
>> > >
>> > > One can think of pros/cons of both the approaches.
>> > >
>> > > - The primary use case of tracking async context seems to be that if a
>> > >   process T1 in group G1 mmaps a big file and then another process T2 in
>> > >   group G2, asks for memory and triggers reclaim and generates writes of
>> > >   the file pages mapped by T1, then these writes should not be charged to
>> > >   T2, hence blkio_cgroup pages.
>> > >
>> > >   But the flip side of this might be that group G2 is a low weight group
>> > >   and probably too busy also right now, which will delay the write out
>> > >   and possibly T2 will wait longer for memory to be allocated.
>>
>> In order to avoid this wait, dm-ioband issues IO which has a page with
>> PG_Reclaim as early as possible.
>>
>
> So in the above case IO is still charged to G2, but you keep track of whether
> the page is PG_Reclaim and, if so, release this bio before the other bios
> queued up in the group?
>
>> > > - At one point of time Andrew mentioned that buffered writes are generally a
>> > >   big problem and one needs to map these to owner's group. Though I am not
>> > >   very sure what specific problem he was referring to. Can we attribute
>> > >   buffered writes to pdflush threads and move all pdflush threads in a
>> > >   cgroup to limit system wide write out activity?
>>
>> I think that buffered writes also should be controlled per cgroup as
>> well as synchronous writes.
>>
>
> But it is hard to achieve fairness for buffered writes because we don't
> create completely parallel IO paths, and a higher weight process does not
> necessarily dispatch more buffered writes to the IO scheduler (due to the
> page cache buffered write logic).
>
> So in some cases we might see buffered write fairness and in other cases
> not. For example, run two dd processes in two groups doing buffered writes
> and it is hard to achieve fairness between these.

If something is broken, we don't necessarily have to break it further.
Instead, we should be thinking about why it's hard to achieve fairness
with buffered write back. Is there a way to change the writeback path
to send down a constant stream of IO, instead of sending down bursts?

>
> That's why the idea that if we can't ensure Buffered write vs Buffered
> write fairness in all the cases, then does it make sense to attribute
> buffered writes to pdflush and put pdflush threads into a separate group
> to limit system wide write out activity.
>
>> > > - Somebody also gave an example where there is a memory hogging process and
>> > >   possibly pushes out some processes to swap. It does not sound fair to
>> > >   charge those processes for that swap writeout. These processes never
>> > >   requested swap IO.
>>
>> I think that swap writeouts should be charged to the memory hogging
>> process, because the process consumes more resources and it should get
>> a penalty.
>>
>
> A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> mechanism and kernel's way of providing extended RAM. If we want to solve
> the issue of memory hogging by a process then right way to solve is to use
> memory controller and not by charging the process for IO activity.
> Instead, probably a more suitable way is to charge swap activity to root
> group (where by default all the kernel related activity goes).
>
>> > > - If there are multiple buffered writers in the system, then those writers
>> > >   can also be forced to writeout some pages to disk before they are
>> > >   allowed to dirty more pages. As per the page cache design, any writer
>> > >   can pick any inode and start writing out pages. So it can happen that a
>> > >   higher weight group task is writing out pages dirtied by a lower weight group
>> > >   task. If async bio is mapped to owner's group, it might happen that
>> > >   higher weight group task might be made to sleep on lower weight group
>> > >   task because request descriptors are all consumed up.
>>
>> As mentioned above, in dm-ioband, the bio is charged to the page owner
>> and issued immediately.
>
> But you are doing it only for selected pages and not for all buffered
> writes?
>
>>
>> > > It looks like there does not seem to be a clean way which covers all the
>> > > cases without issues. I am just trying to think, what is a simple way
>> > > which covers most of the cases. Can we just stick to using submitting task
>> > > context to determine a bio's group (as cfq does). Which can result in
>> > > following.
>> > >
>> > > - Less code and reduced complexity.
>> > >
>> > > - Buffered writes will be charged to pdflush and its group. If one wish to
>> > >   limit buffered write activity for pdflush, one can move all the pdflush
>> > >   threads into a group and assign desired weight. Writes submitted in
>> > >   process context will continue to be charged to that process irrespective
>> > >   of the fact who dirtied that page.
>> >
>> > What if we wanted to control buffered write activity per group? If a
>> > group keeps dirtying pages, we wouldn't want it to dominate the disk
>> > IO capacity at the expense of other cgroups (by dominating the writes
>> > sent down by pdflush).
>>
>> Yes, I think that is true.
>>
>
> But anyway we are not able to guarantee this isolation in all the cases.
> Again I go back to example of two dd threads doing buffered writes in two
> groups.
>
> I don't mind keeping it. Just wanted to make sure that we agree and
> understand that keeping it does not mean that we get buffered write vs
> buffered write isolation/fairness in all the cases.
>
>> > > - swap activity will be charged to kswapd and its group. If swap writes
>> > >   are coming from process context, it gets charged to process and its
>> > >   group.
>> > >
>> > > - If one is worried about the case of one process being charged for write
>> > >   out of file mapped by another process during reclaim, then we can
>> > >   probably make use of memory controller and mount memory controller and
>> > >   io controller together on same hierarchy. I am told that with memory
>> > >   controller, group's memory will be reclaimed by the process requesting
>> > >   more memory. If that's the case, then IO will automatically be charged
>> > >   to right group if we use submitting task context.
>> > >
>> > > I just wanted to bring this point forward for more discussions to know
>> > > what is the right thing to do? Use bio tracking or not.
>>
>> Thanks for bringing it forward.
>>
>
> Thanks
> Vivek
>

2009-09-02 01:01:27

by Kamezawa Hiroyuki

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Tue, 1 Sep 2009 10:11:42 -0400
Vivek Goyal wrote:
> > > > - Somebody also gave an example where there is a memory hogging process and
> > > > possibly pushes out some processes to swap. It does not sound fair to
> > > > charge those processes for that swap writeout. These processes never
> > > > requested swap IO.
> >
> > I think that swap writeouts should be charged to the memory hogging
> > process, because the process consumes more resources and it should get
> > a penalty.
> >
>
> A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> mechanism and kernel's way of providing extended RAM. If we want to solve
> the issue of memory hogging by a process then right way to solve is to use
> memory controller and not by charging the process for IO activity.
> Instead, probably a more suitable way is to charge swap activity to root
> group (where by default all the kernel related activity goes).
>

I agree. It's memcg's job.
(Support for dirty_ratio in memcg is necessary, I think.)

background-write-out-to-swap-for-memory-shortage should be handled
as kernel I/O. If swap-out-by-memcg because of its limit is a problem,
dirty_ratio for memcg should be implemented.

Thanks,
-Kame

2009-09-02 03:12:34

by Balbir Singh

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Wed, Sep 2, 2009 at 6:29 AM, KAMEZAWA Hiroyuki wrote:
> On Tue, 1 Sep 2009 10:11:42 -0400
> Vivek Goyal wrote:
>> > > > - Somebody also gave an example where there is a memory hogging process and
>> > > >   possibly pushes out some processes to swap. It does not sound fair to
>> > > >   charge those processes for that swap writeout. These processes never
>> > > >   requested swap IO.
>> >
>> > I think that swap writeouts should be charged to the memory hogging
>> > process, because the process consumes more resources and it should get
>> > a penalty.
>> >
>>
>> A process requesting memory gets IO penalty? IMHO, swapping is a kernel
>> mechanism and kernel's way of providing extended RAM. If we want to solve
>> the issue of memory hogging by a process then right way to solve is to use
>> memory controller and not by charging the process for IO activity.
>> Instead, probably a more suitable way is to charge swap activity to root
>> group (where by default all the kernel related activity goes).
>>
>
> I agree. It's memcg's job.
> (Support for dirty_ratio in memcg is necessary, I think.)
>
> background-write-out-to-swap-for-memory-shortage should be handled
> as kernel I/O. If swap-out-by-memcg because of its limit is a problem,
> dirty_ratio for memcg should be implemented.

I tend to agree, looks like dirty_ratio will become important along
with overcommit support in the future.

Balbir Singh.

2009-09-02 09:52:50

by Ryo Tsuruta

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Hi Vivek,

> > > > - The primary use case of tracking async context seems to be that if a
> > > >   process T1 in group G1 mmaps a big file and then another process T2 in
> > > >   group G2, asks for memory and triggers reclaim and generates writes of
> > > >   the file pages mapped by T1, then these writes should not be charged to
> > > >   T2, hence blkio_cgroup pages.
> > > >
> > > >   But the flip side of this might be that group G2 is a low weight group
> > > >   and probably too busy also right now, which will delay the write out
> > > >   and possibly T2 will wait longer for memory to be allocated.
> >
> > In order to avoid this wait, dm-ioband issues IO which has a page with
> > PG_Reclaim as early as possible.
> >
>
> So in the above case IO is still charged to G2, but you keep track of whether
> the page is PG_Reclaim and, if so, release this bio before the other bios
> queued up in the group?

Yes, the bio with PG_Reclaim page is given priority over the other bios.
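
The priority rule described here could be sketched as follows. This is an illustration only and not dm-ioband's actual code: the `GroupQueue` class and the `reclaim` argument are hypothetical names, and the sketch assumes plain FIFO order among bios of the same class.

```python
import heapq
from itertools import count


class GroupQueue:
    """Per-group bio queue in which reclaim-tagged bios are dispatched first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # submission order, used to keep FIFO within a class

    def submit(self, bio_id, reclaim=False):
        # Rank 0 (page flagged for reclaim) sorts ahead of rank 1 (everything else).
        rank = 0 if reclaim else 1
        heapq.heappush(self._heap, (rank, next(self._seq), bio_id))

    def dispatch(self):
        """Return the next bio id to issue, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = GroupQueue()
queue.submit("write-a")
queue.submit("write-b")
queue.submit("reclaim-1", reclaim=True)
queue.submit("write-c")
order = [queue.dispatch() for _ in range(4)]
```

The reclaim bio is issued first even though it was submitted third, so a task waiting on reclaim is not held up behind the group's ordinary backlog; the group is still the one charged for it.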

> > > > - At one point of time Andrew mentioned that buffered writes are generally a
> > > >   big problem and one needs to map these to owner's group. Though I am not
> > > >   very sure what specific problem he was referring to. Can we attribute
> > > >   buffered writes to pdflush threads and move all pdflush threads in a
> > > >   cgroup to limit system wide write out activity?
> >
> > I think that buffered writes also should be controlled per cgroup as
> > well as synchronous writes.
> >
>
> But it is hard to achieve fairness for buffered writes because we don't
> create completely parallel IO paths, and a higher weight process does not
> necessarily dispatch more buffered writes to the IO scheduler (due to the
> page cache buffered write logic).
>
> So in some cases we might see buffered write fairness and in other cases
> not. For example, run two dd processes in two groups doing buffered writes
> and it is hard to achieve fairness between these.
>
> That's why the idea that if we can't ensure Buffered write vs Buffered
> write fairness in all the cases, then does it make sense to attribute
> buffered writes to pdflush and put pdflush threads into a separate group
> to limit system wide write out activity.

If all buffered writes are treated as system wide activities, it does
not mean that bandwidth is being controlled. It is true that pdflush
doesn't do I/O according to weight, but bandwidth (including for
buffered writes) should be reserved for each cgroup.

> > > > - Somebody also gave an example where there is a memory hogging process and
> > > >   possibly pushes out some processes to swap. It does not sound fair to
> > > >   charge those processes for that swap writeout. These processes never
> > > >   requested swap IO.
> >
> > I think that swap writeouts should be charged to the memory hogging
> > process, because the process consumes more resources and it should get
> > a penalty.
> >
>
> A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> mechanism and kernel's way of providing extended RAM. If we want to solve
> the issue of memory hogging by a process then right way to solve is to use
> memory controller and not by charging the process for IO activity.
> Instead, probably a more suitable way is to charge swap activity to root
> group (where by default all the kernel related activity goes).

No. In the current blkio-cgroup, a process which uses a large amount
of memory gets the penalty, not the memory requester.

As you wrote, using both io-controller and memory controller is
required to prevent swap-out caused by memory consumption on another
cgroup.

> > > > - If there are multiple buffered writers in the system, then those writers
> > > >   can also be forced to writeout some pages to disk before they are
> > > >   allowed to dirty more pages. As per the page cache design, any writer
> > > >   can pick any inode and start writing out pages. So it can happen that a
> > > >   higher weight group task is writing out pages dirtied by a lower weight group
> > > >   task. If async bio is mapped to owner's group, it might happen that
> > > >   higher weight group task might be made to sleep on lower weight group
> > > >   task because request descriptors are all consumed up.
> >
> > As mentioned above, in dm-ioband, the bio is charged to the page owner
> > and issued immediately.
>
> But you are doing it only for selected pages and not for all buffered
> writes?

I'm sorry, I was wrong in my previous mail: IO for writing out
page-cache pages is not issued immediately; it is throttled by
dm-ioband.

Anyway, there is a case where a higher weight group task is made
to sleep, but if we reserve the memory for each cgroup by memory
controller in advance, we can avoid the task being put to sleep.

Thanks,
Ryo Tsuruta

2009-09-02 13:59:10

by Vivek Goyal

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Wed, Sep 02, 2009 at 06:52:51PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
>
> > > > > - The primary use case of tracking async context seems to be that if a
> > > > >   process T1 in group G1 mmaps a big file and then another process T2 in
> > > > >   group G2, asks for memory and triggers reclaim and generates writes of
> > > > >   the file pages mapped by T1, then these writes should not be charged to
> > > > >   T2, hence blkio_cgroup pages.
> > > > >
> > > > >   But the flip side of this might be that group G2 is a low weight group
> > > > >   and probably too busy also right now, which will delay the write out
> > > > >   and possibly T2 will wait longer for memory to be allocated.
> > >
> > > In order to avoid this wait, dm-ioband issues IO which has a page with
> > > PG_Reclaim as early as possible.
> > >
> >
> > So in the above case IO is still charged to G2, but you keep track of whether
> > the page is PG_Reclaim and, if so, release this bio before the other bios
> > queued up in the group?
>
> Yes, the bio with PG_Reclaim page is given priority over the other bios.
>
> > > > > - At one point of time Andrew mentioned that buffered writes are generally a
> > > > >   big problem and one needs to map these to owner's group. Though I am not
> > > > >   very sure what specific problem he was referring to. Can we attribute
> > > > >   buffered writes to pdflush threads and move all pdflush threads in a
> > > > >   cgroup to limit system wide write out activity?
> > >
> > > I think that buffered writes also should be controlled per cgroup as
> > > well as synchronous writes.
> > >
> >
> > But it is hard to achieve fairness for buffered writes because we don't
> > create completely parallel IO paths, and a higher weight process does not
> > necessarily dispatch more buffered writes to the IO scheduler (due to the
> > page cache buffered write logic).
> >
> > So in some cases we might see buffered write fairness and in other cases
> > not. For example, run two dd processes in two groups doing buffered writes
> > and it is hard to achieve fairness between these.
> >
> > That's why the idea that if we can't ensure Buffered write vs Buffered
> > write fairness in all the cases, then does it make sense to attribute
> > buffered writes to pdflush and put pdflush threads into a separate group
> > to limit system wide write out activity.
>
> If all buffered writes are treated as system wide activities, it does
> not mean that bandwidth is being controlled. It is true that pdflush
> doesn't do I/O according to weight, but bandwidth (including for
> buffered writes) should be reserved for each cgroup.
>
> > > > > - Somebody also gave an example where there is a memory hogging process and
> > > > >   possibly pushes out some processes to swap. It does not sound fair to
> > > > >   charge those processes for that swap writeout. These processes never
> > > > >   requested swap IO.
> > >
> > > I think that swap writeouts should be charged to the memory hogging
> > > process, because the process consumes more resources and it should get
> > > a penalty.
> > >
> >
> > A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> > mechanism and kernel's way of providing extended RAM. If we want to solve
> > the issue of memory hogging by a process then right way to solve is to use
> > memory controller and not by charging the process for IO activity.
> > Instead, probably a more suitable way is to charge swap activity to root
> > group (where by default all the kernel related activity goes).
>
> No. In the current blkio-cgroup, a process which uses a large amount
> of memory gets the penalty, not the memory requester.
>

At the ioband level you only get to see the bio and the page. How do you
decide whether this bio is being issued by a process which is a memory hog?

In fact the requester of memory could be anybody. It could be the memory hog
or a different process. So are you saying that you have a mechanism which can
detect that a process is a memory hog and charge swap activity to it?
IOW, if there are two processes A and B, and A is the memory hog, and
then B requests memory which triggers a lot of swap IO, can you
charge all that IO to memory hog A?

Can you please point me to the relevant code in dm-ioband?

IMHO, to keep things simple, all swapping activity should be charged to
the root group and considered kernel activity, and user space should not
be charged for it.

Thanks
Vivek

> As you wrote, using both the io-controller and the memory controller is
> required to prevent swap-out caused by memory consumption in another
> cgroup.
>
> > > > > - If there are multiple buffered writers in the system, then those writers
> > > > >   can also be forced to writeout some pages to disk before they are
> > > > >   allowed to dirty more pages. As per the page cache design, any writer
> > > > >   can pick any inode and start writing out pages. So it can happen that a
> > > > >   higher weight group task is writing out pages dirtied by a lower weight group
> > > > >   task. If async bio is mapped to owner's group, it might happen that
> > > > >   higher weight group task might be made to sleep on lower weight group
> > > > >   task because request descriptors are all consumed up.
> > >
> > > As mentioned above, in dm-ioband, the bio is charged to the page owner
> > > and issued immediately.
> >
> > But you are doing it only for selected pages and not for all buffered
> > writes?
>
> I'm sorry, I was wrong in my previous mail. IO for writing out
> page-cache pages is not issued immediately; it is throttled by
> dm-ioband.
>
> Anyway, there is a case where a higher weight group task is made
> to sleep, but if we reserve the memory for each cgroup by memory
> controller in advance, we can avoid the task being put to sleep.
>
> Thanks,
> Ryo Tsuruta

2009-09-03 02:24:22

by Ryo Tsuruta

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Hi Vivek,

Vivek Goyal <[email protected]> wrote:
> > > > > > - Somebody also gave an example where there is a memory hogging process and
> > > > > >   possibly pushes out some processes to swap. It does not sound fair to
> > > > > >   charge those processes for that swap writeout. These processes never
> > > > > >   requested swap IO.
> > > >
> > > > I think that swap writeouts should be charged to the memory hogging
> > > > process, because the process consumes more resources and it should get
> > > > a penalty.
> > > >
> > >
> > > A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> > > mechanism and kernel's way of providing extended RAM. If we want to solve
> > > the issue of memory hogging by a process then right way to solve is to use
> > > memory controller and not by charging the process for IO activity.
> > > Instead, probably a more suitable way is to charge swap activity to root
> > > group (where by default all the kernel related activity goes).
> >
> > No. In the current blkio-cgroup, a process which uses a large amount
> > of memory gets penalty, not a memory requester.
> >
>
> At ioband level you just get to see bio and page. How do you decide whether
> this bio is being issued by a process which is a memory hog?
>
> In fact requester of memory could be anybody. It could be memory hog or a
> different process. So are you saying that you got a mechanism where you
> can detect that a process is memory hog and charge swap activity to it.
> IOW, if there are two processes A and B and assume A is the memory hog and
> then B requests for memory which triggers lot of swap IO, then you can
> charge all that IO to memory hog A?

When an anonymous page is allocated, blkio-cgroup sets an ID on the
page. Then, when the page is about to be swapped out, dm-ioband can find
out who owns the page by retrieving the ID from the page.

In the above case, since process A's pages are swapped out, dm-ioband
charges the swap IO to process A.
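
A minimal sketch of this idea in C, not the actual blkio-cgroup or
dm-ioband code. All names here (page_sketch, set_page_owner,
charge_swap_out, the fixed-size arrays) are hypothetical and only
illustrate recording an owner ID at allocation time and reading it back
at swap-out time.

```c
#include <assert.h>

#define NR_PAGES  8
#define NR_GROUPS 3   /* assumption: ID 0 stands in for the root group */

/* Hypothetical stand-in for the per-page ID that blkio-cgroup records. */
struct page_sketch {
	int owner_id;     /* cgroup ID stored when the page is allocated */
};

struct page_sketch pages[NR_PAGES];
unsigned long swap_io_charged[NR_GROUPS];

/* A task in cgroup `cgroup_id` allocates anonymous page `pfn`. */
void set_page_owner(int pfn, int cgroup_id)
{
	assert(pfn >= 0 && pfn < NR_PAGES);
	assert(cgroup_id >= 0 && cgroup_id < NR_GROUPS);
	pages[pfn].owner_id = cgroup_id;
}

/*
 * Page `pfn` is swapped out. The group that triggered the swap-out is
 * passed in only to show that it is ignored: the charge goes to the
 * owner ID stored on the page. Returns the group that was charged.
 */
int charge_swap_out(int pfn, int triggering_cgroup_id)
{
	int owner;

	assert(pfn >= 0 && pfn < NR_PAGES);
	(void)triggering_cgroup_id;
	owner = pages[pfn].owner_id;
	swap_io_charged[owner]++;
	return owner;
}
```

With A in group 1 owning the pages and B in group 2 triggering the
swap-out, charge_swap_out() returns 1, so the IO is accounted to A's
group and B's counter stays at zero.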

> Can you please point me to the relevant code in dm-ioband?
>
> IMHO, to keep things simple, all swapping activity should be charged to
> root group and be considered as kernel activity and user space not be
> charged for that.

Thanks,
Ryo Tsuruta

2009-09-03 02:41:59

by Vivek Goyal

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

On Thu, Sep 03, 2009 at 11:24:23AM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
>
> Vivek Goyal <[email protected]> wrote:
> > > > > > > - Somebody also gave an example where there is a memory hogging process and
> > > > > > >   possibly pushes out some processes to swap. It does not sound fair to
> > > > > > >   charge those processes for that swap writeout. These processes never
> > > > > > >   requested swap IO.
> > > > >
> > > > > I think that swap writeouts should be charged to the memory hogging
> > > > > process, because the process consumes more resources and it should get
> > > > > a penalty.
> > > > >
> > > >
> > > > A process requesting memory gets IO penalty? IMHO, swapping is a kernel
> > > > mechanism and kernel's way of providing extended RAM. If we want to solve
> > > > the issue of memory hogging by a process then right way to solve is to use
> > > > memory controller and not by charging the process for IO activity.
> > > > Instead, probably a more suitable way is to charge swap activity to root
> > > > group (where by default all the kernel related activity goes).
> > >
> > > No. In the current blkio-cgroup, a process which uses a large amount
> > > of memory gets penalty, not a memory requester.
> > >
> >
> > At ioband level you just get to see bio and page. How do you decide whether
> > this bio is being issued by a process which is a memory hog?
> >
> > In fact requester of memory could be anybody. It could be memory hog or a
> > different process. So are you saying that you got a mechanism where you
> > can detect that a process is memory hog and charge swap activity to it.
> > IOW, if there are two processes A and B and assume A is the memory hog and
> > then B requests for memory which triggers lot of swap IO, then you can
> > charge all that IO to memory hog A?
>
> When an anonymous page is allocated, blkio-cgroup sets an ID to the
> page. And then when the page is going to swap out, dm-ioband can know
> who the owner of the page is by retrieving ID from the page.
>
> In the above case, since the pages of the process A are swapped-out,
> dm-ioband charges swap IO to the process A.
>

But this does not mean that the memory hog is charged for swap IO in all
cases, as you said. So if a process A has done some anonymous page
allocations and later a memory hog B comes in and forces A's pages to be
swapped out, you will charge A for the swap activity, which does not seem
fair as B is the memory hog here?
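
To make the disagreement concrete, here is a rough sketch of the three
charging policies discussed in this thread. The enum and function names
are made up for illustration and do not correspond to any kernel or
dm-ioband interface, and the "submitter" is assumed to be the task whose
allocation triggered the swap-out.

```c
#include <assert.h>

#define ROOT_GROUP 0   /* assumption: ID 0 stands in for the root group */

/* Hypothetical names for the policies argued about in this thread. */
enum charge_policy {
	CHARGE_PAGE_OWNER,  /* charge the group recorded on the page */
	CHARGE_SUBMITTER,   /* charge the group of the submitting task */
	CHARGE_ROOT         /* treat swap IO as kernel activity */
};

/* Return the cgroup ID that pays for one swap-out bio under `policy`. */
int swap_charge_target(enum charge_policy policy,
		       int page_owner_group, int submitter_group)
{
	switch (policy) {
	case CHARGE_PAGE_OWNER:
		return page_owner_group;
	case CHARGE_SUBMITTER:
		return submitter_group;
	case CHARGE_ROOT:
		return ROOT_GROUP;
	}
	assert(!"unknown charge policy");
	return ROOT_GROUP;
}
```

In the scenario above (A in group 1 owns the pages, B in group 2 forces
the swap-out), CHARGE_PAGE_OWNER picks A's group, CHARGE_SUBMITTER picks
B's group, and CHARGE_ROOT picks neither.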

Thanks
Vivek

> > Can you please point me to the relevant code in dm-ioband?
> >
> > IMHO, to keep things simple, all swapping activity should be charged to
> > root group and be considered as kernel activity and user space not be
> > charged for that.
>
> Thanks,
> Ryo Tsuruta

2009-09-03 03:41:44

by Ryo Tsuruta

Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Hi Vivek,

Vivek Goyal <[email protected]> wrote:
> > > At ioband level you just get to see bio and page. How do you decide whether
> > > this bio is being issued by a process which is a memory hog?
> > >
> > > In fact requester of memory could be anybody. It could be memory hog or a
> > > different process. So are you saying that you got a mechanism where you
> > > can detect that a process is memory hog and charge swap activity to it.
> > > IOW, if there are two processes A and B and assume A is the memory hog and
> > > then B requests for memory which triggers lot of swap IO, then you can
> > > charge all that IO to memory hog A?
> >
> > When an anonymous page is allocated, blkio-cgroup sets an ID to the
> > page. And then when the page is going to swap out, dm-ioband can know
> > who the owner of the page is by retrieving ID from the page.
> >
> > In the above case, since the pages of the process A are swapped-out,
> > dm-ioband charges swap IO to the process A.
> >
>
> But this does not mean that in all cases memory hog is being charged for
> swap IO, as you have said. So if a process A has done some anonymous page
> allocations and later a memory hog B comes in and forces swap out of A,
> you will charge A for swap activity which does not seem fair as B is
> memory hog here?

I think this charging policy is not bad, but I can understand why you
think it's not fair. Do you think it would be fair if all IO were charged
to B?

We should use both the io and memory controllers together, as you wrote.

Thanks,
Ryo Tsuruta