2010-04-14 06:56:27

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Thu, Mar 18, 2010 at 8:00 PM, KAMEZAWA Hiroyuki <[email protected]> wrote:
> On Fri, 19 Mar 2010 08:10:39 +0530
> Balbir Singh <[email protected]> wrote:
>
>> * KAMEZAWA Hiroyuki <[email protected]> [2010-03-19 10:23:32]:
>>
>> > On Thu, 18 Mar 2010 21:58:55 +0530
>> > Balbir Singh <[email protected]> wrote:
>> >
>> > > * KAMEZAWA Hiroyuki <[email protected]> [2010-03-18 13:35:27]:
>> >
>> > > > Then, no problem. It's ok to add mem_cgroup_update_stat() independent from
>> > > > mem_cgroup_update_file_mapped(). The code may look messy but it's not your
>> > > > fault. But please write "why add new function" in the patch description.
>> > > >
>> > > > I'm sorry for wasting your time.
>> > >
>> > > Do we need to go down this route? We could check the stat and do the
>> > > correct thing. In case of FILE_MAPPED, always grab page_cgroup_lock
>> > > and for others potentially look at trylock. It is OK for different
>> > > stats to be protected via different locks.
>> > >
>> >
>> > I _don't_ want to see a mixture of spinlock and trylock in a function.
>> >
>>
>> A well-documented, well-written function can help. The other thing is to
>> of course solve this correctly by introducing different locking around
>> the statistics. Are you suggesting the latter?
>>
>
> No. As I wrote:
>        - don't modify code around FILE_MAPPED in this series.
>        - add new functions for the new statistics.
> Then,
>        - think about cleanup later, after we confirm all things work as expected.

I have ported Andrea Righi's memcg dirty page accounting patches to the latest
mmotm-2010-04-05-16-09. In doing so I had to address this locking issue. Does
the following look good? I will (of course) submit the entire patch for review,
but I wanted to make sure I was aiming in the right direction.

void mem_cgroup_update_page_stat(struct page *page,
			enum mem_cgroup_write_page_stat_item idx, bool charge)
{
	static int seq;
	struct page_cgroup *pc;

	if (mem_cgroup_disabled())
		return;
	pc = lookup_page_cgroup(page);
	if (!pc || mem_cgroup_is_root(pc->mem_cgroup))
		return;

	/*
	 * This routine does not disable irq when updating stats. So it is
	 * possible that a stat update from within interrupt routine, could
	 * deadlock. Use trylock_page_cgroup() to avoid such deadlock. This
	 * makes the memcg counters fuzzy. More complicated, or lower
	 * performing locking solutions avoid this fuzziness, but are not
	 * currently needed.
	 */
	if (irqs_disabled()) {
		if (!trylock_page_cgroup(pc))
			return;
	} else
		lock_page_cgroup(pc);

	__mem_cgroup_update_page_stat(pc, idx, charge);
	unlock_page_cgroup(pc);
}

__mem_cgroup_update_page_stat() has a switch statement that updates all of the
MEMCG_NR_FILE_{MAPPED,DIRTY,WRITEBACK,WRITEBACK_TEMP,UNSTABLE_NFS} counters
using the following form:
switch (idx) {
case MEMCG_NR_FILE_MAPPED:
	if (charge) {
		if (!PageCgroupFileMapped(pc))
			SetPageCgroupFileMapped(pc);
		else
			val = 0;
	} else {
		if (PageCgroupFileMapped(pc))
			ClearPageCgroupFileMapped(pc);
		else
			val = 0;
	}
	idx = MEM_CGROUP_STAT_FILE_MAPPED;
	break;

...
}

/*
 * Preemption is already disabled. We can use __this_cpu_xxx
 */
if (val > 0) {
	__this_cpu_inc(mem->stat->count[idx]);
} else if (val < 0) {
	__this_cpu_dec(mem->stat->count[idx]);
}

In my current tree, irqs are never saved/restored by the cgroup locking code. To
protect against interrupt reentrancy, trylock_page_cgroup() is used. As the
comment indicates, this makes the new counters fuzzy.

--
Greg


2010-04-14 09:33:14

by Kamezawa Hiroyuki

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Tue, 13 Apr 2010 23:55:12 -0700
Greg Thelen <[email protected]> wrote:

> On Thu, Mar 18, 2010 at 8:00 PM, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > On Fri, 19 Mar 2010 08:10:39 +0530
> > Balbir Singh <[email protected]> wrote:
> >
> >> * KAMEZAWA Hiroyuki <[email protected]> [2010-03-19 10:23:32]:
> >>
> >> > On Thu, 18 Mar 2010 21:58:55 +0530
> >> > Balbir Singh <[email protected]> wrote:
> >> >
> >> > > * KAMEZAWA Hiroyuki <[email protected]> [2010-03-18 13:35:27]:
> >> >
> >> > > > Then, no probelm. It's ok to add mem_cgroup_udpate_stat() indpendent from
> >> > > > mem_cgroup_update_file_mapped(). The look may be messy but it's not your
> >> > > > fault. But please write "why add new function" to patch description.
> >> > > >
> >> > > > I'm sorry for wasting your time.
> >> > >
> >> > > Do we need to go down this route? We could check the stat and do the
> >> > > correct thing. In case of FILE_MAPPED, always grab page_cgroup_lock
> >> > > and for others potentially look at trylock. It is OK for different
> >> > > stats to be protected via different locks.
> >> > >
> >> >
> >> > I _don't_ want to see a mixture of spinlock and trylock in a function.
> >> >
> >>
> >> A well documented well written function can help. The other thing is to
> >> of-course solve this correctly by introducing different locking around
> >> the statistics. Are you suggesting the later?
> >>
> >
> > No. As I wrote.
> >        - don't modify codes around FILE_MAPPED in this series.
> >        - add a new functions for new statistics
> > Then,
> >        - think about clean up later, after we confirm all things work as expected.
>
> I have ported Andrea Righi's memcg dirty page accounting patches to latest
> mmtom-2010-04-05-16-09. In doing so I have to address this locking issue. Does
> the following look good? I will (of course) submit the entire patch for review,
> but I wanted make sure I was aiming in the right direction.
>
> void mem_cgroup_update_page_stat(struct page *page,
> enum mem_cgroup_write_page_stat_item idx, bool charge)
> {
> static int seq;
> struct page_cgroup *pc;
>
> if (mem_cgroup_disabled())
> return;
> pc = lookup_page_cgroup(page);
> if (!pc || mem_cgroup_is_root(pc->mem_cgroup))
> return;
>
> /*
> * This routine does not disable irq when updating stats. So it is
> * possible that a stat update from within interrupt routine, could
> * deadlock. Use trylock_page_cgroup() to avoid such deadlock. This
> * makes the memcg counters fuzzy. More complicated, or lower
> * performing locking solutions avoid this fuzziness, but are not
> * currently needed.
> */
> if (irqs_disabled()) {
> if (! trylock_page_cgroup(pc))
> return;
> } else
> lock_page_cgroup(pc);
>

I prefer trylock_page_cgroup() always.

I have another idea for fixing this up _later_. (But I want to start from the simple one.)

My rough idea is as follows. It is similar to the idea you gave me before.

==
DEFINE_PERCPU(account_move_ongoing);
DEFINE_MUTEX(move_account_mutex);

void memcg_start_account_move(void)
{
	mutex_lock(&move_account_mutex);
	for_each_online_cpu(cpu)
		per_cpu(cpu, account_move_ongoing) += 1;
	mutex_unlock(&move_account_mutex);
	/* Wait until there are no lockless updates */
	synchronize_rcu();
	return;
}

void memcg_end_account_move(void)
{
	mutex_lock(&move_account_mutex);
	for_each_online_cpu(cpu)
		per_cpu(cpu, account_move_ongoing) -= 1;
	mutex_unlock(&move_account_mutex);
}

/* return 1 when we took the lock, return 0 if lockless ops are guaranteed to be safe */
int memcg_start_filecache_accounting(struct page_cgroup *pc)
{
	rcu_read_lock();
	smp_rmb();
	if (!this_cpu_read(account_move_ongoing))
		return 0; /* no account move is ongoing */
	lock_page_cgroup(pc);
	return 1;
}

void memcg_end_filecache_accounting(struct page_cgroup *pc, int unlock)
{
	if (unlock)
		unlock_page_cgroup(pc);

	rcu_read_unlock();
}

and call memcg_start_account_move()/memcg_end_account_move() at the start/end of
migrating a chunk of pages.
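
A minimal sketch of how the two sides might fit together, reusing the names
above (the updater body reuses Greg's earlier function, so treat the whole
thing as illustrative, not as a real patch):

	/* updater side: stays lockless unless an account move is in flight */
	locked = memcg_start_filecache_accounting(pc);
	__mem_cgroup_update_page_stat(pc, idx, charge);
	memcg_end_filecache_accounting(pc, locked);

	/* mover side: forces all updaters onto the locked path first */
	memcg_start_account_move();
	/* ... move pages/accounting between cgroups ... */
	memcg_end_account_move();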

Bye.
-Kame


2010-04-14 14:06:06

by Vivek Goyal

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, Apr 14, 2010 at 06:29:04PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 13 Apr 2010 23:55:12 -0700
> Greg Thelen <[email protected]> wrote:
>
> > [...]
>
> I prefer trylock_page_cgroup() always.
>
> I have another idea for fixing this up _later_. (But I want to start from the simple one.)
>
> My rough idea is as follows. It is similar to the idea you gave me before.
>
> ==
> DEFINE_PERCPU(account_move_ongoing);
> DEFINE_MUTEX(move_account_mutex);
>
> void memcg_start_account_move(void)
> {
> 	mutex_lock(&move_account_mutex);
> 	for_each_online_cpu(cpu)
> 		per_cpu(cpu, account_move_ongoing) += 1;
> 	mutex_unlock(&move_account_mutex);
> 	/* Wait until there are no lockless updates */
> 	synchronize_rcu();
> 	return;
> }
>
> void memcg_end_account_move(void)
> {
> 	mutex_lock(&move_account_mutex);
> 	for_each_online_cpu(cpu)
> 		per_cpu(cpu, account_move_ongoing) -= 1;
> 	mutex_unlock(&move_account_mutex);
> }
>
> /* return 1 when we took the lock, return 0 if lockless ops are guaranteed to be safe */
> int memcg_start_filecache_accounting(struct page_cgroup *pc)
> {
> 	rcu_read_lock();
> 	smp_rmb();
> 	if (!this_cpu_read(account_move_ongoing))
> 		return 0; /* no account move is ongoing */
> 	lock_page_cgroup(pc);
> 	return 1;
> }
>
> void memcg_end_filecache_accounting(struct page_cgroup *pc, int unlock)
> {
> 	if (unlock)
> 		unlock_page_cgroup(pc);
>
> 	rcu_read_unlock();
> }
>
> and call memcg_start_account_move()/memcg_end_account_move() at the start/end of
> migrating a chunk of pages.
>

Hi Kame-san,

Maybe I am missing something, but how does it solve the issue of making sure
lock_page_cgroup() is not held in interrupt context? IIUC, the above code will
make sure that for file cache accounting, lock_page_cgroup() is taken only
if task migration is on. But say task migration is on, and then some IO
completes and we update the WRITEBACK stat (I think this is the one which can
be called from interrupt context); then we will still take
lock_page_cgroup() and again run into the issue of deadlocks?

Thanks
Vivek

2010-04-14 14:07:52

by Vivek Goyal

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Tue, Apr 13, 2010 at 11:55:12PM -0700, Greg Thelen wrote:
> [...]
> /*
> * This routine does not disable irq when updating stats. So it is
> * possible that a stat update from within interrupt routine, could
> * deadlock. Use trylock_page_cgroup() to avoid such deadlock. This
> * makes the memcg counters fuzzy. More complicated, or lower
> * performing locking solutions avoid this fuzziness, but are not
> * currently needed.
> */
> if (irqs_disabled()) {
^^^^^^^^^
Or may be in_interrupt()?

> if (! trylock_page_cgroup(pc))
> return;
> } else
> lock_page_cgroup(pc);
>
> __mem_cgroup_update_page_stat(pc, idx, charge);
> unlock_page_cgroup(pc);
> }

Vivek

2010-04-14 14:44:33

by Balbir Singh

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

* Greg Thelen <[email protected]> [2010-04-13 23:55:12]:

> [...]
> /*
> * This routine does not disable irq when updating stats. So it is
> * possible that a stat update from within interrupt routine, could
> * deadlock. Use trylock_page_cgroup() to avoid such deadlock. This
> * makes the memcg counters fuzzy. More complicated, or lower
> * performing locking solutions avoid this fuzziness, but are not
> * currently needed.
> */
> if (irqs_disabled()) {
> if (! trylock_page_cgroup(pc))
> return;

Since this is just stats, can we use deferred updates?
	else
		update a deferred structure

> } else
> lock_page_cgroup(pc);
>
> __mem_cgroup_update_page_stat(pc, idx, charge);

Do charging + any deferred charges pending in
__mem_cgroup_update_page_stat().
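
A rough sketch of what that could look like (the per-cpu "deferred" counter
is hypothetical, not something in the posted patch):

	if (!trylock_page_cgroup(pc)) {
		/* hypothetical: park the update in a per-cpu counter and
		 * fold it into the stats on the next locked update */
		this_cpu_add(mem->stat->deferred[idx], charge ? 1 : -1);
		return;
	}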

> unlock_page_cgroup(pc);
> }
> [...]

--
Three Cheers,
Balbir

2010-04-14 16:23:11

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, Apr 14, 2010 at 2:29 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> [...]
>
> I prefer trylock_page_cgroup() always.

What is your reason for preferring trylock_page_cgroup()? I assume
it's for code simplicity, but I wanted to check.

I had thought about using trylock_page_cgroup() always, but I think
that would make file_mapped accounting even more fuzzy than it already
is. I was trying to retain the current accuracy of file_mapped and
only make the new counters, like writeback/dirty/etc (those updated in
interrupt context), fuzzy.

> I have another idea for fixing this up _later_. (But I want to start from the simple one.)
>
> My rough idea is as follows. It is similar to the idea you gave me before.

Hi Kame-san,

I like the general approach. The code I previously gave you appears
to work and, for non-root memcgs, is faster than mmotm due to mostly
being lockless.

> ==
> DEFINE_PERCPU(account_move_ongoing);

What's the reason for having a per-cpu account_move_ongoing flag?
Would a single system-wide global be sufficient? I assume the
majority of the time this value will not be changing because
accounting moves are rare.

Perhaps all of the per-cpu variables are packed within a per-cpu
cacheline, making access to them more likely to be local, but I'm not
sure if this is true.

--
Greg

2010-04-14 19:35:00

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

Vivek Goyal <[email protected]> writes:

> On Wed, Apr 14, 2010 at 06:29:04PM +0900, KAMEZAWA Hiroyuki wrote:
>> [...]
>
> Hi Kame-san,
>
> Maybe I am missing something, but how does it solve the issue of making sure
> lock_page_cgroup() is not held in interrupt context? IIUC, the above code will
> make sure that for file cache accounting, lock_page_cgroup() is taken only
> if task migration is on. But say task migration is on, and then some IO
> completes and we update the WRITEBACK stat (I think this is the one which can
> be called from interrupt context); then we will still take
> lock_page_cgroup() and again run into the issue of deadlocks?
>
> Thanks
> Vivek

I agree. I think the lock/unlock_page_cgroup() calls suggested by
Kame-san should also include local_irq_save/restore() calls to prevent
the interrupt context deadlock Vivek describes. These new
local_irq_save/restore() calls would only be used if
account_move_ongoing is set. They behave just like the optional calls
to lock/unlock_page_cgroup().
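
As a minimal sketch on top of Kame-san's names above (the flags plumbing is
my assumption):

	int memcg_start_filecache_accounting(struct page_cgroup *pc,
					     unsigned long *flags)
	{
		rcu_read_lock();
		smp_rmb();
		if (!this_cpu_read(account_move_ongoing))
			return 0;	/* lockless: no move in progress */
		/* keep interrupts out while we hold the bit lock */
		local_irq_save(*flags);
		lock_page_cgroup(pc);
		return 1;
	}

	void memcg_end_filecache_accounting(struct page_cgroup *pc,
					    int locked, unsigned long *flags)
	{
		if (locked) {
			unlock_page_cgroup(pc);
			local_irq_restore(*flags);
		}
		rcu_read_unlock();
	}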

--
Greg

2010-04-14 20:16:15

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

Vivek Goyal <[email protected]> writes:

> On Tue, Apr 13, 2010 at 11:55:12PM -0700, Greg Thelen wrote:
>> [...]
>> if (irqs_disabled()) {
> ^^^^^^^^^
> Or may be in_interrupt()?

Good catch. I will replace irqs_disabled() with in_interrupt().

Thank you.

--
Greg

2010-04-15 00:18:54

by Kamezawa Hiroyuki

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, 14 Apr 2010 10:04:30 -0400
Vivek Goyal <[email protected]> wrote:

> On Wed, Apr 14, 2010 at 06:29:04PM +0900, KAMEZAWA Hiroyuki wrote:

> Hi Kame-san,
>
> Maybe I am missing something, but how does it solve the issue of making sure
> lock_page_cgroup() is not held in interrupt context? IIUC, the above code will
> make sure that for file cache accounting, lock_page_cgroup() is taken only
> if task migration is on. But say task migration is on, and then some IO
> completes and we update the WRITEBACK stat (I think this is the one which can
> be called from interrupt context); then we will still take
> lock_page_cgroup() and again run into the issue of deadlocks?
>
Yes and no.

At "set" time, IIRC, almost all updates against the DIRTY and WRITEBACK accounting
can be done under mapping->tree_lock, which always disables IRQs.
(ex. I mean account_page_dirtied(), not TestSetPageWriteback.)
Then, saving/restoring irq flags is just a cost with no benefit in such cases.
Of course, there are cases where irqs are not disabled.

About FILE_MAPPED: it's not updated under mapping->tree_lock.
So, we'll race with charge/uncharge. We have to take lock_page_cgroup(), always.


So, I think we'll have 2 or 3 interfaces, finally:
  mem_cgroup_update_stat_fast()   // the caller must disable IRQs and hold lock_page()
  mem_cgroup_update_stat_locked() // the caller holds lock_page()
  mem_cgroup_update_stat_safe()   // the caller doesn't have to do anything

update_stat_fast() exists to avoid _unnecessary_ costs:
when we lock a page and disable IRQs, we don't have to do anything else.
There are no races.
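
A minimal sketch of how those variants might look (only the names come from
above; the bodies are my assumption, built on the earlier
__mem_cgroup_update_page_stat() helper):

	/* caller holds lock_page() and has IRQs disabled: no races */
	void mem_cgroup_update_stat_fast(struct page_cgroup *pc,
					 int idx, bool charge)
	{
		__mem_cgroup_update_page_stat(pc, idx, charge);
	}

	/* caller holds lock_page(): take the bit lock to keep out
	 * charge/uncharge */
	void mem_cgroup_update_stat_locked(struct page_cgroup *pc,
					   int idx, bool charge)
	{
		lock_page_cgroup(pc);
		__mem_cgroup_update_page_stat(pc, idx, charge);
		unlock_page_cgroup(pc);
	}

	/* caller guarantees nothing: trylock so an update from interrupt
	 * context cannot deadlock; a failed trylock drops the update */
	void mem_cgroup_update_stat_safe(struct page_cgroup *pc,
					 int idx, bool charge)
	{
		if (!trylock_page_cgroup(pc))
			return;
		__mem_cgroup_update_page_stat(pc, idx, charge);
		unlock_page_cgroup(pc);
	}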

But yes, it's complicated. I'd like to see what we can do.


Thanks,
-Kame

2010-04-15 00:26:59

by Kamezawa Hiroyuki

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, 14 Apr 2010 09:22:41 -0700
Greg Thelen <[email protected]> wrote:

> On Wed, Apr 14, 2010 at 2:29 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
>
> >>       if (irqs_disabled()) {
> >>               if (! trylock_page_cgroup(pc))
> >>                       return;
> >>       } else
> >>               lock_page_cgroup(pc);
> >>
> >
> > I prefer trylock_page_cgroup() always.
>
> What is your reason for preferring trylock_page_cgroup()? I assume
> it's for code simplicity, but I wanted to check.
>
> I had thought about using trylock_page_cgroup() always, but I think
> that would make file_mapped accounting even more fuzzy than it already
> is. I was trying to retain the current accuracy of file_mapped and
> only make the new counters, like writeback/dirty/etc (those updated in
> interrupt context), fuzzy.
>

file_mapped should have a different interface, such as
mem_cgroup_update_stat_verrrry_safe() or some such.

I don't think accuracy is important (as long as it doesn't go negative), but if
people want it, I agree to keep it accurate.


> > I have another idea for fixing this up _later_. (But I want to start from the simple one.)
> >
> > My rough idea is as follows. It is similar to the idea you gave me before.
>
> Hi Kame-san,
>
> I like the general approach. The code I previously gave you appears
> to work and, for non-root memcgs, is faster than mmotm due to mostly
> being lockless.
>
I hope so.

> > ==
> > DEFINE_PERCPU(account_move_ongoing);
>
> What's the reason for having a per-cpu account_move_ongoing flag?
> Would a single system-wide global be sufficient? I assume the
> majority of the time this value will not be changing because
> accounting moves are rare.
>
> Perhaps all of the per-cpu variables are packed within a per-cpu
> cacheline, making access to them more likely to be local, but I'm not
> sure if this is true.
>

Yes. This value is rarely updated, but updates are not rare enough to put
it in a read_mostly section. We'd see cacheline ping-pong from random
placement of global variables, and this path is performance critical.
Recent updates to the percpu variable accessors make percpu access
very efficient, and I'd like to make use of that.
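
As a small illustration of the difference (not code from the patch; the
slow-path call is a placeholder):

	/* one shared global: every CPU's check pulls the same cacheline,
	 * which ping-pongs whenever a mover writes to it */
	if (account_move_ongoing)
		take_slow_path();

	/* per-cpu copy: each CPU reads only its own cacheline */
	if (this_cpu_read(account_move_ongoing))
		take_slow_path();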

Thanks,
-Kame

2010-04-15 02:49:04

by Daisuke Nishimura

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, 14 Apr 2010 13:14:07 -0700, Greg Thelen <[email protected]> wrote:
> Vivek Goyal <[email protected]> writes:
>
> > [...]
>
> Good catch. I will replace irqs_disabled() with in_interrupt().
>
I think you should check both. __remove_from_page_cache(), which will update
DIRTY, is called with irqs disabled (iow, under mapping->tree_lock) but not in
interrupt context.
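
So the guard would presumably need to combine the two checks, something like
(a sketch of the guard only):

	/* trylock both in interrupt context and anywhere irqs are already
	 * off (e.g. under mapping->tree_lock) */
	if (in_interrupt() || irqs_disabled()) {
		if (!trylock_page_cgroup(pc))
			return;
	} else
		lock_page_cgroup(pc);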

Anyway, I tend to agree with KAMEZAWA-san: use trylock always (except for FILE_MAPPED),
or add some new interfaces (e.g. mem_cgroup_update_stat_locked/safe...).

Thanks,
Daisuke Nishimura.

2010-04-15 04:49:24

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, Apr 14, 2010 at 7:40 PM, Daisuke Nishimura
<[email protected]> wrote:
> On Wed, 14 Apr 2010 13:14:07 -0700, Greg Thelen <[email protected]> wrote:
>> [...]
>> Good catch. I will replace irqs_disabled() with in_interrupt().
>>
> I think you should check both. __remove_from_page_cache(), which will update
> DIRTY, is called with irqs disabled (iow, under mapping->tree_lock) but not in
> interrupt context.

The only reason to use trylock in this case is to prevent deadlock
when running in a context that may have preempted or interrupted a
routine that already holds the bit locked. In
__remove_from_page_cache() irqs are disabled, but that does not imply
that a routine holding the spinlock has been preempted. When the bit
is locked, preemption is disabled. The only way to interrupt a holder
of the bit is for an interrupt to occur (I/O, timer, etc.). So I think
that in_interrupt() is sufficient. Am I missing something?

> Anyway, I tend to agree with KAMEZAWA-san: use trylock always(except for FILE_MAPPED),
> or add some new interfaces(e.g. mem_cgroup_update_stat_locked/safe...).

Thank you for the input. I'm thinking more on this.

--
Greg

2010-04-15 06:29:48

by Daisuke Nishimura

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, 14 Apr 2010 21:48:25 -0700, Greg Thelen <[email protected]> wrote:
> On Wed, Apr 14, 2010 at 7:40 PM, Daisuke Nishimura
> <[email protected]> wrote:
> > [...]
>
> The only reason to use trylock in this case is to prevent deadlock
> when running in a context that may have preempted or interrupted a
> routine that already holds the bit locked. In
> __remove_from_page_cache() irqs are disabled, but that does not imply
> that a routine holding the spinlock has been preempted. When the bit
> is locked, preemption is disabled. The only way to interrupt a holder
> of the bit is for an interrupt to occur (I/O, timer, etc.). So I think
> that in_interrupt() is sufficient. Am I missing something?
>
IIUC, it would be enough to prevent a deadlock where one CPU tries to acquire
the same page cgroup lock. But there is still some possibility where 2 CPUs
can deadlock each other (please see commit e767e056).
IOW, my point is "don't call lock_page_cgroup() under mapping->tree_lock".
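
The two-CPU case looks roughly like this (my sketch of the ordering; see the
commit message of e767e056 for the real trace):

	/*
	 * CPU0                            CPU1
	 * lock_page_cgroup(pc)
	 *   <interrupt>                   spin_lock_irq(&mapping->tree_lock)
	 *   spin_lock_irq(&tree_lock)     lock_page_cgroup(pc)
	 *     ... spins on CPU1 ...         ... spins on CPU0 ...
	 */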

Thanks,
Daisuke Nishimura.

2010-04-15 06:39:06

by Greg Thelen

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Wed, Apr 14, 2010 at 11:21 PM, Daisuke Nishimura
<[email protected]> wrote:
> On Wed, 14 Apr 2010 21:48:25 -0700, Greg Thelen <[email protected]> wrote:
>> [...]
>>
> IIUC, it would be enough to prevent a deadlock where one CPU tries to acquire
> the same page cgroup lock. But there is still some possibility where 2 CPUs
> can deadlock each other (please see commit e767e056).
> IOW, my point is "don't call lock_page_cgroup() under mapping->tree_lock".

I see your point. Thank you for explaining.

--
Greg

2010-04-15 06:58:37

by Kamezawa Hiroyuki

Subject: Re: [PATCH -mmotm 1/5] memcg: disable irq at page cgroup lock

On Thu, 15 Apr 2010 15:21:04 +0900
Daisuke Nishimura <[email protected]> wrote:
> > The only reason to use trylock in this case is to prevent deadlock
> > when running in a context that may have preempted or interrupted a
> > routine that already holds the bit locked. In
> > __remove_from_page_cache() irqs are disabled, but that does not imply
> > that a routine holding the spinlock has been preempted. When the bit
> > is locked, preemption is disabled. The only way to interrupt a holder
> > of the bit is for an interrupt to occur (I/O, timer, etc.). So I think
> > that in_interrupt() is sufficient. Am I missing something?
> >
> IIUC, it would be enough to prevent a deadlock where one CPU tries to acquire
> the same page cgroup lock. But there is still some possibility where 2 CPUs
> can deadlock each other (please see commit e767e056).
> IOW, my point is "don't call lock_page_cgroup() under mapping->tree_lock".
>
Hmm, maybe worth a try. We may be able to set/clear all DIRTY/WRITEBACK bits
on the page_cgroup without mapping->tree_lock.
In such a case, of course, the page itself should be locked by lock_page().

But, hmm, for example:

account_page_dirtied() is the best place to mark a page_cgroup dirty. But
it's called under mapping->tree_lock.

Another thought:
I wonder if we may have to change our approach to dirty page accounting.

Please see task_dirty_inc(). It's for per-task dirty limiting.
And you'll notice soon that there is no task_dirty_dec().

Making use of lib/proportions.c's proportion calculation, as the per-task
dirty limit and per-bdi dirty limit do, is worth considering.
This is very simple and can be implemented without the problems we have now.
(We need to think about the algorithm itself, but it's already used and works well.)
We'll never see complicated race conditions.

I know some guys want "accurate" accounting, but I myself don't want it too much.
Using proportions.c can offer us a unified approach with per-task dirty accounting
or per-bdi dirty accounting.
If we do so, memcg will have an interface like the per-bdi dirty ratio (see below):
[kamezawa@bluextal kvm2]$ ls /sys/block/dm-0/bdi/
max_ratio min_ratio power read_ahead_kb subsystem uevent

Maybe:
memory.min_ratio
memory.max_ratio

And use this instead of task_dirty_limit(current, pbdi_dirty), as:

	if (mem_cgroup_dirty_ratio_support(current)) /* returns 0 if root cgroup */
		memcg_dirty_limit(current, pbdi_dirty, xxxx?);
	else
		task_dirty_limit(current, pbdi_dirty);

To be honest, I don't want to increase the number of callers of lock_page_cgroup(),
and I don't want to see complicated race conditions.

Thanks,
-Kame