2021-07-19 10:46:18

by Vasily Averin

[permalink] [raw]
Subject: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

An netadmin inside container can use 'ip a a' and 'ip r a'
to assign a large number of ipv4/ipv6 addresses and routing entries
and force kernel to allocate megabytes of unaccounted memory
for long-lived per-netdevice related kernel objects:
'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
'struct rt6_info', 'struct fib_rules' and ip_fib caches.

These objects can be manually removed, though usually they lives
in memory till destroy of its net namespace.

It makes sense to account for them to restrict the host's memory
consumption from inside the memcg-limited container.

One of such objects is the 'struct fib6_node' mostly allocated in
net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:

write_lock_bh(&table->tb6_lock);
err = fib6_add(&table->tb6_root, rt, info, mxc);
write_unlock_bh(&table->tb6_lock);

In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
kmem cache. The proper memory cgroup still cannot be found due to the
incorrect 'in_interrupt()' check used in memcg_kmem_bypass().

Obsoleted in_interrupt() does not describe real execution context properly.
From include/linux/preempt.h:

The following macros are deprecated and should not be used in new code:
in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled

To verify the current execution context new macro should be used instead:
in_task() - We're in task context

Signed-off-by: Vasily Averin <[email protected]>
---
mm/memcontrol.c | 2 +-
net/core/fib_rules.c | 4 ++--
net/ipv4/devinet.c | 2 +-
net/ipv4/fib_trie.c | 4 ++--
net/ipv6/addrconf.c | 2 +-
net/ipv6/ip6_fib.c | 4 ++--
net/ipv6/route.c | 2 +-
7 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0..1bbf239 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
return false;

/* Memcg to charge can't be determined. */
- if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
+ if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
return true;

return false;
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index a9f9379..79df7cd 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -57,7 +57,7 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
{
struct fib_rule *r;

- r = kzalloc(ops->rule_size, GFP_KERNEL);
+ r = kzalloc(ops->rule_size, GFP_KERNEL_ACCOUNT);
if (r == NULL)
return -ENOMEM;

@@ -541,7 +541,7 @@ static int fib_nl2rule(struct sk_buff *skb, struct nlmsghdr *nlh,
goto errout;
}

- nlrule = kzalloc(ops->rule_size, GFP_KERNEL);
+ nlrule = kzalloc(ops->rule_size, GFP_KERNEL_ACCOUNT);
if (!nlrule) {
err = -ENOMEM;
goto errout;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 73721a4..d38124b 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -215,7 +215,7 @@ static void devinet_sysctl_unregister(struct in_device *idev)

static struct in_ifaddr *inet_alloc_ifa(void)
{
- return kzalloc(sizeof(struct in_ifaddr), GFP_KERNEL);
+ return kzalloc(sizeof(struct in_ifaddr), GFP_KERNEL_ACCOUNT);
}

static void inet_rcu_free_ifa(struct rcu_head *head)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 25cf387..8060524 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2380,11 +2380,11 @@ void __init fib_trie_init(void)
{
fn_alias_kmem = kmem_cache_create("ip_fib_alias",
sizeof(struct fib_alias),
- 0, SLAB_PANIC, NULL);
+ 0, SLAB_PANIC | SLAB_ACCOUNT, NULL);

trie_leaf_kmem = kmem_cache_create("ip_fib_trie",
LEAF_SIZE,
- 0, SLAB_PANIC, NULL);
+ 0, SLAB_PANIC | SLAB_ACCOUNT, NULL);
}

struct fib_table *fib_trie_table(u32 id, struct fib_table *alias)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3bf685f..8eaeade 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1080,7 +1080,7 @@ static int ipv6_add_addr_hash(struct net_device *dev, struct inet6_ifaddr *ifa)
goto out;
}

- ifa = kzalloc(sizeof(*ifa), gfp_flags);
+ ifa = kzalloc(sizeof(*ifa), gfp_flags | __GFP_ACCOUNT);
if (!ifa) {
err = -ENOBUFS;
goto out;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 2d650dc..a8f118e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2449,8 +2449,8 @@ int __init fib6_init(void)
int ret = -ENOMEM;

fib6_node_kmem = kmem_cache_create("fib6_nodes",
- sizeof(struct fib6_node),
- 0, SLAB_HWCACHE_ALIGN,
+ sizeof(struct fib6_node), 0,
+ SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,
NULL);
if (!fib6_node_kmem)
goto out;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7b756a7..5f7286a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6638,7 +6638,7 @@ int __init ip6_route_init(void)
ret = -ENOMEM;
ip6_dst_ops_template.kmem_cachep =
kmem_cache_create("ip6_dst_cache", sizeof(struct rt6_info), 0,
- SLAB_HWCACHE_ALIGN, NULL);
+ SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
if (!ip6_dst_ops_template.kmem_cachep)
goto out;

--
1.8.3.1


2021-07-19 14:16:36

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

Hi Vasily,

On Mon, 19 Jul 2021 at 11:45, Vasily Averin <[email protected]> wrote:
[..]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ae1f5d0..1bbf239 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
> return false;
>
> /* Memcg to charge can't be determined. */
> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
> return true;

This seems to do two separate things in one patch.
Probably, it's better to separate them.
(I may miss how route changes are related to more generic
__alloc_pages() change)

Thanks,
Dmitry

2021-07-19 14:26:24

by Dmitry Safonov

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

On 7/19/21 3:22 PM, Shakeel Butt wrote:
> On Mon, Jul 19, 2021 at 7:00 AM Dmitry Safonov <[email protected]> wrote:
>>
>> Hi Vasily,
>>
>> On Mon, 19 Jul 2021 at 11:45, Vasily Averin <[email protected]> wrote:
>> [..]
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index ae1f5d0..1bbf239 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
>>> return false;
>>>
>>> /* Memcg to charge can't be determined. */
>>> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
>>> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
>>> return true;
>>
>> This seems to do two separate things in one patch.
>> Probably, it's better to separate them.
>> (I may miss how route changes are related to more generic
>> __alloc_pages() change)
>>
>
> It was requested to squash them together in some previous versions.
> https://lore.kernel.org/linux-mm/[email protected]/
>

Ah, alright, never mind than.

Thanks,
Dmitry

2021-07-19 14:27:01

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

On Mon, Jul 19, 2021 at 7:00 AM Dmitry Safonov <[email protected]> wrote:
>
> Hi Vasily,
>
> On Mon, 19 Jul 2021 at 11:45, Vasily Averin <[email protected]> wrote:
> [..]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index ae1f5d0..1bbf239 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
> > return false;
> >
> > /* Memcg to charge can't be determined. */
> > - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
> > + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
> > return true;
>
> This seems to do two separate things in one patch.
> Probably, it's better to separate them.
> (I may miss how route changes are related to more generic
> __alloc_pages() change)
>

It was requested to squash them together in some previous versions.
https://lore.kernel.org/linux-mm/[email protected]/

2021-07-20 19:29:50

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

On Mon, Jul 19, 2021 at 3:44 AM Vasily Averin <[email protected]> wrote:
>
> An netadmin inside container can use 'ip a a' and 'ip r a'
> to assign a large number of ipv4/ipv6 addresses and routing entries
> and force kernel to allocate megabytes of unaccounted memory
> for long-lived per-netdevice related kernel objects:
> 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
> 'struct rt6_info', 'struct fib_rules' and ip_fib caches.
>
> These objects can be manually removed, though usually they lives
> in memory till destroy of its net namespace.
>
> It makes sense to account for them to restrict the host's memory
> consumption from inside the memcg-limited container.
>
> One of such objects is the 'struct fib6_node' mostly allocated in
> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>
> write_lock_bh(&table->tb6_lock);
> err = fib6_add(&table->tb6_root, rt, info, mxc);
> write_unlock_bh(&table->tb6_lock);
>
> In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
> kmem cache. The proper memory cgroup still cannot be found due to the
> incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
>
> Obsoleted in_interrupt() does not describe real execution context properly.
> From include/linux/preempt.h:
>
> The following macros are deprecated and should not be used in new code:
> in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
>
> To verify the current execution context new macro should be used instead:
> in_task() - We're in task context
>
> Signed-off-by: Vasily Averin <[email protected]>
> ---
> mm/memcontrol.c | 2 +-
> net/core/fib_rules.c | 4 ++--
> net/ipv4/devinet.c | 2 +-
> net/ipv4/fib_trie.c | 4 ++--
> net/ipv6/addrconf.c | 2 +-
> net/ipv6/ip6_fib.c | 4 ++--
> net/ipv6/route.c | 2 +-
> 7 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ae1f5d0..1bbf239 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
> return false;
>
> /* Memcg to charge can't be determined. */
> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
> return true;
>
> return false;

Can you please also change in_interrupt() in active_memcg() as well?
There are other unrelated in_interrupt() in that file but the one in
active_memcg() should be coupled with this change.

2021-07-26 10:24:43

by Vasily Averin

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

On 7/20/21 10:26 PM, Shakeel Butt wrote:
> On Mon, Jul 19, 2021 at 3:44 AM Vasily Averin <[email protected]> wrote:
>>
>> An netadmin inside container can use 'ip a a' and 'ip r a'
>> to assign a large number of ipv4/ipv6 addresses and routing entries
>> and force kernel to allocate megabytes of unaccounted memory
>> for long-lived per-netdevice related kernel objects:
>> 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
>> 'struct rt6_info', 'struct fib_rules' and ip_fib caches.
>>
>> These objects can be manually removed, though usually they lives
>> in memory till destroy of its net namespace.
>>
>> It makes sense to account for them to restrict the host's memory
>> consumption from inside the memcg-limited container.
>>
>> One of such objects is the 'struct fib6_node' mostly allocated in
>> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>>
>> write_lock_bh(&table->tb6_lock);
>> err = fib6_add(&table->tb6_root, rt, info, mxc);
>> write_unlock_bh(&table->tb6_lock);
>>
>> In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
>> kmem cache. The proper memory cgroup still cannot be found due to the
>> incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
>>
>> Obsoleted in_interrupt() does not describe real execution context properly.
>> From include/linux/preempt.h:
>>
>> The following macros are deprecated and should not be used in new code:
>> in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled
>>
>> To verify the current execution context new macro should be used instead:
>> in_task() - We're in task context
>>
>> Signed-off-by: Vasily Averin <[email protected]>
>> ---
>> mm/memcontrol.c | 2 +-
>> net/core/fib_rules.c | 4 ++--
>> net/ipv4/devinet.c | 2 +-
>> net/ipv4/fib_trie.c | 4 ++--
>> net/ipv6/addrconf.c | 2 +-
>> net/ipv6/ip6_fib.c | 4 ++--
>> net/ipv6/route.c | 2 +-
>> 7 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index ae1f5d0..1bbf239 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
>> return false;
>>
>> /* Memcg to charge can't be determined. */
>> - if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
>> + if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
>> return true;
>>
>> return false;
>
> Can you please also change in_interrupt() in active_memcg() as well?
> There are other unrelated in_interrupt() in that file but the one in
> active_memcg() should be coupled with this change.

Could you please elaborate?
From my point of view active_memcg is paired with set_active_memcg() and is not related to this case.
active_memcg uses memcg that was set by set_active_memcg(), either from int_active_memcg per-cpu pointer
or from current->active_memcg pointer.
I'm agree, it in case of disabled BH it is incorrect to use int_active_memcg,
we still can use current->active_memcg. However it isn't a problem,
memcg will be properly provided in both cases.

I think it's better to fix set_active_memcg/active_memcg by separate patch.

Am I missed something perhaps?

Thank you,
Vasily Averin

2021-07-26 13:49:25

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH v5 02/16] memcg: enable accounting for IP address and routing-related objects

On Mon, Jul 26, 2021 at 3:23 AM Vasily Averin <[email protected]> wrote:
>
[...]
> >
> > Can you please also change in_interrupt() in active_memcg() as well?
> > There are other unrelated in_interrupt() in that file but the one in
> > active_memcg() should be coupled with this change.
>
> Could you please elaborate?
> From my point of view active_memcg is paired with set_active_memcg() and is not related to this case.
> active_memcg uses memcg that was set by set_active_memcg(), either from int_active_memcg per-cpu pointer
> or from current->active_memcg pointer.
> I'm agree, it in case of disabled BH it is incorrect to use int_active_memcg,
> we still can use current->active_memcg. However it isn't a problem,
> memcg will be properly provided in both cases.
>
> I think it's better to fix set_active_memcg/active_memcg by separate patch.
>
> Am I missed something perhaps?
>

No you are right. That should be a separate patch.

2021-07-26 17:10:13

by Vasily Averin

[permalink] [raw]
Subject: [PATCH] memcg: replace in_interrupt() by !in_task() in active_memcg()

set_active_memcg() uses in_interrupt() check to select proper storage for
cgroup: pointer on task struct or per-cpu pointer.

It isn't fully correct: obsoleted in_interrupt() includes tasks with disabled BH.
It's better to use '!in_task()' instead.

Link: https://lkml.org/lkml/2021/7/26/487
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Vasily Averin <[email protected]>
---
include/linux/sched/mm.h | 2 +-
mm/memcontrol.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e24b1fe348e3..9dd071f78dba 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -306,7 +306,7 @@ set_active_memcg(struct mem_cgroup *memcg)
{
struct mem_cgroup *old;

- if (in_interrupt()) {
+ if (!in_task()) {
old = this_cpu_read(int_active_memcg);
this_cpu_write(int_active_memcg, memcg);
} else {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..3ebf792ef2c0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -905,7 +905,7 @@ EXPORT_SYMBOL(mem_cgroup_from_task);

static __always_inline struct mem_cgroup *active_memcg(void)
{
- if (in_interrupt())
+ if (!in_task())
return this_cpu_read(int_active_memcg);
else
return current->active_memcg;
--
2.25.1

2021-07-26 17:25:45

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH] memcg: replace in_interrupt() by !in_task() in active_memcg()

On Mon, Jul 26, 2021 at 9:53 AM Vasily Averin <[email protected]> wrote:
>
> set_active_memcg() uses in_interrupt() check to select proper storage for
> cgroup: pointer on task struct or per-cpu pointer.
>
> It isn't fully correct: obsoleted in_interrupt() includes tasks with disabled BH.
> It's better to use '!in_task()' instead.
>
> Link: https://lkml.org/lkml/2021/7/26/487
> Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
> Signed-off-by: Vasily Averin <[email protected]>

Thanks.

Reviewed-by: Shakeel Butt <[email protected]>