2021-04-01 03:07:03

by Muchun Song

Subject: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

Christian Borntraeger reported a warning about "percpu ref
(obj_cgroup_release) <= 0 (-1) after switching to atomic".
This is because split_page_memcg() forgot to obtain the references
to the objcg for kmem pages and wrongly obtained references to the
memcg instead.

Reported-by: Christian Borntraeger <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
---
include/linux/memcontrol.h | 6 ++++++
mm/memcontrol.c | 6 +++++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e8907957227..c960fd49c3e8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -804,6 +804,12 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
 	percpu_ref_get(&objcg->refcnt);
 }
 
+static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
+{
+	percpu_ref_get_many(&objcg->refcnt, nr);
+}
+
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 	percpu_ref_put(&objcg->refcnt);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c0b83a396299..64ada9e650a5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3133,7 +3133,11 @@ void split_page_memcg(struct page *head, unsigned int nr)

 	for (i = 1; i < nr; i++)
 		head[i].memcg_data = head->memcg_data;
-	css_get_many(&memcg->css, nr - 1);
+
+	if (PageMemcgKmem(head))
+		obj_cgroup_get_many(__page_objcg(head), nr - 1);
+	else
+		css_get_many(&memcg->css, nr - 1);
 }
 
 #ifdef CONFIG_MEMCG_SWAP
--
2.11.0


2021-04-01 03:38:36

by Miaohe Lin

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On 2021/4/1 11:01, Muchun Song wrote:
> Christian Borntraeger reported a warning about "percpu ref
> (obj_cgroup_release) <= 0 (-1) after switching to atomic".
> This is because split_page_memcg() forgot to obtain the references
> to the objcg for kmem pages and wrongly obtained references to the
> memcg instead.
>
> Reported-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Muchun Song <[email protected]>

Thanks for the patch.
Is a Fixes tag needed?

2021-04-01 03:39:55

by Miaohe Lin

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On 2021/4/1 11:35, Roman Gushchin wrote:
> On Thu, Apr 01, 2021 at 11:31:16AM +0800, Miaohe Lin wrote:
>> On 2021/4/1 11:01, Muchun Song wrote:
>>> Christian Borntraeger reported a warning about "percpu ref
>>> (obj_cgroup_release) <= 0 (-1) after switching to atomic".
>>> This is because split_page_memcg() forgot to obtain the references
>>> to the objcg for kmem pages and wrongly obtained references to the
>>> memcg instead.
>>>
>>> Reported-by: Christian Borntraeger <[email protected]>
>>> Signed-off-by: Muchun Song <[email protected]>
>>
>> Thanks for the patch.
>> Is a Fixes tag needed?
>
> No, as the original patch hasn't been merged into Linus's tree yet,
> so the fix can simply be squashed into it.
>
> Btw, the fix looks good to me.
>
> Acked-by: Roman Gushchin <[email protected]>
>

I see. Many thanks for explanation!

The code looks good to me.
Reviewed-by: Miaohe Lin <[email protected]>

2021-04-01 03:41:06

by Roman Gushchin

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On Thu, Apr 01, 2021 at 11:31:16AM +0800, Miaohe Lin wrote:
> On 2021/4/1 11:01, Muchun Song wrote:
> > Christian Borntraeger reported a warning about "percpu ref
> > (obj_cgroup_release) <= 0 (-1) after switching to atomic".
> > This is because split_page_memcg() forgot to obtain the references
> > to the objcg for kmem pages and wrongly obtained references to the
> > memcg instead.
> >
> > Reported-by: Christian Borntraeger <[email protected]>
> > Signed-off-by: Muchun Song <[email protected]>
>
> Thanks for the patch.
> Is a Fixes tag needed?

No, as the original patch hasn't been merged into Linus's tree yet,
so the fix can simply be squashed into it.

Btw, the fix looks good to me.

Acked-by: Roman Gushchin <[email protected]>

2021-04-03 01:11:11

by Andrew Morton

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On Wed, 31 Mar 2021 20:35:02 -0700 Roman Gushchin <[email protected]> wrote:

> On Thu, Apr 01, 2021 at 11:31:16AM +0800, Miaohe Lin wrote:
> > On 2021/4/1 11:01, Muchun Song wrote:
> > > Christian Borntraeger reported a warning about "percpu ref
> > > (obj_cgroup_release) <= 0 (-1) after switching to atomic".
> > > This is because split_page_memcg() forgot to obtain the references
> > > to the objcg for kmem pages and wrongly obtained references to the
> > > memcg instead.
> > >
> > > Reported-by: Christian Borntraeger <[email protected]>
> > > Signed-off-by: Muchun Song <[email protected]>
> >
> > Thanks for the patch.
> > Is a Fixes tag needed?
>
> No, as the original patch hasn't been merged into Linus's tree yet,
> so the fix can simply be squashed into it.

Help. Which is "the original patch"?

> Btw, the fix looks good to me.
>
> Acked-by: Roman Gushchin <[email protected]>

2021-04-03 01:13:34

by Roman Gushchin

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On Fri, Apr 02, 2021 at 06:04:54PM -0700, Andrew Morton wrote:
> On Wed, 31 Mar 2021 20:35:02 -0700 Roman Gushchin <[email protected]> wrote:
>
> > On Thu, Apr 01, 2021 at 11:31:16AM +0800, Miaohe Lin wrote:
> > > On 2021/4/1 11:01, Muchun Song wrote:
> > > > Christian Borntraeger reported a warning about "percpu ref
> > > > (obj_cgroup_release) <= 0 (-1) after switching to atomic".
> > > > This is because split_page_memcg() forgot to obtain the references
> > > > to the objcg for kmem pages and wrongly obtained references to the
> > > > memcg instead.
> > > >
> > > > Reported-by: Christian Borntraeger <[email protected]>
> > > > Signed-off-by: Muchun Song <[email protected]>
> > >
> > > Thanks for the patch.
> > > Is a Fixes tag needed?
> >
> > No, as the original patch hasn't been merged into Linus's tree yet,
> > so the fix can simply be squashed into it.
>
> Help. Which is "the original patch"?

"mm: memcontrol: use obj_cgroup APIs to charge kmem pages"

2021-04-03 01:14:37

by Shakeel Butt

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On Fri, Apr 2, 2021 at 6:04 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 31 Mar 2021 20:35:02 -0700 Roman Gushchin <[email protected]> wrote:
>
> > On Thu, Apr 01, 2021 at 11:31:16AM +0800, Miaohe Lin wrote:
> > > On 2021/4/1 11:01, Muchun Song wrote:
> > > > Christian Borntraeger reported a warning about "percpu ref
> > > > (obj_cgroup_release) <= 0 (-1) after switching to atomic".
> > > > This is because split_page_memcg() forgot to obtain the references
> > > > to the objcg for kmem pages and wrongly obtained references to the
> > > > memcg instead.
> > > >
> > > > Reported-by: Christian Borntraeger <[email protected]>
> > > > Signed-off-by: Muchun Song <[email protected]>
> > >
> > > Thanks for the patch.
> > > Is a Fixes tag needed?
> >
> > No, as the original patch hasn't been merged into Linus's tree yet,
> > so the fix can simply be squashed into it.
>
> Help. Which is "the original patch"?

"mm: memcontrol: use obj_cgroup APIs to charge kmem pages"

2021-04-12 11:06:25

by Christian Borntraeger

Subject: RE: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg



On 12.04.21 12:53, Muchun Song wrote:
> On Mon, Apr 12, 2021 at 6:42 PM Christian Borntraeger
> <[email protected]> wrote:
>>
>> FWIW, I was away last week, and I checked the regression runs for yesterday's linux-next (e99d8a849517).
>> I still do see errors in our CI system:
>>
>> So either the fix was not complete or it is still missing in next.
>
> The fix is now in the mm tree. I guess the branch you
> tested does not contain this fix yet. You can check whether
> obj_cgroup_get_many() exists; if it doesn't, my guess is
> correct.

Right, the next tree from April 9th does not yet contain obj_cgroup_get_many().

2021-04-12 23:32:42

by Christian Borntraeger

Subject: Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

FWIW, I was away last week, and I checked the regression runs for yesterday's linux-next (e99d8a849517).
I still do see errors in our CI system:

[ 2263.021681] ------------[ cut here ]------------
[ 2263.021697] percpu ref (obj_cgroup_release) <= 0 (0) after switching to atomic
[ 2263.021748] WARNING: CPU: 4 PID: 0 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8
[ 2263.021756] Modules linked in: scsi_debug vfio_pci irqbypass vfio_virqfd kvm vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time zfcp scsi_transport_fc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua rpcrdma sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib dm_mod ib_uverbs ib_core s390_trng vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio eadm_sch zcrypt_cex4 sch_fq_codel configfs ip_tables x_tables ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 mlx5_core sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: vfio_ap]
[ 2263.021820] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.12.0-20210412.rc6.git0.e99d8a849517.300.fc33.s390x+next #1
[ 2263.021823] Hardware name: IBM 8561 T01 703 (LPAR)
[ 2263.021825] Krnl PSW : 0704c00180000000 000000025b234c1e (percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8)
[ 2263.021829] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 2263.021832] Krnl GPRS: c0000000fffeffff 00000002f7212818 0000000000000042 00000000fffeffff
[ 2263.021834] 00000000ffffffea 0000038000000001 0000000000000000 000003800000017c
[ 2263.021836] 000000025b980988 00000000b774d0e0 000003fee191d5d8 8000000000000000
[ 2263.021838] 000000008034c000 00000002f7227570 000000025b234c1a 00000380000aba28
[ 2263.021849] Krnl Code: 000000025b234c0e: e3309fe8ff04 lg %r3,-24(%r9)
000000025b234c14: c0e5001ebe92 brasl %r14,000000025b60c938
#000000025b234c1a: af000000 mc 0,0
>000000025b234c1e: a7f4ffcc brc 15,000000025b234bb6
000000025b234c22: 0707 bcr 0,%r7
000000025b234c24: 0707 bcr 0,%r7
000000025b234c26: 0707 bcr 0,%r7
000000025b234c28: eb6ff0480024 stmg %r6,%r15,72(%r15)
[ 2263.021912] Call Trace:
[ 2263.021914] [<000000025b234c1e>] percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8
[ 2263.021917] ([<000000025b234c1a>] percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8)
[ 2263.021919] [<000000025abe16fe>] rcu_do_batch+0x146/0x608
[ 2263.021924] [<000000025abe5ff4>] rcu_core+0x124/0x1d0
[ 2263.021926] [<000000025b62a222>] __do_softirq+0x13a/0x3c8
[ 2263.021930] [<000000025ab5d3f6>] irq_exit+0xce/0xf8
[ 2263.021934] [<000000025b61a5f6>] do_ext_irq+0xd6/0x160
[ 2263.021937] [<000000025b627c3c>] ext_int_handler+0xc4/0xf4
[ 2263.021939] [<0000000000000000>] 0x0
[ 2263.021943] [<000000025b62775a>] default_idle_call+0x42/0x110
[ 2263.021945] [<000000025ab99328>] do_idle+0xd8/0x168
[ 2263.021949] [<000000025ab99576>] cpu_startup_entry+0x36/0x40
[ 2263.021952] [<000000025ab1f33a>] smp_start_secondary+0x82/0x88
[ 2263.021955] Last Breaking-Event-Address:
[ 2263.021955] [<000000025abc8828>] vprintk_emit+0xa8/0x110
[ 2263.021961] Kernel panic - not syncing: panic_on_warn set ...
[ 2263.021962] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.12.0-20210412.rc6.git0.e99d8a849517.300.fc33.s390x+next #1
[ 2263.021964] Hardware name: IBM 8561 T01 703 (LPAR)
[ 2263.021965] Call Trace:
[ 2263.021966] [<000000025b60bc9a>] show_stack+0x92/0xd8
[ 2263.021972] [<000000025b6161c0>] dump_stack+0x90/0xc0
[ 2263.021975] [<000000025b60cab2>] panic+0x112/0x308
[ 2263.021977] [<000000025ab5571a>] __warn+0xc2/0x158
[ 2263.021981] [<000000025b2a5e4a>] report_bug+0xb2/0x130
[ 2263.021984] [<000000025ab09ef4>] monitor_event_exception+0x44/0xc0
[ 2263.021986] [<000000025b61a1e8>] __do_pgm_check+0xe0/0x1f0
[ 2263.021988] [<000000025b627b30>] pgm_check_handler+0x118/0x160
[ 2263.021990] [<000000025b234c1e>] percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8
[ 2263.021992] ([<000000025b234c1a>] percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8)
[ 2263.021993] [<000000025abe16fe>] rcu_do_batch+0x146/0x608
[ 2263.021995] [<000000025abe5ff4>] rcu_core+0x124/0x1d0
[ 2263.021997] [<000000025b62a222>] __do_softirq+0x13a/0x3c8
[ 2263.021998] [<000000025ab5d3f6>] irq_exit+0xce/0xf8
[ 2263.022000] [<000000025b61a5f6>] do_ext_irq+0xd6/0x160
[ 2263.022001] [<000000025b627c3c>] ext_int_handler+0xc4/0xf4
[ 2263.022003] [<0000000000000000>] 0x0
[ 2263.022004] [<000000025b62775a>] default_idle_call+0x42/0x110
[ 2263.022006] [<000000025ab99328>] do_idle+0xd8/0x168
[ 2263.022008] [<000000025ab99576>] cpu_startup_entry+0x36/0x40

So either the fix was not complete or it is still missing in next.

2021-04-12 23:55:54

by Muchun Song

Subject: Re: [External] Re: [PATCH] mm: memcontrol: fix forget to obtain the ref to objcg in split_page_memcg

On Mon, Apr 12, 2021 at 6:42 PM Christian Borntraeger
<[email protected]> wrote:
>
> FWIW, I was away last week, and I checked the regression runs for yesterday's linux-next (e99d8a849517).
> I still do see errors in our CI system:
>
> So either the fix was not complete or it is still missing in next.

The fix is now in the mm tree. I guess the branch you
tested does not contain this fix yet. You can check whether
obj_cgroup_get_many() exists; if it doesn't, my guess is
correct.

Thanks.