2022-04-06 08:38:29

by Michal Hocko

Subject: Re: mm: swap: locking in release_pages()

On Tue 05-04-22 13:36:09, Alexander Sverdlin wrote:
> Hello Michal,
>
> thanks for the quick reply!
>
> On 05/04/2022 12:43, Michal Hocko wrote:
> >> 1. Crash of v5.4.170 on an ARM32 machine:
> >>
> >> Unable to handle kernel NULL pointer dereference at virtual address 00000104
> >> pgd = e138149d
> >> [00000104] *pgd=84d2fd003, *pmd=8ffd6f003
> >> Internal error: Oops: a07 [#1] PREEMPT SMP ARM
> >> ...
> >> CPU: 1 PID: 6172 Comm: AaSysInfoRColle Tainted: G B O 5.4.170-... #1
> >> Hardware name: Keystone
> >> PC is at release_pages+0x194/0x358
> >> LR is at release_pages+0x10c/0x358
> > Which LOC does this correspond to? (faddr2line should give you a nice
> > output).
>
> Sorry, I forgot this info in the initial report:
>
> this is indeed the del_page_from_lru_list() in this crash.

Could you be more specific please? Is the problem in list_del or
update_lru_size?

--
Michal Hocko
SUSE Labs


2022-04-06 09:47:26

by Alexander Sverdlin

Subject: Re: mm: swap: locking in release_pages()

Hello Michal!

On 05/04/2022 13:45, Michal Hocko wrote:
>>>> 1. Crash of v5.4.170 on an ARM32 machine:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000104
>>>> pgd = e138149d
>>>> [00000104] *pgd=84d2fd003, *pmd=8ffd6f003
>>>> Internal error: Oops: a07 [#1] PREEMPT SMP ARM
>>>> ...
>>>> CPU: 1 PID: 6172 Comm: AaSysInfoRColle Tainted: G B O 5.4.170-... #1
>>>> Hardware name: Keystone
>>>> PC is at release_pages+0x194/0x358
>>>> LR is at release_pages+0x10c/0x358
>>> Which LOC does this correspond to? (faddr2line should give you a nice
>>> output).
>> Sorry, I forgot this info in the initial report:
>>
>> this is indeed the del_page_from_lru_list() in this crash.
> Could you be more specific please? Is the problem in list_del or
> update_lru_size?

static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev; <--

--
Best regards,
Alexander Sverdlin.

2022-04-06 12:46:18

by Michal Hocko

Subject: Re: mm: swap: locking in release_pages()

On Tue 05-04-22 16:00:54, Alexander Sverdlin wrote:
> Hello Michal!
>
> On 05/04/2022 13:45, Michal Hocko wrote:
> >>>> 1. Crash of v5.4.170 on an ARM32 machine:
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at virtual address 00000104
> >>>> pgd = e138149d
> >>>> [00000104] *pgd=84d2fd003, *pmd=8ffd6f003
> >>>> Internal error: Oops: a07 [#1] PREEMPT SMP ARM
> >>>> ...
> >>>> CPU: 1 PID: 6172 Comm: AaSysInfoRColle Tainted: G B O 5.4.170-... #1
> >>>> Hardware name: Keystone
> >>>> PC is at release_pages+0x194/0x358
> >>>> LR is at release_pages+0x10c/0x358
> >>> Which LOC does this correspond to? (faddr2line should give you a nice
> >>> output).
> >> Sorry, I forgot this info in the initial report:
> >>
> >> this is indeed the del_page_from_lru_list() in this crash.
> > Could you be more specific please? Is the problem in list_del or
> > update_lru_size?
>
> static inline void __list_del(struct list_head * prev, struct list_head * next)
> {
> 	next->prev = prev; <--

OK, I see. AFAICS this means that entry->next is NULL, which doesn't look
like somebody else has already done list_del, as that would leave poison
values behind. Maybe somebody has clobbered the page state.

In any case I would recommend reproducing without stable patches and/or
with the current Linus tree.
--
Michal Hocko
SUSE Labs