2022-06-27 16:51:16

by Charan Teja Kalla

[permalink] [raw]
Subject: Discussion on race between freed page_ext access and memory offline operation

The below race between page_ext and online/offline of the respective
memory blocks will cause use-after-free on the access of page_ext structure.

process1 process2
--------- ---------
a)doing /proc/page_owner doing memory offline
through offline_pages

b)PageBuddy check is failed
thus proceed to get the
page_owner information
through page_ext access.
page_ext = lookup_page_ext(page);

migrate_pages();
................
Since all pages are successfully
migrated as part of the offline
operation,send MEM_OFFLINE notification
where for page_ext it calls:
offline_page_ext()-->
__free_page_ext()-->
free_page_ext()-->
vfree(ms->page_ext)
mem_section->page_ext = NULL

c) Check for the PAGE_EXT flags
in the page_ext->flags access
results into the use-after-free(leading
to the translation faults).

As mentioned above, there is really no synchronization between page_ext
access and its freeing in the memory_offline. The above is just one
example but the problem persists in the other paths too involving
page_ext->flags access(eg: page_is_idle()).

The memory offline steps(roughly) on a memory block is as below:
1) Isolate all the pages
2) while(1)
try free the pages to buddy.(->free_list[MIGRATE_ISOLATE])
3) delete the pages from this buddy list.
4) Then free page_ext.(Note: The struct page is still alive as it is
freed only during hot remove of the memory which frees the memmap, which
steps the user might not perform).

This design leads to the state where struct page is alive but the struct
page_ext is freed, where the later is ideally part of the former which
just representing the page_flags. This seems to be a wrong design where
'struct page' as a whole is not accessible(Thanks to Minchan for
pointing this out).

Some solutions we think are:
----------------------------
1) Take the mem_hotplug_lock read_lock every time page_ext access.

2) Take the extra refcount on the page every time page_ext access is
made, so that parallel offline operation can't free the page to buddy.

3) Change the design where the page_ext is valid as long as the struct
page is alive.

Any other inputs here?

PS: This bug is uncovered while fixing the same page_ext access issue
with the page
pinner(https://lore.kernel.org/linux-mm/[email protected]/)
on Andorid.


Thanks,
Charan


2022-06-27 16:52:48

by David Hildenbrand

[permalink] [raw]
Subject: Re: Discussion on race between freed page_ext access and memory offline operation

On 27.06.22 18:09, Charan Teja Kalla wrote:
> The below race between page_ext and online/offline of the respective
> memory blocks will cause use-after-free on the access of page_ext structure.
>
> process1 process2
> --------- ---------
> a)doing /proc/page_owner doing memory offline
> through offline_pages
>
> b)PageBuddy check is failed
> thus proceed to get the
> page_owner information
> through page_ext access.
> page_ext = lookup_page_ext(page);
>
> migrate_pages();
> ................
> Since all pages are successfully
> migrated as part of the offline
> operation,send MEM_OFFLINE notification
> where for page_ext it calls:
> offline_page_ext()-->
> __free_page_ext()-->
> free_page_ext()-->
> vfree(ms->page_ext)
> mem_section->page_ext = NULL
>
> c) Check for the PAGE_EXT flags
> in the page_ext->flags access
> results into the use-after-free(leading
> to the translation faults).
>
> As mentioned above, there is really no synchronization between page_ext
> access and its freeing in the memory_offline. The above is just one
> example but the problem persists in the other paths too involving
> page_ext->flags access(eg: page_is_idle()).
>
> The memory offline steps(roughly) on a memory block is as below:
> 1) Isolate all the pages
> 2) while(1)
> try free the pages to buddy.(->free_list[MIGRATE_ISOLATE])
> 3) delete the pages from this buddy list.
> 4) Then free page_ext.(Note: The struct page is still alive as it is
> freed only during hot remove of the memory which frees the memmap, which
> steps the user might not perform).
>
> This design leads to the state where struct page is alive but the struct
> page_ext is freed, where the later is ideally part of the former which
> just representing the page_flags. This seems to be a wrong design where
> 'struct page' as a whole is not accessible(Thanks to Minchan for
> pointing this out).

Accessing the struct page -- including any extensions -- is invalid if
the memory section is marked offline.

Usual PFN walkers use pfn_to_online_page() to make sure we have PFN with
an actual meaning in it.

There is no real synchronization between pfn_to_online_page() and memory
offline code. For now it wasn't required because it was never relevant
in practice.

After pfn_to_online_page() it takes quite a long time until memory is
actually offlined and then, the memmap is removed. Maybe it's different
for page_ext.


It smells like page_ext should use some mechanism during MEM_OFFLINE to
synchronize against any users of its metadata. Generic memory offlining
code might be the wrong place for that.

>
> Some solutions we think are:
> ----------------------------
> 1) Take the mem_hotplug_lock read_lock every time page_ext access.

That would be the big hammer. But it feels wrong, because page_ext is
another subsystem that's synchronized from generic memory offlining code
via the notifier.

>
> 2) Take the extra refcount on the page every time page_ext access is
> made, so that parallel offline operation can't free the page to buddy.

No, that's no good. Just racy.

>
> 3) Change the design where the page_ext is valid as long as the struct
> page is alive.

:/ Doesn't spark joy.

>
> Any other inputs here?


page_ext needs a mechanism to synchronize against any users of the data
it manages. Maybe RCU can help?


--
Thanks,

David / dhildenb

2022-06-28 14:16:56

by Charan Teja Kalla

[permalink] [raw]
Subject: Re: Discussion on race between freed page_ext access and memory offline operation

Thanks David for the inputs!!

On 6/27/2022 10:05 PM, David Hildenbrand wrote:
> On 27.06.22 18:09, Charan Teja Kalla wrote:
>> The below race between page_ext and online/offline of the respective
>> memory blocks will cause use-after-free on the access of page_ext structure.
>>
>> process1 process2
>> --------- ---------
>> a)doing /proc/page_owner doing memory offline
>> through offline_pages
>>
>> b)PageBuddy check is failed
>> thus proceed to get the
>> page_owner information
>> through page_ext access.
>> page_ext = lookup_page_ext(page);
>>
>> migrate_pages();
>> ................
>> Since all pages are successfully
>> migrated as part of the offline
>> operation,send MEM_OFFLINE notification
>> where for page_ext it calls:
>> offline_page_ext()-->
>> __free_page_ext()-->
>> free_page_ext()-->
>> vfree(ms->page_ext)
>> mem_section->page_ext = NULL
>>
>> c) Check for the PAGE_EXT flags
>> in the page_ext->flags access
>> results into the use-after-free(leading
>> to the translation faults).
>>
>> As mentioned above, there is really no synchronization between page_ext
>> access and its freeing in the memory_offline. The above is just one
>> example but the problem persists in the other paths too involving
>> page_ext->flags access(eg: page_is_idle()).
>>
>> The memory offline steps(roughly) on a memory block is as below:
>> 1) Isolate all the pages
>> 2) while(1)
>> try free the pages to buddy.(->free_list[MIGRATE_ISOLATE])
>> 3) delete the pages from this buddy list.
>> 4) Then free page_ext.(Note: The struct page is still alive as it is
>> freed only during hot remove of the memory which frees the memmap, which
>> steps the user might not perform).
>>
>> This design leads to the state where struct page is alive but the struct
>> page_ext is freed, where the later is ideally part of the former which
>> just representing the page_flags. This seems to be a wrong design where
>> 'struct page' as a whole is not accessible(Thanks to Minchan for
>> pointing this out).
> Accessing the struct page -- including any extensions -- is invalid if
> the memory section is marked offline.
>
> Usual PFN walkers use pfn_to_online_page() to make sure we have PFN with
> an actual meaning in it.

Is there such enforcement from the kernel side to use the
pfn_to_online_page() while doing the pfn walk? Eg: In the same
read_page_owner()(Not sure of the other places), it is not used while
doing the pfn walk.

>
> There is no real synchronization between pfn_to_online_page() and memory
> offline code. For now it wasn't required because it was never relevant
> in practice.
>

Isn't the race here makes the code to still use the page despite it got
offlined parallel there by making the statement 'Accessing the struct
page -- including any extensions -- is invalid' applicable here. Eg: In
the same read_page_owner(), it can go and try to dump the page_owner of
a page(agree that it dumps the proper page_owner) in print_page_owner(),
where it accesses the page->flags?

> After pfn_to_online_page() it takes quite a long time until memory is
> actually offlined and then, the memmap is removed. Maybe it's different
> for page_ext.
>
As you already well aware, the memmap will not be removed as long as we
are playing just with the offline/online operation but page_ext is freed
even during the offline operation making **part of the struct page is
mapped and the other part is not**.

>
> It smells like page_ext should use some mechanism during MEM_OFFLINE to
> synchronize against any users of its metadata. Generic memory offlining
> code might be the wrong place for that.
>
> page_ext needs a mechanism to synchronize against any users of the data
> it manages. Maybe RCU can help?

Let me give a thought about the feasibility of this. But this requires
making code at all the places where moving the page_ext users under
rcu_lock.