Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
the only place where page_owner could potentially go into recursion due to
its need of allocating more memory was in save_stack(), which ends up calling
into stackdepot code with the possibility of allocating memory.
We made sure to guard against that by signaling that the current task was
already in page_owner code, so in case a recursion attempt was made, we
could catch that and return dummy_handle.
After the above commit, a new place in page_owner code was introduced where we
could allocate memory, meaning we could go into recursion should we take that
path.
Make sure to signal that we are in page_owner in that codepath as well.
Move the guard code into two helpers {un}set_current_in_page_owner()
and use them around the calls to the two functions that might allocate
memory.
Signed-off-by: Oscar Salvador <[email protected]>
Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
---
mm/page_owner.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
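
Not part of the patch: a minimal userspace sketch of the guard pattern described
in the changelog, for readers who want to see the control flow in isolation.
Everything except the two helper names is an assumption made for illustration:
struct task, this_task, track_page() and alloc_for_tracking() are hypothetical
stand-ins, and the numeric handle values are arbitrary.

```c
#include <assert.h>

/* Hypothetical stand-in for task_struct; only the guard flag is modelled. */
struct task { unsigned int in_page_owner : 1; };
static struct task this_task;
#define current (&this_task)

/* Arbitrary illustrative values, not the kernel's handles. */
enum { DUMMY_HANDLE = 1, REAL_HANDLE = 2 };

static int nested_calls;	/* times tracking code was re-entered */
static int last_nested;		/* what the re-entered call returned */

static void set_current_in_page_owner(void)
{
	current->in_page_owner = 1;
}

static void unset_current_in_page_owner(void)
{
	assert(current->in_page_owner);	/* set/unset must be paired */
	current->in_page_owner = 0;
}

static int track_page(void);

/* Models an allocation made by tracking code: it gets tracked as well. */
static void alloc_for_tracking(void)
{
	nested_calls++;
	last_nested = track_page();
}

/* Models the guarded path: bail out with a dummy handle when re-entered. */
static int track_page(void)
{
	if (current->in_page_owner)
		return DUMMY_HANDLE;

	set_current_in_page_owner();
	alloc_for_tracking();	/* re-enters track_page() exactly once */
	unset_current_in_page_owner();
	return REAL_HANDLE;
}
```

In this sketch the outer track_page() call returns REAL_HANDLE while the nested
call made through alloc_for_tracking() sees the flag and returns DUMMY_HANDLE.
Without the flag, the two functions would call each other without bound.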
diff --git a/mm/page_owner.c b/mm/page_owner.c
index e96dd9092658..60663d657f7a 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -54,6 +54,22 @@ static depot_stack_handle_t early_handle;
static void init_early_allocated_pages(void);
+static inline void set_current_in_page_owner(void)
+{
+ /*
+ * Avoid recursion.
+ *
+ * We might need to allocate more memory from page_owner code, so make
+ * sure to signal it in order to avoid recursion.
+ */
+ current->in_page_owner = 1;
+}
+
+static inline void unset_current_in_page_owner(void)
+{
+ current->in_page_owner = 0;
+}
+
static int __init early_page_owner_param(char *buf)
{
int ret = kstrtobool(buf, &page_owner_enabled);
@@ -133,23 +149,16 @@ static noinline depot_stack_handle_t save_stack(gfp_t flags)
depot_stack_handle_t handle;
unsigned int nr_entries;
- /*
- * Avoid recursion.
- *
- * Sometimes page metadata allocation tracking requires more
- * memory to be allocated:
- * - when new stack trace is saved to stack depot
- */
if (current->in_page_owner)
return dummy_handle;
- current->in_page_owner = 1;
+ set_current_in_page_owner();
nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
handle = stack_depot_save(entries, nr_entries, flags);
if (!handle)
handle = failure_handle;
+ unset_current_in_page_owner();
- current->in_page_owner = 0;
return handle;
}
@@ -232,6 +241,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
alloc_handle = page_owner->handle;
handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
+
for (i = 0; i < (1 << order); i++) {
__clear_bit(PAGE_EXT_OWNER_ALLOCATED, &page_ext->flags);
page_owner->free_handle = handle;
@@ -292,7 +302,9 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
return;
__set_page_owner_handle(page_ext, handle, order, gfp_mask);
page_ext_put(page_ext);
+ set_current_in_page_owner();
inc_stack_record_count(handle, gfp_mask);
+ unset_current_in_page_owner();
}
void __set_page_owner_migrate_reason(struct page *page, int reason)
--
2.44.0
On Thu, Mar 14, 2024 at 12:42:45AM +0100, Oscar Salvador wrote:
> @@ -232,6 +241,7 @@ void __reset_page_owner(struct page *page, unsigned short order)
> alloc_handle = page_owner->handle;
>
> handle = save_stack(GFP_NOWAIT | __GFP_NOWARN);
> +
> for (i = 0; i < (1 << order); i++) {
Sigh, a last-minute unnoticed change.
@Andrew: Do you want me to send v2 fixing that up?
--
Oscar Salvador
SUSE Labs
On 2024/03/14 8:42, Oscar Salvador wrote:
> Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> the only place where page_owner could potentially go into recursion due to
> its need of allocating more memory was in save_stack(), which ends up calling
> into stackdepot code with the possibility of allocating memory.
>
> We made sure to guard against that by signaling that the current task was
> already in page_owner code, so in case a recursion attempt was made, we
> could catch that and return dummy_handle.
>
> After above commit, a new place in page_owner code was introduced where we
> could allocate memory, meaning we could go into recursion would we take that
> path.
>
> Make sure to signal that we are in page_owner in that codepath as well.
> Move the guard code into two helpers {un}set_current_in_page_owner()
> and use them prior to calling in the two functions that might allocate
> memory.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
Could this be the culprit for the page owner refcount bug reported at
https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 ?
That commit went into next-20240214, and syzbot has been failing to test since next-20240215.
Please send this patch to linux-next.git as soon as possible (or can someone experiencing
this bug try booting linux-next.git with this patch applied, so that we can check whether
syzbot can resume testing linux-next.git), and then send to linux.git together (so that
various trees which depend on linux.git won't start failing to boot).
On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
> Maybe culprit for a page owner refcount bug reported at
> https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
> that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
>
> Please send this patch to linux-next.git as soon as possible (or can someone experiencing
> this bug try booting linux-next.git with this patch applied, so that we can check whether
> syzbot can resume testing linux-next.git), and then send to linux.git together (so that
> various trees which depend on linux.git won't start failing to boot).
No, that is something else that I already started fixing a few days ago.
I think I will have the fix ready today.
--
Oscar Salvador
SUSE Labs
On Thu, Mar 14, 2024 at 06:47:43AM +0100, Oscar Salvador wrote:
> On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
> > Maybe culprit for a page owner refcount bug reported at
> > https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
> > that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
> >
> > Please send this patch to linux-next.git as soon as possible (or can someone experiencing
> > this bug try booting linux-next.git with this patch applied, so that we can check whether
> > syzbot can resume testing linux-next.git), and then send to linux.git together (so that
> > various trees which depend on linux.git won't start failing to boot).
>
> No, that is something else that I already started fixing a few days ago.
> I think I will have the fix ready today.
I already have the fix. I will do some more testing and then I will send
it out.
Thanks
--
Oscar Salvador
SUSE Labs
On 3/14/24 00:42, Oscar Salvador wrote:
> Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> the only place where page_owner could potentially go into recursion due to
> its need of allocating more memory was in save_stack(), which ends up calling
> into stackdepot code with the possibility of allocating memory.
>
> We made sure to guard against that by signaling that the current task was
> already in page_owner code, so in case a recursion attempt was made, we
> could catch that and return dummy_handle.
>
> After above commit, a new place in page_owner code was introduced where we
> could allocate memory, meaning we could go into recursion would we take that
> path.
>
> Make sure to signal that we are in page_owner in that codepath as well.
> Move the guard code into two helpers {un}set_current_in_page_owner()
> and use them prior to calling in the two functions that might allocate
> memory.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> ---
> mm/page_owner.c | 30 +++++++++++++++++++++---------
> 1 file changed, 21 insertions(+), 9 deletions(-)
>
> @@ -292,7 +302,9 @@ noinline void __set_page_owner(struct page *page, unsigned short order,
> return;
> __set_page_owner_handle(page_ext, handle, order, gfp_mask);
> page_ext_put(page_ext);
> + set_current_in_page_owner();
> inc_stack_record_count(handle, gfp_mask);
> + unset_current_in_page_owner();
This is because of the kmalloc() in add_stack_record_to_list() right? Why
not wrap just that then?
> }
>
> void __set_page_owner_migrate_reason(struct page *page, int reason)
On 2024/03/14 16:01, Oscar Salvador wrote:
> On Thu, Mar 14, 2024 at 06:47:43AM +0100, Oscar Salvador wrote:
>> On Thu, Mar 14, 2024 at 12:01:24PM +0900, Tetsuo Handa wrote:
>>> Maybe culprit for a page owner refcount bug reported at
>>> https://syzkaller.appspot.com/bug?id=8e4e66dfe299a2a00204ad220c641daaf1486a00 , for
>>> that commit went to next-20240214 and syzbot started failing to test since next-20240215 ?
>>>
>>> Please send this patch to linux-next.git as soon as possible (or can someone experiencing
>>> this bug try booting linux-next.git with this patch applied, so that we can check whether
>>> syzbot can resume testing linux-next.git), and then send to linux.git together (so that
>>> various trees which depend on linux.git won't start failing to boot).
>>
>> No, that is something else that I already started fixing a few days ago.
>> I think I will have the fix ready today.
>
> I already have the fix. I will do some more testing and then I will send
> it out.
OK. Please test your patch using
https://syzkaller.appspot.com/bug?extid=98c1a1753a0731df2dd4 .