2023-03-28 13:08:13

by Jaewon Kim

Subject: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

Normal free:212600kB min:7664kB low:57100kB high:106536kB
reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
active_file:1200kB inactive_file:0kB unevictable:2932kB
writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:200844kB
Out of memory and no killable processes...
Kernel panic - not syncing: System is deadlocked on memory

An OoM panic was reported, there were only native processes which are
non-killable as OOM_SCORE_ADJ_MIN.

After looking into the dump, I've found the dma-buf system heap was
trying to allocate a huge size. It seems to be a signed negative value.

dma_heap_ioctl_allocate(inline)
| heap_allocation = 0xFFFFFFC02247BD38 -> (
| len = 0xFFFFFFFFE7225100,

Actually the old ion system heap had policy which does not allow that
huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
bugs in system heap"). We need this change again. Single allocation
should not be bigger than half of all memory.

Signed-off-by: Jaewon Kim <[email protected]>
---
drivers/dma-buf/heaps/system_heap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index e8bd10e60998..4c1ef2ecfb0f 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
struct page *page, *tmp_page;
int i, ret = -ENOMEM;

+ if (len / PAGE_SIZE > totalram_pages() / 2)
+ return ERR_PTR(-ENOMEM);
+
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer)
return ERR_PTR(-ENOMEM);
--
2.17.1


2023-03-28 18:39:21

by John Stultz

Subject: Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>
> Normal free:212600kB min:7664kB low:57100kB high:106536kB
> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
> active_file:1200kB inactive_file:0kB unevictable:2932kB
> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:200844kB
> Out of memory and no killable processes...
> Kernel panic - not syncing: System is deadlocked on memory
>
> An OoM panic was reported, there were only native processes which are
> non-killable as OOM_SCORE_ADJ_MIN.
>
> After looking into the dump, I've found the dma-buf system heap was
> trying to allocate a huge size. It seems to be a signed negative value.
>
> dma_heap_ioctl_allocate(inline)
> | heap_allocation = 0xFFFFFFC02247BD38 -> (
> | len = 0xFFFFFFFFE7225100,
>
> Actually the old ion system heap had policy which does not allow that
> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
> bugs in system heap"). We need this change again. Single allocation
> should not be bigger than half of all memory.
>
> Signed-off-by: Jaewon Kim <[email protected]>

Hey,
Thanks so much for sending this out! Looks reasonable to me, the
only issue is the commit subject line could be a bit better.

Maybe instead:
"dma-buf/heaps: system_heap: Avoid DoS by limiting single
allocations to half of all memory"

Otherwise,
Acked-by: John Stultz <[email protected]>

thanks
-john

2023-03-28 18:40:51

by T.J. Mercier

Subject: Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>
> Normal free:212600kB min:7664kB low:57100kB high:106536kB
> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
> active_file:1200kB inactive_file:0kB unevictable:2932kB
> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:200844kB
> Out of memory and no killable processes...
> Kernel panic - not syncing: System is deadlocked on memory
>
> An OoM panic was reported, there were only native processes which are
> non-killable as OOM_SCORE_ADJ_MIN.
>
> After looking into the dump, I've found the dma-buf system heap was
> trying to allocate a huge size. It seems to be a signed negative value.
>
> dma_heap_ioctl_allocate(inline)
> | heap_allocation = 0xFFFFFFC02247BD38 -> (
> | len = 0xFFFFFFFFE7225100,
>
> Actually the old ion system heap had policy which does not allow that
> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
> bugs in system heap"). We need this change again. Single allocation
> should not be bigger than half of all memory.
>
> Signed-off-by: Jaewon Kim <[email protected]>
> ---
> drivers/dma-buf/heaps/system_heap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index e8bd10e60998..4c1ef2ecfb0f 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> struct page *page, *tmp_page;
> int i, ret = -ENOMEM;
>
> + if (len / PAGE_SIZE > totalram_pages() / 2)
> + return ERR_PTR(-ENOMEM);
> +

Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
the allocation request?

2023-03-29 02:49:59

by Jaewon Kim

Subject: RE: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

>
>
>--------- Original Message ---------
>Sender : John Stultz <[email protected]>
>Date : 2023-03-29 03:26 (GMT+9)
>Title : Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:
>
>On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>>
>> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>> active_file:1200kB inactive_file:0kB unevictable:2932kB
>> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> free_cma:200844kB
>> Out of memory and no killable processes...
>> Kernel panic - not syncing: System is deadlocked on memory
>>
>> An OoM panic was reported, there were only native processes which are
>> non-killable as OOM_SCORE_ADJ_MIN.
>>
>> After looking into the dump, I've found the dma-buf system heap was
>> trying to allocate a huge size. It seems to be a signed negative value.
>>
>> dma_heap_ioctl_allocate(inline)
>> | heap_allocation = 0xFFFFFFC02247BD38 -> (
>> | len = 0xFFFFFFFFE7225100,
>>
>> Actually the old ion system heap had policy which does not allow that
>> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> bugs in system heap"). We need this change again. Single allocation
>> should not be bigger than half of all memory.
>>
>> Signed-off-by: Jaewon Kim <[email protected]>
>
>Hey,
> Thanks so much for sending this out! Looks reasonable to me, the
>only issue is the commit subject line could be a bit better.
>
>Maybe instead:
> "dma-buf/heaps: system_heap: Avoid DoS by limiting single
>allocations to half of all memory"
>
>Otherwise,
>Acked-by: John Stultz <[email protected]>
>
>thanks
>-john
>

Hello John

Thank you for your Acked-by and for the subject suggestion.
I was in a hurry and did not check it.

I am going to take yours:

"dma-buf/heaps: system_heap: Avoid DoS by limiting single
allocations to half of all memory"

By the way let me talk with T.J.

Thank you
Jaewon Kim

2023-03-29 03:37:33

by Jaewon Kim

Subject: RE: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

>On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>>
>> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>> active_file:1200kB inactive_file:0kB unevictable:2932kB
>> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> free_cma:200844kB
>> Out of memory and no killable processes...
>> Kernel panic - not syncing: System is deadlocked on memory
>>
>> An OoM panic was reported, there were only native processes which are
>> non-killable as OOM_SCORE_ADJ_MIN.
>>
>> After looking into the dump, I've found the dma-buf system heap was
>> trying to allocate a huge size. It seems to be a signed negative value.
>>
>> dma_heap_ioctl_allocate(inline)
>> | heap_allocation = 0xFFFFFFC02247BD38 -> (
>> | len = 0xFFFFFFFFE7225100,
>>
>> Actually the old ion system heap had policy which does not allow that
>> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> bugs in system heap"). We need this change again. Single allocation
>> should not be bigger than half of all memory.
>>
>> Signed-off-by: Jaewon Kim <[email protected]>
>> ---
>> drivers/dma-buf/heaps/system_heap.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>> index e8bd10e60998..4c1ef2ecfb0f 100644
>> --- a/drivers/dma-buf/heaps/system_heap.c
>> +++ b/drivers/dma-buf/heaps/system_heap.c
>> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>> struct page *page, *tmp_page;
>> int i, ret = -ENOMEM;
>>
>> + if (len / PAGE_SIZE > totalram_pages() / 2)
>> + return ERR_PTR(-ENOMEM);
>> +
>
>Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
>heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
>the allocation request?

Hello T.J.

Thank you for your opinion.
The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.

page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB

I tested it, and the allocation failed once the file cache was nearly exhausted, without
an OoM panic, as we expected. The phone froze for a few seconds, though.

We can avoid the OoM panic with either the totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.

But I think we still need the totalram_pages() / 2 check so that users do not have to
suffer the freeze. Without it, we may kill some critical processes or the user's recent apps.

Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid the OoM panic. But I'm worried
about low-memory devices which still need an OoM kill to get memory, as in camera scenarios.

So what do you think?

Thank you
Jaewon Kim

2023-03-29 17:00:18

by T.J. Mercier

Subject: Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

On Tue, Mar 28, 2023 at 8:13 PM Jaewon Kim <[email protected]> wrote:
>
> >On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
> >>
> >> Normal free:212600kB min:7664kB low:57100kB high:106536kB
> >> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
> >> active_file:1200kB inactive_file:0kB unevictable:2932kB
> >> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
> >> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
> >> free_cma:200844kB
> >> Out of memory and no killable processes...
> >> Kernel panic - not syncing: System is deadlocked on memory
> >>
> >> An OoM panic was reported, there were only native processes which are
> >> non-killable as OOM_SCORE_ADJ_MIN.
> >>
> >> After looking into the dump, I've found the dma-buf system heap was
> >> trying to allocate a huge size. It seems to be a signed negative value.
> >>
> >> dma_heap_ioctl_allocate(inline)
> >> | heap_allocation = 0xFFFFFFC02247BD38 -> (
> >> | len = 0xFFFFFFFFE7225100,
> >>
> >> Actually the old ion system heap had policy which does not allow that
> >> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
> >> bugs in system heap"). We need this change again. Single allocation
> >> should not be bigger than half of all memory.
> >>
> >> Signed-off-by: Jaewon Kim <[email protected]>
> >> ---
> >> drivers/dma-buf/heaps/system_heap.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> >> index e8bd10e60998..4c1ef2ecfb0f 100644
> >> --- a/drivers/dma-buf/heaps/system_heap.c
> >> +++ b/drivers/dma-buf/heaps/system_heap.c
> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> >> struct page *page, *tmp_page;
> >> int i, ret = -ENOMEM;
> >>
> >> + if (len / PAGE_SIZE > totalram_pages() / 2)
> >> + return ERR_PTR(-ENOMEM);
> >> +
> >
> >Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
> >heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
> >the allocation request?
>
> Hello T.J.
>
> Thank you for your opinion.
> The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.
>
> page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
> Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB
>
> I tried to test it, and the allocation stopped at very low file cache situation without OoM panic
> as we expected. The phone device was freezing for few seconds though.
>
> We can avoid OoM panic through either totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.
>
> But I think we still need the totalram_pages() / 2 check so that we don't have to suffer
> the freezing in UX perspective. We may kill some critical processes or users' recent apps.
>
> Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid OoM panic. But I'm worried
> about low memory devices which still need OoM kill to get memory like in camera scenarios.
>
> So what do you think?
>
Hey Jaewon, thanks for checking! The totalram_pages() / 2 just feels
somewhat arbitrary. On the lowest memory devices I'm aware of that use
the system heap it would take a single buffer on the order of several
hundred megabytes to exceed that, so I guess the simple check is fine
here until someone says they just can't live without a buffer that
big!

Reviewed-by: T.J. Mercier <[email protected]>

2023-03-30 01:05:00

by Jaewon Kim

Subject: RE: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

>On Tue, Mar 28, 2023 at 8:13 PM Jaewon Kim <[email protected]> wrote:
>>
>> >On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>> >>
>> >> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>> >> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>> >> active_file:1200kB inactive_file:0kB unevictable:2932kB
>> >> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>> >> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> >> free_cma:200844kB
>> >> Out of memory and no killable processes...
>> >> Kernel panic - not syncing: System is deadlocked on memory
>> >>
>> >> An OoM panic was reported, there were only native processes which are
>> >> non-killable as OOM_SCORE_ADJ_MIN.
>> >>
>> >> After looking into the dump, I've found the dma-buf system heap was
>> >> trying to allocate a huge size. It seems to be a signed negative value.
>> >>
>> >> dma_heap_ioctl_allocate(inline)
>> >> | heap_allocation = 0xFFFFFFC02247BD38 -> (
>> >> | len = 0xFFFFFFFFE7225100,
>> >>
>> >> Actually the old ion system heap had policy which does not allow that
>> >> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> >> bugs in system heap"). We need this change again. Single allocation
>> >> should not be bigger than half of all memory.
>> >>
>> >> Signed-off-by: Jaewon Kim <[email protected]>
>> >> ---
>> >> drivers/dma-buf/heaps/system_heap.c | 3 +++
>> >> 1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>> >> index e8bd10e60998..4c1ef2ecfb0f 100644
>> >> --- a/drivers/dma-buf/heaps/system_heap.c
>> >> +++ b/drivers/dma-buf/heaps/system_heap.c
>> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>> >> struct page *page, *tmp_page;
>> >> int i, ret = -ENOMEM;
>> >>
>> >> + if (len / PAGE_SIZE > totalram_pages() / 2)
>> >> + return ERR_PTR(-ENOMEM);
>> >> +
>> >
>> >Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
>> >heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
>> >the allocation request?
>>
>> Hello T.J.
>>
>> Thank you for your opinion.
>> The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.
>>
>> page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
>> Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB
>>
>> I tried to test it, and the allocation stopped at very low file cache situation without OoM panic
>> as we expected. The phone device was freezing for few seconds though.
>>
>> We can avoid OoM panic through either totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.
>>
>> But I think we still need the totalram_pages() / 2 check so that we don't have to suffer
>> the freezing in UX perspective. We may kill some critical processes or users' recent apps.
>>
>> Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid OoM panic. But I'm worried
>> about low memory devices which still need OoM kill to get memory like in camera scenarios.
>>
>> So what do you think?
>>
>Hey Jaewon, thanks for checking! The totalram_pages() / 2 just feels
>somewhat arbitrary. On the lowest memory devices I'm aware of that use
>the system heap it would take a single buffer on the order of several
>hundred megabytes to exceed that, so I guess the simple check is fine
>here until someone says they just can't live without a buffer that
>big!
>
>Reviewed-by: T.J. Mercier <[email protected]>

Hello T.J.

Thank you for your Reviewed-by.

I also think the totalram_pages() / 2 doesn't look perfect, but I think
we need it.

By the way, I'm a little confused about the single buffer. Please help me understand.
Do you mean we may need to reconsider the totalram_pages() / 2 limit some day,
if a camera requests a huge amount of memory for a single buffer? Then I hope
the device also has enough total memory to support such a high-quality camera.

And if possible, could you give your idea about __GFP_RETRY_MAYFAIL regarding
what I said? I think OoM kill doesn't seem to occur that often thanks to LMKD kill.
And I also want to avoid OoM panic, so I'd like to apply it.
But what if there is a situation which still need OoM kill to get memory. I just
thought policy of __GFP_RETRY_MAYFAIL could be changed to allow OoM kill but return
NULL when there was a victim process.

Thank you
Jaewon Kim

2023-03-30 18:37:54

by T.J. Mercier

Subject: Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

On Wed, Mar 29, 2023 at 5:41 PM Jaewon Kim <[email protected]> wrote:
>
> >On Tue, Mar 28, 2023 at 8:13 PM Jaewon Kim <[email protected]> wrote:
> >>
> >> >On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
> >> >>
> >> >> Normal free:212600kB min:7664kB low:57100kB high:106536kB
> >> >> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
> >> >> active_file:1200kB inactive_file:0kB unevictable:2932kB
> >> >> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
> >> >> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
> >> >> free_cma:200844kB
> >> >> Out of memory and no killable processes...
> >> >> Kernel panic - not syncing: System is deadlocked on memory
> >> >>
> >> >> An OoM panic was reported, there were only native processes which are
> >> >> non-killable as OOM_SCORE_ADJ_MIN.
> >> >>
> >> >> After looking into the dump, I've found the dma-buf system heap was
> >> >> trying to allocate a huge size. It seems to be a signed negative value.
> >> >>
> >> >> dma_heap_ioctl_allocate(inline)
> >> >> | heap_allocation = 0xFFFFFFC02247BD38 -> (
> >> >> | len = 0xFFFFFFFFE7225100,
> >> >>
> >> >> Actually the old ion system heap had policy which does not allow that
> >> >> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
> >> >> bugs in system heap"). We need this change again. Single allocation
> >> >> should not be bigger than half of all memory.
> >> >>
> >> >> Signed-off-by: Jaewon Kim <[email protected]>
> >> >> ---
> >> >> drivers/dma-buf/heaps/system_heap.c | 3 +++
> >> >> 1 file changed, 3 insertions(+)
> >> >>
> >> >> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> >> >> index e8bd10e60998..4c1ef2ecfb0f 100644
> >> >> --- a/drivers/dma-buf/heaps/system_heap.c
> >> >> +++ b/drivers/dma-buf/heaps/system_heap.c
> >> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> >> >> struct page *page, *tmp_page;
> >> >> int i, ret = -ENOMEM;
> >> >>
> >> >> + if (len / PAGE_SIZE > totalram_pages() / 2)
> >> >> + return ERR_PTR(-ENOMEM);
> >> >> +
> >> >
> >> >Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
> >> >heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
> >> >the allocation request?
> >>
> >> Hello T.J.
> >>
> >> Thank you for your opinion.
> >> The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.
> >>
> >> page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
> >> Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB
> >>
> >> I tried to test it, and the allocation stopped at very low file cache situation without OoM panic
> >> as we expected. The phone device was freezing for few seconds though.
> >>
> >> We can avoid OoM panic through either totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.
> >>
> >> But I think we still need the totalram_pages() / 2 check so that we don't have to suffer
> >> the freezing in UX perspective. We may kill some critical processes or users' recent apps.
> >>
> >> Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid OoM panic. But I'm worried
> >> about low memory devices which still need OoM kill to get memory like in camera scenarios.
> >>
> >> So what do you think?
> >>
> >Hey Jaewon, thanks for checking! The totalram_pages() / 2 just feels
> >somewhat arbitrary. On the lowest memory devices I'm aware of that use
> >the system heap it would take a single buffer on the order of several
> >hundred megabytes to exceed that, so I guess the simple check is fine
> >here until someone says they just can't live without a buffer that
> >big!
> >
> >Reviewed-by: T.J. Mercier <[email protected]>
>
> Hello T.J.
>
> Thank you for your Reviewed-by.
>
> I also think the totalram_pages() / 2 doesn't look perfect, but I think
> we need it.
>
> By the way I'm a little confused on a single buffer. Please help me to be clear.
> Do you mean we may need to reconsider the totalram_pages() / 2 some day,
> if camera may request a huge memory for a single camera buffer? Then I hope
> the device has also huge total memory to support that high quality camera.
>
Right, it's only a problem if a very low memory device wants a very
large buffer. IDK why anyone would want to do that.

> And if possible, could you give your idea about __GFP_RETRY_MAYFAIL regarding
> what I said? I think OoM kill doesn't seem to occur that often thanks to LMKD kill.
> And I also want to avoid OoM panic, so I'd like to apply it.

Yeah even with the totalram_pages() / 2 check, a process could trigger
the panic by consuming all available memory by allocating multiple
buffers. (As long as that allocating process doesn't get oom killed
first, and it allocates faster than LMKD can kill it.) So to prevent
users of the system heap from crashing the system, I think it's still
worth adding __GFP_RETRY_MAYFAIL.

> But what if there is a situation which still need OoM kill to get memory. I just
> thought policy of __GFP_RETRY_MAYFAIL could be changed to allow OoM kill but return
> NULL when there was a victim process.

I'm not sure exactly what you mean here, but it might be nice to have
a way to allow oom kills but not panics if a victim can't be found
(and then fail the allocation request). Looks like that'd be possible
by changing alloc_pages to conditionally set oom_control->order = -1
for some new GFP flag, but not sure if that's worth it. As you
mentioned, that'd be a super slow allocation. So I don't think that's
a state we'd really want to be operating in.

> Thank you
> Jaewon Kim

2023-03-31 00:56:50

by Jaewon Kim

Subject: RE: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow and list bugs in system heap:

>On Wed, Mar 29, 2023 at 5:41 PM Jaewon Kim <[email protected]> wrote:
>>
>> >On Tue, Mar 28, 2023 at 8:13 PM Jaewon Kim <[email protected]> wrote:
>> >>
>> >> >On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <[email protected]> wrote:
>> >> >>
>> >> >> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>> >> >> reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>> >> >> active_file:1200kB inactive_file:0kB unevictable:2932kB
>> >> >> writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>> >> >> pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>> >> >> free_cma:200844kB
>> >> >> Out of memory and no killable processes...
>> >> >> Kernel panic - not syncing: System is deadlocked on memory
>> >> >>
>> >> >> An OoM panic was reported, there were only native processes which are
>> >> >> non-killable as OOM_SCORE_ADJ_MIN.
>> >> >>
>> >> >> After looking into the dump, I've found the dma-buf system heap was
>> >> >> trying to allocate a huge size. It seems to be a signed negative value.
>> >> >>
>> >> >> dma_heap_ioctl_allocate(inline)
>> >> >> | heap_allocation = 0xFFFFFFC02247BD38 -> (
>> >> >> | len = 0xFFFFFFFFE7225100,
>> >> >>
>> >> >> Actually the old ion system heap had policy which does not allow that
>> >> >> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> >> >> bugs in system heap"). We need this change again. Single allocation
>> >> >> should not be bigger than half of all memory.
>> >> >>
>> >> >> Signed-off-by: Jaewon Kim <[email protected]>
>> >> >> ---
>> >> >> drivers/dma-buf/heaps/system_heap.c | 3 +++
>> >> >> 1 file changed, 3 insertions(+)
>> >> >>
>> >> >> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>> >> >> index e8bd10e60998..4c1ef2ecfb0f 100644
>> >> >> --- a/drivers/dma-buf/heaps/system_heap.c
>> >> >> +++ b/drivers/dma-buf/heaps/system_heap.c
>> >> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>> >> >> struct page *page, *tmp_page;
>> >> >> int i, ret = -ENOMEM;
>> >> >>
>> >> >> + if (len / PAGE_SIZE > totalram_pages() / 2)
>> >> >> + return ERR_PTR(-ENOMEM);
>> >> >> +
>> >> >
>> >> >Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
>> >> >heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
>> >> >the allocation request?
>> >>
>> >> Hello T.J.
>> >>
>> >> Thank you for your opinion.
>> >> The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.
>> >>
>> >> page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
>> >> Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB
>> >>
>> >> I tried to test it, and the allocation stopped at very low file cache situation without OoM panic
>> >> as we expected. The phone device was freezing for few seconds though.
>> >>
>> >> We can avoid OoM panic through either totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.
>> >>
>> >> But I think we still need the totalram_pages() / 2 check so that we don't have to suffer
>> >> the freezing in UX perspective. We may kill some critical processes or users' recent apps.
>> >>
>> >> Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid OoM panic. But I'm worried
>> >> about low memory devices which still need OoM kill to get memory like in camera scenarios.
>> >>
>> >> So what do you think?
>> >>
>> >Hey Jaewon, thanks for checking! The totalram_pages() / 2 just feels
>> >somewhat arbitrary. On the lowest memory devices I'm aware of that use
>> >the system heap it would take a single buffer on the order of several
>> >hundred megabytes to exceed that, so I guess the simple check is fine
>> >here until someone says they just can't live without a buffer that
>> >big!
>> >
>> >Reviewed-by: T.J. Mercier <[email protected]>
>>
>> Hello T.J.
>>
>> Thank you for your Reviewed-by.
>>
>> I also think the totalram_pages() / 2 doesn't look perfect, but I think
>> we need it.
>>
>> By the way I'm a little confused on a single buffer. Please help me to be clear.
>> Do you mean we may need to reconsider the totalram_pages() / 2 some day,
>> if camera may request a huge memory for a single camera buffer? Then I hope
>> the device has also huge total memory to support that high quality camera.
>>
>Right, it's only a problem if a very low memory device wants a very
>large buffer. IDK why anyone would want to do that.
>
>> And if possible, could you give your idea about __GFP_RETRY_MAYFAIL regarding
>> what I said? I think OoM kill doesn't seem to occur that often thanks to LMKD kill.
>> And I also want to avoid OoM panic, so I'd like to apply it.
>
>Yeah even with the totalram_pages() / 2 check, a process could trigger
>the panic by consuming all available memory by allocating multiple
>buffers. (As long as that allocating process doesn't get oom killed
>first, and it allocates faster than LMKD can kill it.) So to prevent
>users of the system heap from crashing the system, I think it's still
>worth adding __GFP_RETRY_MAYFAIL.
>

Correct, I think exactly the same. From a memory management perspective, I'd
love to apply __GFP_RETRY_MAYFAIL to the system heap. But I think we are not
ready to apply it because of the situations where an OoM kill is still needed.
Let me just apply the totalram_pages() / 2 check this time. I hope we have a
chance to discuss __GFP_RETRY_MAYFAIL again later. I'm going to keep thinking
about it while monitoring these situations.

>> But what if there is a situation which still need OoM kill to get memory. I just
>> thought policy of __GFP_RETRY_MAYFAIL could be changed to allow OoM kill but return
>> NULL when there was a victim process.
>
>I'm not sure exactly what you mean here, but it might be nice to have
>a way to allow oom kills but not panics if a victim can't be found
>(and then fail the allocation request). Looks like that'd be possible
>by changing alloc_pages to conditionally set oom_control->order = -1
>for some new GFP flag, but not sure if that's worth it. As you
>mentioned, that'd be a super slow allocation. So I don't think that's
>a state we'd really want to be operating in.
>

Oh sorry, I missed the 'not'. Let me correct it to 'when there was NOT a victim
process.' Anyway, you already got what I meant. I also found the is_sysrq_oom
code related to oom_control->order = -1. Yes, it could be possible, but as you
said, I think we don't really want that for now. We need to think about it
more.

It was a great discussion, and I appreciate it.

>> Thank you
>> Jaewon Kim