Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
MIME-Version: 1.0
References: <20230328125818.5574-1-jaewon31.kim@samsung.com>
 <CABdmKX1J6WzE9CMbRthROgHZLLhXZJBw4iOz-7q+RK5fGpggLA@mail.gmail.com>
 <20230329031302epcms1p6afc9d9d8e92db6a39c29044606d21afc@epcms1p6>
 <CGME20230328125807epcas1p1606c068a9043d6581a1fbdd30e7c53a2@epcms1p7>
 <CABdmKX3c+qK6ekhujkH9fo8bNagmd-M=a=ZWF3HOq1C0EzHs8g@mail.gmail.com> <20230330004117epcms1p7cab95489135a39bf511f6b2cf958e41e@epcms1p7>
In-Reply-To: <20230330004117epcms1p7cab95489135a39bf511f6b2cf958e41e@epcms1p7>
From:   "T.J. Mercier" <tjmercier@google.com>
Date:   Thu, 30 Mar 2023 11:27:16 -0700
Message-ID: <CABdmKX1urNvWihM=9WVAwxMsR_Tp_HU1RkX66WJ+iry_LB8yHg@mail.gmail.com>
Subject: Re: [PATCH] dma-buf/heaps: c9e8440eca61 staging: ion: Fix overflow
 and list bugs in system heap:
To:     jaewon31.kim@samsung.com
Cc:     "jstultz@google.com" <jstultz@google.com>,
        "sumit.semwal@linaro.org" <sumit.semwal@linaro.org>,
        "daniel.vetter@ffwll.ch" <daniel.vetter@ffwll.ch>,
        "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
        "hannes@cmpxchg.org" <hannes@cmpxchg.org>,
        "mhocko@kernel.org" <mhocko@kernel.org>,
        "linux-mm@kvack.org" <linux-mm@kvack.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "jaewon31.kim@gmail.com" <jaewon31.kim@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

On Wed, Mar 29, 2023 at 5:41 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
> >On Tue, Mar 28, 2023 at 8:13 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> >>
> >> >On Tue, Mar 28, 2023 at 5:58 AM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> >> >>
> >> >> Normal free:212600kB min:7664kB low:57100kB high:106536kB
> >> >>   reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
> >> >>   active_file:1200kB inactive_file:0kB unevictable:2932kB
> >> >>   writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
> >> >>   pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
> >> >>   free_cma:200844kB
> >> >> Out of memory and no killable processes...
> >> >> Kernel panic - not syncing: System is deadlocked on memory
> >> >>
> >> >> An OoM panic was reported, there were only native processes which are
> >> >> non-killable as OOM_SCORE_ADJ_MIN.
> >> >>
> >> >> After looking into the dump, I've found the dma-buf system heap was
> >> >> trying to allocate a huge size. It seems to be a signed negative value.
> >> >>
> >> >> dma_heap_ioctl_allocate(inline)
> >> >>     |  heap_allocation = 0xFFFFFFC02247BD38 -> (
> >> >>     |    len = 0xFFFFFFFFE7225100,
> >> >>
> >> >> Actually the old ion system heap had policy which does not allow that
> >> >> huge size with commit c9e8440eca61 ("staging: ion: Fix overflow and list
> >> >> bugs in system heap"). We need this change again. Single allocation
> >> >> should not be bigger than half of all memory.
> >> >>
> >> >> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> >> >> ---
> >> >>  drivers/dma-buf/heaps/system_heap.c | 3 +++
> >> >>  1 file changed, 3 insertions(+)
> >> >>
> >> >> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> >> >> index e8bd10e60998..4c1ef2ecfb0f 100644
> >> >> --- a/drivers/dma-buf/heaps/system_heap.c
> >> >> +++ b/drivers/dma-buf/heaps/system_heap.c
> >> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> >> >>         struct page *page, *tmp_page;
> >> >>         int i, ret = -ENOMEM;
> >> >>
> >> >> +       if (len / PAGE_SIZE > totalram_pages() / 2)
> >> >> +               return ERR_PTR(-ENOMEM);
> >> >> +
> >> >
> >> >Instead of policy like that, would __GFP_RETRY_MAYFAIL on the system
> >> >heap's LOW_ORDER_GFP flags also avoid the panic, and eventually fail
> >> >the allocation request?
> >>
> >> Hello T.J.
> >>
> >> Thank you for your opinion.
> >> The __GFP_RETRY_MAYFAIL on LOW_ORDER_GFP seems to work.
> >>
> >> page allocation failure: order:0, mode:0x144dc2(GFP_HIGHUSER|__GFP_RETRY_MAYFAIL|__GFP_COMP|__GFP_ZERO)
> >> Node 0 active_anon:120kB inactive_anon:43012kB active_file:36kB inactive_file:788kB
> >>
> >> I tried to test it, and the allocation stopped at very low file cache situation without OoM panic
> >> as we expected. The phone device was freezing for few seconds though.
> >>
> >> We can avoid OoM panic through either totalram_pages() / 2 check or __GFP_RETRY_MAYFAIL.
> >>
> >> But I think we still need the totalram_pages() / 2 check so that we don't have to suffer
> >> the freezing in UX perspective. We may kill some critical processes or users' recent apps.
> >>
> >> Regarding __GFP_RETRY_MAYFAIL, I think it will help us avoid OoM panic. But I'm worried
> >> about low memory devices which still need OoM kill to get memory like in camera scenarios.
> >>
> >> So what do you think?
> >>
> >Hey Jaewon, thanks for checking! The totalram_pages() / 2 just feels
> >somewhat arbitrary. On the lowest memory devices I'm aware of that use
> >the system heap it would take a single buffer on the order of several
> >hundred megabytes to exceed that, so I guess the simple check is fine
> >here until someone says they just can't live without a buffer that
> >big!
> >
> >Reviewed-by: T.J. Mercier <tjmercier@google.com>
>
> Hello T.J.
>
> Thank you for your Reviewed-by.
>
> I also think the totalram_pages() / 2 doesn't look perfect, but I think
> we need it.
>
> By the way I'm a little confused on a single buffer. Please help me to be clear.
> Do you mean we may need to reconsider the totalram_pages() / 2 some day,
> if camera may request a huge memory for a single camera buffer? Then I hope
> the device has also huge total memory to support that high quality camera.
>
Right, it's only a problem if a very low memory device wants a very
large buffer. I don't know why anyone would want to do that.
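
To make the arithmetic concrete, here is a minimal userspace sketch of
the check from the patch. It is not kernel code: MODEL_PAGE_SIZE and
exceeds_half_of_ram() are hypothetical stand-ins for PAGE_SIZE and the
totalram_pages() / 2 comparison, and the 4 KiB page size and the device
sizes mentioned below are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel check uses PAGE_SIZE instead. */
#define MODEL_PAGE_SIZE 4096ULL

/*
 * Hypothetical model of the patch's policy check: a request is rejected
 * when it spans more than half of all RAM pages. total_pages stands in
 * for the value totalram_pages() would return.
 */
static bool exceeds_half_of_ram(uint64_t len, uint64_t total_pages)
{
	return len / MODEL_PAGE_SIZE > total_pages / 2;
}
```

Under these assumptions, the reported len of 0xFFFFFFFFE7225100 is
rejected on a 4 GiB device (1048576 pages). A 300 MiB buffer passes on
that device and is only rejected once total memory drops to something
like 512 MiB (131072 pages).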

> And if possible, could you give your idea about __GFP_RETRY_MAYFAIL regarding
> what I said? I think OoM kill doesn't seem to occur that often thanks to LMKD kill.
> And I also want to avoid OoM panic, so I'd like to apply it.

Yeah even with the totalram_pages() / 2 check, a process could trigger
the panic by consuming all available memory by allocating multiple
buffers. (As long as that allocating process doesn't get oom killed
first, and it allocates faster than LMKD can kill it.) So to prevent
users of the system heap from crashing the system, I think it's still
worth adding __GFP_RETRY_MAYFAIL.
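
A small userspace sketch of that scenario, under loud assumptions:
struct model_ram, model_alloc() and model_fill() are hypothetical names,
pages are assumed to be 4 KiB, and "free pages" is a single counter with
no reclaim or killing. It only shows that every request can pass the
per-allocation check while the sum still exhausts memory, at which point
the model fails the request, which is roughly the behavior hoped for
from __GFP_RETRY_MAYFAIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096ULL

/* Hypothetical, heavily simplified view of system memory. */
struct model_ram {
	uint64_t total_pages;
	uint64_t free_pages;
};

/* Returns true if the modeled allocation succeeds. */
static bool model_alloc(struct model_ram *ram, uint64_t len)
{
	uint64_t pages;

	if (len == 0)
		return false;	/* zero-length requests are not modeled */

	/* Per-allocation policy check, as in the patch. */
	if (len / MODEL_PAGE_SIZE > ram->total_pages / 2)
		return false;

	/* Round up to whole pages without overflowing. */
	pages = len / MODEL_PAGE_SIZE + (len % MODEL_PAGE_SIZE != 0);

	/* Model of a fail-instead-of-panic allocation: give up when empty. */
	if (pages > ram->free_pages)
		return false;

	ram->free_pages -= pages;
	return true;
}

/* Allocates buffers of len bytes until one fails; returns the count. */
static unsigned int model_fill(struct model_ram *ram, uint64_t len)
{
	unsigned int count = 0;

	while (model_alloc(ram, len))
		count++;
	return count;
}
```

With a modeled 4 GiB device (1048576 pages, all free), 1 GiB buffers
each pass the half-of-RAM check, yet model_fill() still drains every
free page before a request finally fails.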

> But what if there is a situation which still need OoM kill to get memory. I just
> thought policy of __GFP_RETRY_MAYFAIL could be changed to allow OoM kill but return
> NULL when there was a victim process.

I'm not sure exactly what you mean here, but it might be nice to have
a way to allow oom kills but not panics if a victim can't be found
(and then fail the allocation request). Looks like that'd be possible
by changing alloc_pages to conditionally set oom_control->order = -1
for some new GFP flag, but not sure if that's worth it. As you
mentioned, that'd be a super slow allocation. So I don't think that's
a state we'd really want to be operating in.
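
A rough userspace model of that idea, purely as a sketch. Everything
here is hypothetical (struct model_oom_control, model_out_of_memory()
and the result enum are made-up names), and it assumes the OOM path
treats order == -1 as "do not panic when no victim is found", which I
believe is how the sysrq-triggered case is distinguished today, but
that should be double-checked against mm/oom_kill.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the modeled out-of-memory decision. */
enum model_oom_result {
	MODEL_OOM_KILLED,	/* a victim was found and killed */
	MODEL_OOM_FAIL_ALLOC,	/* no victim: fail the allocation request */
	MODEL_OOM_PANIC,	/* no victim: system is deadlocked on memory */
};

/* Hypothetical stand-in for the fields of interest in oom_control. */
struct model_oom_control {
	int order;		/* -1 assumed to mean "never panic" */
	bool has_victim;	/* whether a killable process exists */
};

static enum model_oom_result
model_out_of_memory(const struct model_oom_control *oc)
{
	if (oc->has_victim)
		return MODEL_OOM_KILLED;

	/* Assumed convention: order == -1 opts out of the panic. */
	if (oc->order == -1)
		return MODEL_OOM_FAIL_ALLOC;

	return MODEL_OOM_PANIC;
}
```

In this model an order-0 request with no killable process panics, as in
the original report, while the same situation with order set to -1
fails the allocation instead.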

> Thank you
> Jaewon Kim