2023-08-21 16:38:14

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard

On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> In this patch series I want to safeguard
> the free_pcppage_bulk against change in the
> pcp->count outside of this function. e.g.
> by BPF program inject on the function tracepoint.
>
> I break up the patches into two seperate patches
> for the safeguard and clean up.
>
> Hopefully that is easier to review.
>
> Signed-off-by: Chris Li <[email protected]>

This sounds like a maintenance nightmare if internal state can be arbitrary
modified by a BPF program and still expected to work properly in all cases.
Every review would have to take into account "what if a BPF script modifies
state behind our back?"

--
Mel Gorman
SUSE Labs


2023-08-22 02:53:48

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard



on 8/21/2023 6:32 PM, Mel Gorman wrote:
> On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
>> In this patch series I want to safeguard
>> the free_pcppage_bulk against change in the
>> pcp->count outside of this function. e.g.
>> by BPF program inject on the function tracepoint.
>>
>> I break up the patches into two seperate patches
>> for the safeguard and clean up.
>>
>> Hopefully that is easier to review.
>>
>> Signed-off-by: Chris Li <[email protected]>
>
> This sounds like a maintenance nightmare if internal state can be arbitrary
> modified by a BPF program and still expected to work properly in all cases.
> Every review would have to take into account "what if a BPF script modifies
> state behind our back?"
>
Agreed. We assume pcp->count is protected by pcp->lock. Instead of make code
work in case pcp->count could be changed without lock held, it's more reasonble
to modify pcp->count with pcp->lock held in BPF program.


2023-08-22 17:52:06

by Chris Li

[permalink] [raw]
Subject: Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard

Hi Mel,

Adding Alexei to the discussion.

On Mon, Aug 21, 2023 at 3:32 AM Mel Gorman <[email protected]> wrote:
>
> On Thu, Aug 17, 2023 at 11:05:22PM -0700, Chris Li wrote:
> > In this patch series I want to safeguard
> > the free_pcppage_bulk against change in the
> > pcp->count outside of this function. e.g.
> > by BPF program inject on the function tracepoint.
> >
> > I break up the patches into two seperate patches
> > for the safeguard and clean up.
> >
> > Hopefully that is easier to review.
> >
> > Signed-off-by: Chris Li <[email protected]>
>
> This sounds like a maintenance nightmare if internal state can be arbitrary
> modified by a BPF program and still expected to work properly in all cases.
> Every review would have to take into account "what if a BPF script modifies
> state behind our back?"

Thanks for the feedback.

I agree that it is hard to support if we allow BPF to change any internal
stage as a rule. That is why it is a RFC. Would you consider it case
by case basis?
The kernel panic is bad, the first patch is actually very small. I can
also change it
to generate warnings if we detect the inconsistent state.

How about the second (clean up) patch or Keming's clean up version? I can modify
it to take out the pcp->count if the verdict is just not supporting
BPF changing internal
state at all. I do wish to get rid of the pindex_min and pindex_max.

Thanks

Chris

2023-08-23 01:37:47

by Chris Li

[permalink] [raw]
Subject: Re: [PATCH RFC 0/2] mm/page_alloc: free_pcppages_bulk safeguard

Hi Kemeng,

On Mon, Aug 21, 2023 at 6:27 PM Kemeng Shi <[email protected]> wrote:
> >
> Agreed. We assume pcp->count is protected by pcp->lock. Instead of make code
> work in case pcp->count could be changed without lock held, it's more reasonble
> to modify pcp->count with pcp->lock held in BPF program.

The lock is holded when pcp->count is modified. It is going through
the kernel page
allocation API. The issue is nest memory allocation inside spin_lock()
introduced by BPF.

The execution sequence is like this:

count = min(pcp->count, count);

/* Ensure requested pindex is drained first. */
pindex = pindex - 1;
bpf_injected_spin_lock_irqsave {
alloc_page();
original spin_lock_irqsave(&zone->lock, flags) ;
}

Chris





Chris