2020-03-04 00:24:23

by Jann Horn

Subject: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

Hi!

FYI, I noticed that if you do something like the following as root,
the system blows up pretty quickly with error messages about stuff
like corrupt freelist pointers because SLUB actually allows root to
force a page order that is smaller than what is required to store a
single object:

echo 0 > /sys/kernel/slab/task_struct/order
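For reference, the blow-up follows directly from the arithmetic: an order-0
slab is a single 4 KiB page, while task_struct is several KiB, so not even
one object fits and the freelist pointer writes land beyond the end of the
slab. A minimal userspace sketch of that calculation (the object size is a
rough illustrative value, not the exact task_struct size):

	#include <stdio.h>

	int main(void)
	{
		unsigned int page_size = 4096;   /* typical PAGE_SIZE */
		unsigned int object_size = 9216; /* illustrative task_struct size */
		int order;

		for (order = 0; order <= 3; order++) {
			unsigned int slab_bytes = page_size << order;

			printf("order %d: %u bytes -> %u objects per slab\n",
			       order, slab_bytes, slab_bytes / object_size);
		}
		return 0;
	}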

The other SLUB debugging options, like red_zone, also look kind of
suspicious with regards to races (either racing with other writes to
the SLUB debugging options, or with object allocations).


2020-03-04 01:28:02

by David Rientjes

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On Wed, 4 Mar 2020, Jann Horn wrote:

> Hi!
>
> FYI, I noticed that if you do something like the following as root,
> the system blows up pretty quickly with error messages about stuff
> like corrupt freelist pointers because SLUB actually allows root to
> force a page order that is smaller than what is required to store a
> single object:
>
> echo 0 > /sys/kernel/slab/task_struct/order
>
> The other SLUB debugging options, like red_zone, also look kind of
> suspicious with regards to races (either racing with other writes to
> the SLUB debugging options, or with object allocations).
>

Thanks for the report, Jann. To address the most immediate issue,
allowing a smaller order than the minimum required to hold a single
object, I think we'd need something like this.

I can propose it as a formal patch if nobody has any alternate
suggestions?
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	 */
 	size = ALIGN(size, s->align);
 	s->size = size;
-	if (forced_order >= 0)
+	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
 		order = forced_order;
 	else
 		order = calculate_order(size);
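For context, a rough sketch of what the slab_order(size, 1, MAX_ORDER, 1)
bound is meant to provide here (simplified, not the exact mm/slub.c helper):
the smallest page order whose slab can hold at least one object of the given
size, so a forced order below that is ignored and the normally calculated
order is used instead:

	/* Sketch: smallest order such that one object of 'size' fits. */
	static unsigned int min_order_for_size(unsigned int size)
	{
		unsigned int order = 0;

		while ((PAGE_SIZE << order) < size)
			order++;
		return order;
	}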

2020-03-04 02:23:48

by Kees Cook

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On Tue, Mar 03, 2020 at 05:26:14PM -0800, David Rientjes wrote:
> On Wed, 4 Mar 2020, Jann Horn wrote:
>
> > Hi!
> >
> > FYI, I noticed that if you do something like the following as root,
> > the system blows up pretty quickly with error messages about stuff
> > like corrupt freelist pointers because SLUB actually allows root to
> > force a page order that is smaller than what is required to store a
> > single object:
> >
> > echo 0 > /sys/kernel/slab/task_struct/order
> >
> > The other SLUB debugging options, like red_zone, also look kind of
> > suspicious with regards to races (either racing with other writes to
> > the SLUB debugging options, or with object allocations).
> >
>
> Thanks for the report, Jann. To address the most immediate issue,
> allowing a smaller order than the minimum required to hold a single
> object, I think we'd need something like this.
>
> I can propose it as a formal patch if nobody has any alternate
> suggestions?
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
>  	 */
>  	size = ALIGN(size, s->align);
>  	s->size = size;
> -	if (forced_order >= 0)
> +	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
>  		order = forced_order;
>  	else
>  		order = calculate_order(size);

Seems reasonable!

For the race concerns, should this logic just make sure the resulting
order can never shrink? Or does it need much stronger atomicity?

--
Kees Cook

2020-03-04 13:18:24

by Vlastimil Babka

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On 3/4/20 1:23 AM, Jann Horn wrote:
> Hi!
>
> FYI, I noticed that if you do something like the following as root,
> the system blows up pretty quickly with error messages about stuff
> like corrupt freelist pointers because SLUB actually allows root to
> force a page order that is smaller than what is required to store a
> single object:
>
> echo 0 > /sys/kernel/slab/task_struct/order
>
> The other SLUB debugging options, like red_zone, also look kind of
> suspicious with regards to races (either racing with other writes to
> the SLUB debugging options, or with object allocations).

Yeah, I was also wondering last week that there seems to be no synchronization
with alloc/free activity. Increasing the order is AFAICS also dangerous with
freelist randomization:

https://lore.kernel.org/linux-mm/[email protected]/
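To spell out the hazard (a simplified model, not the literal mm/slub.c code):
the random sequence is precomputed with one entry per object for the object
count implied by the order at cache creation time, so a slab allocated after
the order grew holds more objects than there are entries, and shuffling its
freelist indexes past the end of the array:

	struct cache_model {
		unsigned int *random_seq;  /* entries for the old objects-per-slab */
		unsigned int old_objects;
	};

	static unsigned int shuffled_offset(struct cache_model *s, unsigned int idx)
	{
		/* idx can now run up to the new, larger objects-per-slab */
		return s->random_seq[idx]; /* out of bounds once idx >= old_objects */
	}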

2020-03-04 14:58:12

by Pekka Enberg

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption



On 3/4/20 3:26 AM, David Rientjes wrote:
> On Wed, 4 Mar 2020, Jann Horn wrote:
>
>> Hi!
>>
>> FYI, I noticed that if you do something like the following as root,
>> the system blows up pretty quickly with error messages about stuff
>> like corrupt freelist pointers because SLUB actually allows root to
>> force a page order that is smaller than what is required to store a
>> single object:
>>
>> echo 0 > /sys/kernel/slab/task_struct/order
>>
>> The other SLUB debugging options, like red_zone, also look kind of
>> suspicious with regards to races (either racing with other writes to
>> the SLUB debugging options, or with object allocations).
>>
>
> Thanks for the report, Jann. To address the most immediate issue,
> allowing a smaller order than the minimum required to hold a single
> object, I think we'd need something like this.
>
> I can propose it as a formal patch if nobody has any alternate
> suggestions?
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3598,7 +3598,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
>  	 */
>  	size = ALIGN(size, s->align);
>  	s->size = size;
> -	if (forced_order >= 0)
> +	if (forced_order >= slab_order(size, 1, MAX_ORDER, 1))
>  		order = forced_order;
>  	else
>  		order = calculate_order(size);
>

Reviewed-by: Pekka Enberg <[email protected]>

2020-03-04 17:27:08

by Vlastimil Babka

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On 3/4/20 3:22 AM, Kees Cook wrote:
> On Tue, Mar 03, 2020 at 05:26:14PM -0800, David Rientjes wrote:
>
> Seems reasonable!
>
> For the race concerns, should this logic just make sure the resulting
> order can never shrink? Or does it need much stronger atomicity?

If order grows, I think we also need to recalculate the random sequence for
freelist randomization [1]. I expect that would be rather problematic with
parallel allocations/freeing going on.

As was also noted, the any_slab_objects(s) checks are racy - they might return
false and immediately afterwards some other CPU can allocate some objects.
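As a tiny model of that check-then-act window (a hypothetical userspace
sketch, not kernel code), the emptiness check and the layout change are two
separate steps, so an allocation on another CPU can slip in between them:

	struct cache_model {
		int object_count;	/* stands in for any_slab_objects(s) */
		int red_zone;
	};

	static int set_red_zone(struct cache_model *s)
	{
		if (s->object_count)	/* the racy emptiness check */
			return -1;	/* refuse: objects already exist */
		/* another CPU may allocate an object right here */
		s->red_zone = 1;	/* layout no longer matches that object */
		return 0;
	}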

I wonder if this race window can be fixed at all without introducing extra
locking in the fast path? If not, it's probably not worth the trouble of
having these runtime knobs. How about making the files read-only (if not
removing them completely)? Vijayanand described a use case in [2]; shouldn't it
be possible to implement that scenario (all caches have debugging enabled
except the zram cache) with kernel parameters only?

Thanks,
Vlastimil

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/

2020-03-04 20:41:24

by David Rientjes

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On Wed, 4 Mar 2020, Vlastimil Babka wrote:

> > Seems reasonable!
> >
> > For the race concerns, should this logic just make sure the resulting
> > order can never shrink? Or does it need much stronger atomicity?
>
> If order grows, I think we also need to recalculate the random sequence for
> freelist randomization [1]. I expect that would be rather problematic with
> parallel allocations/freeing going on.
>
> As was also noted, the any_slab_objects(s) checks are racy - they might return
> false and immediately afterwards some other CPU can allocate some objects.
>
> I wonder if this race window can be fixed at all without introducing extra
> locking in the fast path? If not, it's probably not worth the trouble of
> having these runtime knobs. How about making the files read-only (if not
> removing them completely)? Vijayanand described a use case in [2]; shouldn't
> it be possible to implement that scenario (all caches have debugging enabled
> except the zram cache) with kernel parameters only?
>

I'm not sure how dependent the CONFIG_SLUB_DEBUG users are on being able
to modify these at runtime (they've been around for 12+ years), but I
agree that it seems particularly dangerous.

I think they can be fixed by freezing allocations and frees for the
particular kmem_cache on all cpus, which would add an additional conditional
in the fastpath even though the freeze is only required in the very small
minority of cases where an admin actually wants to change these.
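A very rough sketch of the shape of that idea (hypothetical, not existing
kernel code): a per-cache flag that a sysfs write would set while it rewrites
the layout, and that the allocation fast path would now have to test:

	struct cache_model {
		int frozen;	/* set while a sysfs write rewrites the layout */
	};

	static void fastpath_alloc_wait(struct cache_model *s)
	{
		while (READ_ONCE(s->frozen))	/* the new fast-path conditional */
			cpu_relax();
		/* ... continue with the usual lockless fast path ... */
	}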

The slub_debug kernel command line options are already pretty
comprehensive as described by Documentation/vm/slub.rst. I *think* these
tunables were primarily introduced for kernel debugging and not general
purpose, perhaps with the exception of "order".

So I think we may be able to fix "order" with a combination of my patch and a
fix to the freelist randomization, and that the others should likely be made
read-only.

Subject: Re: SLUB: sysfs lets root force slab order below required minimum, causing memory corruption

On Wed, 4 Mar 2020, David Rientjes wrote:

> I'm not sure how dependent the CONFIG_SLUB_DEBUG users are on being able
> to modify these at runtime (they've been around for 12+ years), but I
> agree that it seems particularly dangerous.

The order of each individual slab page is stored in struct page, which is why
every SLUB slab page can have a different order. This enables fallback to
order-0 allocations and also allows dynamic configuration of the order at
runtime.
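As a sketch of what the per-page order buys (simplified, helper names
approximate that era's code): the object capacity is derived per slab page
rather than from a single cache-wide value, which is what lets pages of
different orders coexist in one cache and lets allocations fall back to a
smaller order:

	static unsigned int objects_in_slab_page(struct page *page,
						 unsigned int object_size)
	{
		unsigned int order = compound_order(page);	/* per-page order */

		return (PAGE_SIZE << order) / object_size;
	}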

> The slub_debug kernel command line options are already pretty
> comprehensive as described by Documentation/vm/slub.rst. I *think* these
> tunables were primarily introduced for kernel debugging and not general
> purpose, perhaps with the exception of "order".

What do you mean by "general purpose"? Certainly the allocator should not
blow up when forcing zero-order allocations.

> So I think we may be able to fix "order" with a combination of my patch as
> well as a fix to the freelist randomization and that the others should
> likely be made read only.

Hmmm. The races increase as more metadata is added that depends on the
size of the slab page and the number of objects in it.