While testing OBJFREELIST_SLAB integration with pagealloc, we found a
bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
CFLGS_OBJFREELIST_SLAB.
The original kmem_cache is created early making OFF_SLAB not possible.
When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
is enabled it will try to enable it first under certain conditions.
Given kmem_cache(sys) reuses the original flag, you can have both flags
at the same time resulting in allocation failures and odd behaviors.
The proposed fix removes these flags by default at the entrance of
__kmem_cache_create. This way the function will define which way the
freelist should be handled at this stage for the new cache.
Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
Signed-off-by: Thomas Garnier <[email protected]>
Signed-off-by: Greg Thelen <[email protected]>
---
Based on next-20161025
---
mm/slab.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/slab.c b/mm/slab.c
index 3c83c29..efe280a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2027,6 +2027,14 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
int err;
size_t size = cachep->size;
+ /*
+ * memcg re-creates caches with the flags of the originals. Remove
+ * the freelist related flags to ensure they are re-defined at this
+ * stage. Prevent having both flags on edge cases like with pagealloc
+ * if the original cache was created too early to be OFF_SLAB.
+ */
+ flags &= ~(CFLGS_OBJFREELIST_SLAB|CFLGS_OFF_SLAB);
+
#if DEBUG
#if FORCED_DEBUG
/*
--
2.8.0.rc3.226.g39d4020
Hmmm...Doesnt this belong into memcg_create_kmem_cache() or into
kmem_cache_create() in mm/slab_common.h? Definitely not in an allocator
specific function since this is an issue for all allocators.
memcg_create_kmem_cache() simply assumes that it can pass flags from the
kmem_cache structure to kmem_cache_create(). However, those flags may
contain slab specific options.
kmem_cache_create() could filter out flags that cannot be specified.
Maybe create SLAB_FLAGS_PERMITTED in linux/mm/slab.h and mask other bits
out in kmem_cache_create()?
Slub also has internal flags and those also should not be passed to
kmem_cache_create(). If we define the valid ones we can mask them out.
The cleanest approach would be if kmem_cache_create() would reject invalid
flags and fail and if memcg_create_kmem_cache() would mask out the invalid
flags using SLAB_FLAGS_PERMITTED or so.
On Wed, 26 Oct 2016, Thomas Garnier wrote:
> While testing OBJFREELIST_SLAB integration with pagealloc, we found a
> bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
> CFLGS_OBJFREELIST_SLAB.
>
> The original kmem_cache is created early making OFF_SLAB not possible.
> When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
> is enabled it will try to enable it first under certain conditions.
> Given kmem_cache(sys) reuses the original flag, you can have both flags
> at the same time resulting in allocation failures and odd behaviors.
>
> The proposed fix removes these flags by default at the entrance of
> __kmem_cache_create. This way the function will define which way the
> freelist should be handled at this stage for the new cache.
>
> Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
> Signed-off-by: Thomas Garnier <[email protected]>
> Signed-off-by: Greg Thelen <[email protected]>
> ---
> Based on next-20161025
> ---
> mm/slab.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 3c83c29..efe280a 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2027,6 +2027,14 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
> int err;
> size_t size = cachep->size;
>
> + /*
> + * memcg re-creates caches with the flags of the originals. Remove
> + * the freelist related flags to ensure they are re-defined at this
> + * stage. Prevent having both flags on edge cases like with pagealloc
> + * if the original cache was created too early to be OFF_SLAB.
> + */
> + flags &= ~(CFLGS_OBJFREELIST_SLAB|CFLGS_OFF_SLAB);
> +
> #if DEBUG
> #if FORCED_DEBUG
> /*
>
On Wed, Oct 26, 2016 at 12:08 PM, Christoph Lameter <[email protected]> wrote:
> Hmmm...Doesnt this belong into memcg_create_kmem_cache() or into
> kmem_cache_create() in mm/slab_common.h? Definitely not in an allocator
> specific function since this is an issue for all allocators.
>
> memcg_create_kmem_cache() simply assumes that it can pass flags from the
> kmem_cache structure to kmem_cache_create(). However, those flags may
> contain slab specific options.
>
> kmem_cache_create() could filter out flags that cannot be specified.
That make sense.
>
> Maybe create SLAB_FLAGS_PERMITTED in linux/mm/slab.h and mask other bits
> out in kmem_cache_create()?
>
> Slub also has internal flags and those also should not be passed to
> kmem_cache_create(). If we define the valid ones we can mask them out.
>
> The cleanest approach would be if kmem_cache_create() would reject invalid
> flags and fail and if memcg_create_kmem_cache() would mask out the invalid
> flags using SLAB_FLAGS_PERMITTED or so.
Okay, I think for SLAB we can allow everything except the two flags
mentioned here.
Should I deny certain flags for SLUB? I can allow everything for now.
>
>
>
> On Wed, 26 Oct 2016, Thomas Garnier wrote:
>
>> While testing OBJFREELIST_SLAB integration with pagealloc, we found a
>> bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
>> CFLGS_OBJFREELIST_SLAB.
>>
>> The original kmem_cache is created early making OFF_SLAB not possible.
>> When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
>> is enabled it will try to enable it first under certain conditions.
>> Given kmem_cache(sys) reuses the original flag, you can have both flags
>> at the same time resulting in allocation failures and odd behaviors.
>>
>> The proposed fix removes these flags by default at the entrance of
>> __kmem_cache_create. This way the function will define which way the
>> freelist should be handled at this stage for the new cache.
>>
>> Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
>> Signed-off-by: Thomas Garnier <[email protected]>
>> Signed-off-by: Greg Thelen <[email protected]>
>> ---
>> Based on next-20161025
>> ---
>> mm/slab.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 3c83c29..efe280a 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -2027,6 +2027,14 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>> int err;
>> size_t size = cachep->size;
>>
>> + /*
>> + * memcg re-creates caches with the flags of the originals. Remove
>> + * the freelist related flags to ensure they are re-defined at this
>> + * stage. Prevent having both flags on edge cases like with pagealloc
>> + * if the original cache was created too early to be OFF_SLAB.
>> + */
>> + flags &= ~(CFLGS_OBJFREELIST_SLAB|CFLGS_OFF_SLAB);
>> +
>> #if DEBUG
>> #if FORCED_DEBUG
>> /*
>>
--
Thomas
On Wed, 26 Oct 2016, Thomas Garnier wrote:
> Okay, I think for SLAB we can allow everything except the two flags
> mentioned here.
No no no. Just allow the flags already defined in include/linux/slab.h
that can be specd by subsystems when they call into the slab allocators.
> Should I deny certain flags for SLUB? I can allow everything for now.
All allocator should just allow flags defined in include/linux/slab.h be
passed to kmem_cache_create(). That is the API that all allocators need to support.
If someone wants to add new flags then we need to make sure that all
allocators can handle it.
The flags are (from include/linux/slab.h)
/*
* Flags to pass to kmem_cache_create().
*/
#define SLAB_CONSISTENCY_CHECKS 0x00000100UL /* DEBUG: Perform (expensive) checks on alloc/free */
#define SLAB_RED_ZONE 0x00000400UL /* DEBUG: Red zone objs in a cache */
#define SLAB_POISON 0x00000800UL /* DEBUG: Poison objects */
#define SLAB_HWCACHE_ALIGN 0x00002000UL /* Align objs on cache lines */
#define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
#define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
#define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
#define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
#define SLAB_DEBUG_OBJECTS 0x00400000UL
#define SLAB_NOLEAKTRACE 0x00800000UL /* Avoid kmemleak tracing
#define SLAB_NOTRACK 0x01000000UL
#define SLAB_FAILSLAB 0x02000000UL /* Fault injection mark */
#define SLAB_ACCOUNT 0x04000000UL /* Account to memcg */
#define SLAB_KASAN 0x08000000UL
#define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */
The patch is marked for memcg but I do not see any direct relation.
I am not familiar with this code enough probably but if this really is
memcg kmem related, please do not forget to CC Vladimir
On Wed 26-10-16 10:41:28, Thomas Garnier wrote:
> While testing OBJFREELIST_SLAB integration with pagealloc, we found a
> bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
> CFLGS_OBJFREELIST_SLAB.
>
> The original kmem_cache is created early making OFF_SLAB not possible.
> When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
> is enabled it will try to enable it first under certain conditions.
> Given kmem_cache(sys) reuses the original flag, you can have both flags
> at the same time resulting in allocation failures and odd behaviors.
>
> The proposed fix removes these flags by default at the entrance of
> __kmem_cache_create. This way the function will define which way the
> freelist should be handled at this stage for the new cache.
>
> Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
> Signed-off-by: Thomas Garnier <[email protected]>
> Signed-off-by: Greg Thelen <[email protected]>
> ---
> Based on next-20161025
> ---
> mm/slab.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 3c83c29..efe280a 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2027,6 +2027,14 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
> int err;
> size_t size = cachep->size;
>
> + /*
> + * memcg re-creates caches with the flags of the originals. Remove
> + * the freelist related flags to ensure they are re-defined at this
> + * stage. Prevent having both flags on edge cases like with pagealloc
> + * if the original cache was created too early to be OFF_SLAB.
> + */
> + flags &= ~(CFLGS_OBJFREELIST_SLAB|CFLGS_OFF_SLAB);
> +
> #if DEBUG
> #if FORCED_DEBUG
> /*
> --
> 2.8.0.rc3.226.g39d4020
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
--
Michal Hocko
SUSE Labs
On Thu, Oct 27, 2016 at 12:25 AM, Michal Hocko <[email protected]> wrote:
> The patch is marked for memcg but I do not see any direct relation.
> I am not familiar with this code enough probably but if this really is
> memcg kmem related, please do not forget to CC Vladimir
>
Yes, the next iteration should be closer to memcg. I will CC Vladimir.
Thanks for the heads-up.
> On Wed 26-10-16 10:41:28, Thomas Garnier wrote:
>> While testing OBJFREELIST_SLAB integration with pagealloc, we found a
>> bug where kmem_cache(sys) would be created with both CFLGS_OFF_SLAB &
>> CFLGS_OBJFREELIST_SLAB.
>>
>> The original kmem_cache is created early making OFF_SLAB not possible.
>> When kmem_cache(sys) is created, OFF_SLAB is possible and if pagealloc
>> is enabled it will try to enable it first under certain conditions.
>> Given kmem_cache(sys) reuses the original flag, you can have both flags
>> at the same time resulting in allocation failures and odd behaviors.
>>
>> The proposed fix removes these flags by default at the entrance of
>> __kmem_cache_create. This way the function will define which way the
>> freelist should be handled at this stage for the new cache.
>>
>> Fixes: b03a017bebc4 ("mm/slab: introduce new slab management type, OBJFREELIST_SLAB")
>> Signed-off-by: Thomas Garnier <[email protected]>
>> Signed-off-by: Greg Thelen <[email protected]>
>> ---
>> Based on next-20161025
>> ---
>> mm/slab.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 3c83c29..efe280a 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -2027,6 +2027,14 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
>> int err;
>> size_t size = cachep->size;
>>
>> + /*
>> + * memcg re-creates caches with the flags of the originals. Remove
>> + * the freelist related flags to ensure they are re-defined at this
>> + * stage. Prevent having both flags on edge cases like with pagealloc
>> + * if the original cache was created too early to be OFF_SLAB.
>> + */
>> + flags &= ~(CFLGS_OBJFREELIST_SLAB|CFLGS_OFF_SLAB);
>> +
>> #if DEBUG
>> #if FORCED_DEBUG
>> /*
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to [email protected]. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>
> --
> Michal Hocko
> SUSE Labs
--
Thomas