From: Matthew Wilcox <[email protected]>
__GFP_ZERO requests that the object be initialised to all-zeroes,
while the purpose of a constructor is to initialise an object to a
particular pattern. We cannot do both. Add a warning to catch any
users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
a constructor.
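To illustrate the conflict with a made-up cache (all names here are
hypothetical):

	struct foo {
		spinlock_t lock;
		struct list_head list;
	};

	static void foo_ctor(void *addr)
	{
		struct foo *f = addr;

		spin_lock_init(&f->lock);
		INIT_LIST_HEAD(&f->list);
	}

	/* The ctor runs when the backing slab page is allocated; the
	 * __GFP_ZERO memset runs on every allocation and silently wipes
	 * out the constructed spinlock and list head: */
	f = kmem_cache_alloc(foo_cache, GFP_KERNEL | __GFP_ZERO);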
Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: [email protected]
---
mm/slab.c | 6 ++++--
mm/slob.c | 4 +++-
mm/slub.c | 6 ++++--
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index 38d3f4fd17d7..8b2cb7db85db 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3313,8 +3313,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
local_irq_restore(save_flags);
ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
- if (unlikely(flags & __GFP_ZERO) && ptr)
- memset(ptr, 0, cachep->object_size);
+ if (unlikely(flags & __GFP_ZERO) && ptr) {
+ if (!WARN_ON_ONCE(cachep->ctor))
+ memset(ptr, 0, cachep->object_size);
+ }
slab_post_alloc_hook(cachep, flags, 1, &ptr);
return ptr;
diff --git a/mm/slob.c b/mm/slob.c
index 1a46181b675c..958173fd7c24 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -556,8 +556,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
flags, node);
}
- if (b && c->ctor)
+ if (b && c->ctor) {
+ WARN_ON_ONCE(flags & __GFP_ZERO);
c->ctor(b);
+ }
kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);
return b;
diff --git a/mm/slub.c b/mm/slub.c
index 9e1100f9298f..0f55f0a0dcaa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2714,8 +2714,10 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
stat(s, ALLOC_FASTPATH);
}
- if (unlikely(gfpflags & __GFP_ZERO) && object)
- memset(object, 0, s->object_size);
+ if (unlikely(gfpflags & __GFP_ZERO) && object) {
+ if (!WARN_ON_ONCE(s->ctor))
+ memset(object, 0, s->object_size);
+ }
slab_post_alloc_hook(s, gfpflags, 1, &object);
--
2.16.3
From: Matthew Wilcox <[email protected]>
The page cache has used the mapping's GFP flags for allocating
radix tree nodes for a long time. It took care to always mask off the
__GFP_HIGHMEM flag, and masked off other flags in other paths, but the
__GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
__GFP_DMA32 flags would also have been able to sneak through if they
were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
location, and remove it from earlier in the callchain.
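In sketch form (the f2fs line is paraphrased from its inode setup, not
quoted exactly):

	/* f2fs stores __GFP_ZERO in the mapping's GFP flags: */
	mapping_set_gfp_mask(mapping, GFP_NOFS | __GFP_ZERO);

	/* Before: only __GFP_HIGHMEM was stripped, so __GFP_ZERO leaked
	 * through to the radix tree node allocation: */
	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);

	/* After: GFP_RECLAIM_MASK passes through only the reclaim-related
	 * bits, dropping __GFP_ZERO, __GFP_DMA and __GFP_DMA32: */
	error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);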
Fixes: 19f99cee206c ("f2fs: add core inode operations")
Reported-by: Minchan Kim <[email protected]>
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: [email protected]
---
mm/filemap.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c2147682f4c3..1a4bfc5ed3dc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
- error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+ error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
if (!error) {
struct address_space *mapping = old->mapping;
void (*freepage)(struct page *);
@@ -841,7 +841,7 @@ static int __add_to_page_cache_locked(struct page *page,
return error;
}
- error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+ error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
if (error) {
if (!huge)
mem_cgroup_cancel_charge(page, memcg, false);
@@ -1574,8 +1574,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
if (fgp_flags & FGP_ACCESSED)
__SetPageReferenced(page);
- err = add_to_page_cache_lru(page, mapping, offset,
- gfp_mask & GFP_RECLAIM_MASK);
+ err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
if (unlikely(err)) {
put_page(page);
page = NULL;
@@ -2378,7 +2377,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
if (!page)
return -ENOMEM;
- ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
+ ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
if (ret == 0)
ret = mapping->a_ops->readpage(file, page);
else if (ret == -EEXIST)
--
2.16.3
On Tue, Apr 10, 2018 at 05:53:50AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes,
> while the purpose of a constructor is to initialise an object to a
> particular pattern. We cannot do both. Add a warning to catch any
> users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
Acked-by: Johannes Weiner <[email protected]>
On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> The page cache has used the mapping's GFP flags for allocating
> radix tree nodes for a long time. It took care to always mask off the
> __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> __GFP_DMA32 flags would also have been able to sneak through if they
> were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> location, and remove it from earlier in the callchain.
Could you please mention the NULL pointer crash here, maybe even in the
patch subject? That makes it much easier to find this patch when you
run into that bug or when evaluating backport candidates.
Other than that,
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
Acked-by: Johannes Weiner <[email protected]>
On Tue 10-04-18 05:53:50, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes,
> while the purpose of a constructor is to initialise an object to a
> particular pattern. We cannot do both. Add a warning to catch any
> users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
> ---
> mm/slab.c | 6 ++++--
> mm/slob.c | 4 +++-
> mm/slub.c | 6 ++++--
> 3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 38d3f4fd17d7..8b2cb7db85db 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3313,8 +3313,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> local_irq_restore(save_flags);
> ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> - if (unlikely(flags & __GFP_ZERO) && ptr)
> - memset(ptr, 0, cachep->object_size);
> + if (unlikely(flags & __GFP_ZERO) && ptr) {
> + if (!WARN_ON_ONCE(cachep->ctor))
> + memset(ptr, 0, cachep->object_size);
> + }
>
> slab_post_alloc_hook(cachep, flags, 1, &ptr);
> return ptr;
Why don't we need to cover this in slab_alloc and kmem_cache_alloc_bulk as well?
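The same pattern exists there; an untested sketch of the analogous guard
in slab_alloc() would be:

	if (unlikely(flags & __GFP_ZERO) && objp) {
		if (!WARN_ON_ONCE(cachep->ctor))
			memset(objp, 0, cachep->object_size);
	}

and kmem_cache_alloc_bulk() has its own __GFP_ZERO memset loop.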
Other than that this patch makes sense to me.
--
Michal Hocko
SUSE Labs
On Tue 10-04-18 05:53:51, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> The page cache has used the mapping's GFP flags for allocating
> radix tree nodes for a long time. It took care to always mask off the
> __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> __GFP_DMA32 flags would also have been able to sneak through if they
> were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
I would push this into __radix_tree_preload...
Anyway
Acked-by: Michal Hocko <[email protected]>
> ---
> mm/filemap.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c2147682f4c3..1a4bfc5ed3dc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> VM_BUG_ON_PAGE(!PageLocked(new), new);
> VM_BUG_ON_PAGE(new->mapping, new);
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (!error) {
> struct address_space *mapping = old->mapping;
> void (*freepage)(struct page *);
> @@ -841,7 +841,7 @@ static int __add_to_page_cache_locked(struct page *page,
> return error;
> }
>
> - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (error) {
> if (!huge)
> mem_cgroup_cancel_charge(page, memcg, false);
> @@ -1574,8 +1574,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
> if (fgp_flags & FGP_ACCESSED)
> __SetPageReferenced(page);
>
> - err = add_to_page_cache_lru(page, mapping, offset,
> - gfp_mask & GFP_RECLAIM_MASK);
> + err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (unlikely(err)) {
> put_page(page);
> page = NULL;
> @@ -2378,7 +2377,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
> if (!page)
> return -ENOMEM;
>
> - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
> + ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (ret == 0)
> ret = mapping->a_ops->readpage(file, page);
> else if (ret == -EEXIST)
> --
> 2.16.3
--
Michal Hocko
SUSE Labs
On 04/10/2018 02:53 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes,
> while the purpose of a constructor is to initialise an object to a
> particular pattern. We cannot do both. Add a warning to catch any
> users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
It doesn't fix any known problem, does it? Then the stable tag seems too
much IMHO, especially for fixing a 2007 commit.
Otherwise
Acked-by: Vlastimil Babka <[email protected]>
> ---
> mm/slab.c | 6 ++++--
> mm/slob.c | 4 +++-
> mm/slub.c | 6 ++++--
> 3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 38d3f4fd17d7..8b2cb7db85db 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3313,8 +3313,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> local_irq_restore(save_flags);
> ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> - if (unlikely(flags & __GFP_ZERO) && ptr)
> - memset(ptr, 0, cachep->object_size);
> + if (unlikely(flags & __GFP_ZERO) && ptr) {
> + if (!WARN_ON_ONCE(cachep->ctor))
> + memset(ptr, 0, cachep->object_size);
> + }
>
> slab_post_alloc_hook(cachep, flags, 1, &ptr);
> return ptr;
> diff --git a/mm/slob.c b/mm/slob.c
> index 1a46181b675c..958173fd7c24 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -556,8 +556,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
> flags, node);
> }
>
> - if (b && c->ctor)
> + if (b && c->ctor) {
> + WARN_ON_ONCE(flags & __GFP_ZERO);
> c->ctor(b);
> + }
>
> kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);
> return b;
> diff --git a/mm/slub.c b/mm/slub.c
> index 9e1100f9298f..0f55f0a0dcaa 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2714,8 +2714,10 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> stat(s, ALLOC_FASTPATH);
> }
>
> - if (unlikely(gfpflags & __GFP_ZERO) && object)
> - memset(object, 0, s->object_size);
> + if (unlikely(gfpflags & __GFP_ZERO) && object) {
> + if (!WARN_ON_ONCE(s->ctor))
> + memset(object, 0, s->object_size);
> + }
>
> slab_post_alloc_hook(s, gfpflags, 1, &object);
>
>
On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> The page cache has used the mapping's GFP flags for allocating
> radix tree nodes for a long time. It took care to always mask off the
> __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> __GFP_DMA32 flags would also have been able to sneak through if they
> were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
Why does this patch fix 19f99cee206c instead of 449dd6984d0e?
F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
> ---
> mm/filemap.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c2147682f4c3..1a4bfc5ed3dc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> VM_BUG_ON_PAGE(!PageLocked(new), new);
> VM_BUG_ON_PAGE(new->mapping, new);
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (!error) {
> struct address_space *mapping = old->mapping;
> void (*freepage)(struct page *);
> @@ -841,7 +841,7 @@ static int __add_to_page_cache_locked(struct page *page,
> return error;
> }
>
> - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (error) {
> if (!huge)
> mem_cgroup_cancel_charge(page, memcg, false);
> @@ -1574,8 +1574,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
> if (fgp_flags & FGP_ACCESSED)
> __SetPageReferenced(page);
>
> - err = add_to_page_cache_lru(page, mapping, offset,
> - gfp_mask & GFP_RECLAIM_MASK);
> + err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (unlikely(err)) {
> put_page(page);
> page = NULL;
> @@ -2378,7 +2377,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
> if (!page)
> return -ENOMEM;
>
> - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
> + ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (ret == 0)
> ret = mapping->a_ops->readpage(file, page);
> else if (ret == -EEXIST)
> --
> 2.16.3
>
On Tue 10-04-18 05:53:51, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> The page cache has used the mapping's GFP flags for allocating
> radix tree nodes for a long time. It took care to always mask off the
> __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> __GFP_DMA32 flags would also have been able to sneak through if they
> were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <[email protected]>
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
Looks good to me. You can add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/filemap.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c2147682f4c3..1a4bfc5ed3dc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> VM_BUG_ON_PAGE(!PageLocked(new), new);
> VM_BUG_ON_PAGE(new->mapping, new);
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (!error) {
> struct address_space *mapping = old->mapping;
> void (*freepage)(struct page *);
> @@ -841,7 +841,7 @@ static int __add_to_page_cache_locked(struct page *page,
> return error;
> }
>
> - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
> if (error) {
> if (!huge)
> mem_cgroup_cancel_charge(page, memcg, false);
> @@ -1574,8 +1574,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
> if (fgp_flags & FGP_ACCESSED)
> __SetPageReferenced(page);
>
> - err = add_to_page_cache_lru(page, mapping, offset,
> - gfp_mask & GFP_RECLAIM_MASK);
> + err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (unlikely(err)) {
> put_page(page);
> page = NULL;
> @@ -2378,7 +2377,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
> if (!page)
> return -ENOMEM;
>
> - ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
> + ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
> if (ret == 0)
> ret = mapping->a_ops->readpage(file, page);
> else if (ret == -EEXIST)
> --
> 2.16.3
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 04/10/2018 05:53 AM, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes,
> while the purpose of a constructor is to initialise an object to a
> particular pattern. We cannot do both. Add a warning to catch any
> users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <[email protected]>
> Cc: [email protected]
Since there is probably no bug to fix, what about adding the extra check
only under some DEBUG option?
How many caches are still using a ctor these days?
On Tue, Apr 10, 2018 at 10:45:45PM +0900, Minchan Kim wrote:
> On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> > From: Matthew Wilcox <[email protected]>
> >
> > The page cache has used the mapping's GFP flags for allocating
> > radix tree nodes for a long time. It took care to always mask off the
> > __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> > __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> > __GFP_DMA32 flags would also have been able to sneak through if they
> > were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> > location, and remove it from earlier in the callchain.
> >
> > Fixes: 19f99cee206c ("f2fs: add core inode operations")
>
> Why does this patch fix 19f99cee206c instead of 449dd6984d0e?
> F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
Well, there's the problem. This bug is the combination of three different
things:
1. The working set code relying on list_empty.
2. The page cache not filtering out the bad flags.
3. F2FS specifying a flag nobody had ever specified before.
So which single commit does this patch fix? I don't think it really matters.
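To make the combination concrete: the radix tree node ctor leaves
private_list empty but valid, while __GFP_ZERO wipes it to NULL pointers.
Simplified from lib/radix-tree.c and mm/workingset.c of this era:

	static void radix_tree_node_ctor(void *arg)
	{
		struct radix_tree_node *node = arg;

		memset(node, 0, sizeof(*node));
		INIT_LIST_HEAD(&node->private_list);
	}

	/* mm/workingset.c: an all-zero list head is *not* list_empty()
	 * (next is NULL instead of pointing back at the head), so this
	 * tries to delete a node that was never added, dereferencing
	 * the NULL pointers -- the crash in Minchan's report: */
	if (!list_empty(&node->private_list))
		list_lru_del(&shadow_nodes, &node->private_list);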
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> __GFP_ZERO requests that the object be initialised to all-zeroes,
> while the purpose of a constructor is to initialise an object to a
> particular pattern. We cannot do both. Add a warning to catch any
> users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> a constructor.
Can we move this check out of the critical paths and check for
a ctor and GFP_ZERO when calling the page allocator? F.e. in
allocate_slab()?
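Something like this, perhaps (untested sketch against slub; slab would
want the same check):

	static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
	{
		...
		/* A ctor and __GFP_ZERO contradict each other; complain
		 * once, away from the per-object fast path: */
		WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
		...
	}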
On Tue, 10 Apr 2018, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
>
> > __GFP_ZERO requests that the object be initialised to all-zeroes,
> > while the purpose of a constructor is to initialise an object to a
> > particular pattern. We cannot do both. Add a warning to catch any
> > users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> > a constructor.
>
> Can we move this check out of the critical paths and check for
> a ctor and GFP_ZERO when calling the page allocator? F.e. in
> allocate_slab()?
The ctors are run at that point from setup_object(), so it makes sense.
On 04/10, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 10:45:45PM +0900, Minchan Kim wrote:
> > On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <[email protected]>
> > >
> > > The page cache has used the mapping's GFP flags for allocating
> > > radix tree nodes for a long time. It took care to always mask off the
> > > __GFP_HIGHMEM flag, and masked off other flags in other paths, but the
> > > __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and
> > > __GFP_DMA32 flags would also have been able to sneak through if they
> > > were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost
> > > location, and remove it from earlier in the callchain.
> > >
> > > Fixes: 19f99cee206c ("f2fs: add core inode operations")
> >
> > Why does this patch fix 19f99cee206c instead of 449dd6984d0e?
> > F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
>
> Well, there's the problem. This bug is the combination of three different
> things:
>
> 1. The working set code relying on list_empty.
> 2. The page cache not filtering out the bad flags.
> 3. F2FS specifying a flag nobody had ever specified before.
>
> So what single patch does this patch fix? I don't think it really matters.
Hope there's someone who cares about the patch description, though. IMHO,
this fixes the MM regression introduced by:
449dd6984d0e ("mm: keep page cache radix tree nodes in check"), merged in
v3.15, 2014.
19f99cee206c ("f2fs: add core inode operations"), merged in v3.8, 2012,
merely revealed it. In fact, I've never hit this bug on old kernels.
From the user's viewpoint, may I suggest describing what kind of symptom
we're able to see due to this bug?
Something like:
[ 7858.792946] [<ffffff80086f4de0>] __list_del_entry+0x30/0xd0
[ 7858.792951] [<ffffff8008362018>] list_lru_del+0xac/0x1ac
[ 7858.792957] [<ffffff800830f04c>] page_cache_tree_insert+0xd8/0x110
[ 7858.792962] [<ffffff8008310188>] __add_to_page_cache_locked+0xf8/0x4e0
[ 7858.792967] [<ffffff800830ff34>] add_to_page_cache_lru+0x50/0x1ac
[ 7858.792972] [<ffffff800830fdd0>] pagecache_get_page+0x468/0x57c
[ 7858.792979] [<ffffff80085d081c>] __get_node_page+0x84/0x764
[ 7858.792986] [<ffffff800859cd94>] f2fs_iget+0x264/0xdc8
[ 7858.792991] [<ffffff800859ee00>] f2fs_lookup+0x3b4/0x660
[ 7858.792998] [<ffffff80083d2540>] lookup_slow+0x1e4/0x348
[ 7858.793003] [<ffffff80083d0eb8>] walk_component+0x21c/0x320
[ 7858.793008] [<ffffff80083d0010>] path_lookupat+0x90/0x1bc
[ 7858.793013] [<ffffff80083cfe6c>] filename_lookup+0x8c/0x1a0
[ 7858.793018] [<ffffff80083c52d0>] vfs_fstatat+0x84/0x10c
[ 7858.793023] [<ffffff80083c5b00>] SyS_newfstatat+0x28/0x64
Thanks,
On Tue, Apr 10, 2018 at 09:21:20AM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
>
> > __GFP_ZERO requests that the object be initialised to all-zeroes,
> > while the purpose of a constructor is to initialise an object to a
> > particular pattern. We cannot do both. Add a warning to catch any
> > users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> > a constructor.
>
> Can we move this check out of the critical paths and check for
> a ctor and GFP_ZERO when calling the page allocator? F.e. in
> allocate_slab()?
Are you willing to have this kind of bug go uncaught for a while?
In this specific case, __GFP_ZERO was only being passed on a few of the
calls to kmem_cache_alloc. So we'd happily trash the constructed object
any time we didn't allocate a page.
I appreciate it's a tradeoff, and we don't want to clutter the critical
path unnecessarily.
On Tue, Apr 10, 2018 at 06:53:04AM -0700, Eric Dumazet wrote:
> On 04/10/2018 05:53 AM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <[email protected]>
> >
> > __GFP_ZERO requests that the object be initialised to all-zeroes,
> > while the purpose of a constructor is to initialise an object to a
> > particular pattern. We cannot do both. Add a warning to catch any
> > users who mistakenly pass a __GFP_ZERO flag when allocating a slab with
> > a constructor.
> >
> > Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> > Signed-off-by: Matthew Wilcox <[email protected]>
> > Cc: [email protected]
>
> Since there is probably no bug to fix, what about adding the extra check
> only under some DEBUG option?
>
> How many caches are still using a ctor these days?
That's a really good question, and strangely hard to find out. I settled
on "git grep -A4 kmem_cache_create" and then searching the 'less' output
with '[^L]);'.
--
arch/powerpc/kvm/book3s_64_mmu_radix.c:	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
--
arch/powerpc/mm/init-common.c:	new = kmem_cache_create(name, table_size, align, 0, ctor);
--
arch/powerpc/platforms/cell/spufs/inode.c:	spufs_inode_cache = kmem_cache_create("spufs_inode_cache",
arch/powerpc/platforms/cell/spufs/inode.c-			sizeof(struct spufs_inode_info), 0,
arch/powerpc/platforms/cell/spufs/inode.c-			SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT, spufs_init_once);
--
arch/sh/mm/pgtable.c:	pgd_cachep = kmem_cache_create("pgd_cache",
arch/sh/mm/pgtable.c-			PTRS_PER_PGD * (1<<PTE_MAGNITUDE),
arch/sh/mm/pgtable.c-			PAGE_SIZE, SLAB_PANIC, pgd_ctor);
--
arch/sparc/mm/tsb.c:	pgtable_cache = kmem_cache_create("pgtable_cache",
arch/sparc/mm/tsb.c-			PAGE_SIZE, PAGE_SIZE,
arch/sparc/mm/tsb.c-			0,
arch/sparc/mm/tsb.c-			_clear_page);
--
drivers/dax/super.c:	dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_device), 0,
drivers/dax/super.c-			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
drivers/dax/super.c-			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
drivers/dax/super.c-			init_once);
--
drivers/staging/ncpfs/inode.c:	ncp_inode_cachep = kmem_cache_create("ncp_inode_cache",
drivers/staging/ncpfs/inode.c-			sizeof(struct ncp_inode_info),
drivers/staging/ncpfs/inode.c-			0, (SLAB_RECLAIM_ACCOUNT|
drivers/staging/ncpfs/inode.c-			    SLAB_MEM_SPREAD|SLAB_ACCOUNT),
drivers/staging/ncpfs/inode.c-			init_once);
--
drivers/usb/mon/mon_text.c:	rp->e_slab = kmem_cache_create(rp->slab_name,
drivers/usb/mon/mon_text.c-			sizeof(struct mon_event_text), sizeof(long), 0,
drivers/usb/mon/mon_text.c-			mon_text_ctor);
--
fs/9p/v9fs.c:	v9fs_inode_cache = kmem_cache_create("v9fs_inode_cache",
fs/9p/v9fs.c-			sizeof(struct v9fs_inode),
fs/9p/v9fs.c-			0, (SLAB_RECLAIM_ACCOUNT|
fs/9p/v9fs.c-			    SLAB_MEM_SPREAD|SLAB_ACCOUNT),
fs/9p/v9fs.c-			v9fs_inode_init_once);
--
fs/adfs/super.c:	adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
fs/adfs/super.c-			sizeof(struct adfs_inode_info),
fs/adfs/super.c-			0, (SLAB_RECLAIM_ACCOUNT|
fs/adfs/super.c-			    SLAB_MEM_SPREAD|SLAB_ACCOUNT),
fs/adfs/super.c-			init_once);
... snip a huge number of filesystems ...
--
ipc/mqueue.c:	mqueue_inode_cachep = kmem_cache_create("mqueue_inode_cache",
ipc/mqueue.c-			sizeof(struct mqueue_inode_info), 0,
ipc/mqueue.c-			SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT, init_once);
--
kernel/fork.c:	sighand_cachep = kmem_cache_create("sighand_cache",
kernel/fork.c-			sizeof(struct sighand_struct), 0,
kernel/fork.c-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
kernel/fork.c-			SLAB_ACCOUNT, sighand_ctor);
--
lib/radix-tree.c:	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
lib/radix-tree.c-			sizeof(struct radix_tree_node), 0,
lib/radix-tree.c-			SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,
lib/radix-tree.c-			radix_tree_node_ctor);
--
mm/rmap.c:	anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
mm/rmap.c-			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
mm/rmap.c-			anon_vma_ctor);
--
mm/shmem.c:	shmem_inode_cachep = kmem_cache_create("shmem_inode_cache",
mm/shmem.c-			sizeof(struct shmem_inode_info),
mm/shmem.c-			0, SLAB_PANIC|SLAB_ACCOUNT, shmem_init_inode);
--
net/sunrpc/rpc_pipe.c:	rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
net/sunrpc/rpc_pipe.c-			sizeof(struct rpc_inode),
net/sunrpc/rpc_pipe.c-			0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
net/sunrpc/rpc_pipe.c-			    SLAB_MEM_SPREAD|SLAB_ACCOUNT),
net/sunrpc/rpc_pipe.c-			init_once);
--
security/integrity/iint.c:	kmem_cache_create("iint_cache", sizeof(struct integrity_iint_cache),
security/integrity/iint.c-			0, SLAB_PANIC, init_once);
So aside from the filesystems, about fourteen places use it in the kernel.
If we want to get rid of the concept of constructors, it's doable,
but somebody needs to do the work to show what the effects will be.
For example, I took a quick look at the sighand_struct in kernel/fork.c.
That initialises the spinlock and waitqueue head which are at the end
of sighand_struct. The caller who allocates sighand_struct touches
the head of the struct. So if we removed the ctor, we'd touch two
cachelines on allocation instead of one ... but we could rearrange the
sighand_struct to put all the initialised bits in the first cacheline
(and we probably should).
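For reference, the ctor in question is roughly this (from kernel/fork.c
of this vintage):

	static void sighand_ctor(void *data)
	{
		struct sighand_struct *sighand = data;

		spin_lock_init(&sighand->siglock);
		init_waitqueue_head(&sighand->signalfd_wqh);
	}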
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> Are you willing to have this kind of bug go uncaught for a while?
There will be frequent allocations and this will show up at some point.
Also you could put this into the debug-only portions somewhere so we
always catch it when debugging is on.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> If we want to get rid of the concept of constructors, it's doable,
> but somebody needs to do the work to show what the effects will be.
How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches?
Those must have a defined state of the objects at all times and a constructor is
required for that. And their use of RCU is required for numerous lockless
lookup algorithms in the kernel.
On Tue, Apr 10, 2018 at 12:30:23PM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
>
> > If we want to get rid of the concept of constructors, it's doable,
> > but somebody needs to do the work to show what the effects will be.
>
> How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches?
> Those must have a defined state of the objects at all times and a constructor is
> required for that. And their use of RCU is required for numerous lockless
> lookup algorithms in the kernel.
Not at all times. Only once they've been used. Re-constructing them
once they've been used might break the rcu typesafety, I suppose ...
would need to examine the callers.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches?
> > Those must have a defined state of the objects at all times and a constructor is
> > required for that. And their use of RCU is required for numerous lockless
> > lookup algorithms in the kernel.
>
> Not at all times. Only once they've been used. Re-constructing them
> once they've been used might break the rcu typesafety, I suppose ...
> would need to examine the callers.
Objects can be freed and reused and still be accessed from code that
thinks the object is the old and not the new object....
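Schematically, the lookup side of such a cache goes like this (a generic
sketch, not any one caller):

	rcu_read_lock();
	obj = lookup(key);
	if (obj) {
		/* The spinlock must be in a valid state even if obj was
		 * freed and reallocated under us; that is exactly what
		 * the ctor guarantees: */
		spin_lock(&obj->lock);
		/* Recheck identity: we may be holding the new object. */
		if (obj->key != key) {
			spin_unlock(&obj->lock);
			goto again;
		}
		...
	}
	rcu_read_unlock();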
On Tue, Apr 10, 2018 at 12:45:56PM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
>
> > > How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches?
> > > Those must have a defined state of the objects at all times and a constructor is
> > > required for that. And their use of RCU is required for numerous lockless
> > > lookup algorithms in the kernel.
> >
> > Not at all times. Only once they've been used. Re-constructing them
> > once they've been used might break the rcu typesafety, I suppose ...
> > would need to examine the callers.
>
> Objects can be freed and reused and still be accessed from code that
> thinks the object is the old and not the new object....
Yes, I know, that's the point of RCU typesafety. My point is that an
object *which has never been used* can't be accessed. So you don't *need*
a constructor.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > Objects can be freed and reused and still be accessed from code that
> > thinks the object is the old and not the new object....
>
> Yes, I know, that's the point of RCU typesafety. My point is that an
> object *which has never been used* can't be accessed. So you don't *need*
> a constructor.
But the object needs to have the proper contents after it is released and
re-allocated. Some objects may rely on contents (like list heads)
surviving the realloc process because access must always be possible.
validate_slab() checks for proper metadata content in a slab,
although it does not access the payload. So that may work if you separate
the payload init from the metadata init.
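Schematically, the split would look like this (hypothetical struct, just
to illustrate the idea):

	static void foo_ctor(void *addr)
	{
		struct foo *f = addr;

		/* metadata: must survive free/realloc for RCU users */
		spin_lock_init(&f->lock);
		INIT_LIST_HEAD(&f->list);
	}

	/* payload: (re)initialised on every allocation, so this would be
	 * the only part __GFP_ZERO is allowed to touch: */
	memset(&f->payload, 0, sizeof(f->payload));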