2012-05-02 04:28:38

by Minchan Kim

[permalink] [raw]
Subject: [PATCH] vmalloc: add warning in __vmalloc

Now there are several places to use __vmalloc with GFP_ATOMIC,
GFP_NOIO, GFP_NOFS but unfortunately __vmalloc calls map_vm_area
which calls alloc_pages with GFP_KERNEL to allocate page tables.
It means it's possible to happen deadlock.
I don't know why it doesn't have reported until now.

Firstly, I tried passing gfp_t to lower functions to support __vmalloc
with such flags but other mm guys don't want and decided that
all of caller should be fixed.

http://marc.info/?l=linux-kernel&m=133517143616544&w=2

To begin with, let's listen other's opinion whether they can fix it
by other approach without calling __vmalloc with such flags.

So this patch adds warning in __vmalloc_node_range to detect it and
to be fixed hopely. __vmalloc_node_range isn't random chocie because
all caller which has gfp_mask of map_vm_area use it through __vmalloc_area_node.
And __vmalloc_area_node is current static function and is called by only
__vmalloc_node_range. So warning in __vmalloc_node_range would cover all
vmalloc functions which have gfp_t argument.

I Cced related maintainers.
If I miss someone, please Cced them.

* Changelog
* Replace WARN_ON with WARN_ON_ONCE - by Andrew Morton, Nick Piggin.

Cc: Neil Brown <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: James Morris <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Sage Weil <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
---
mm/vmalloc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c28b0b9..def5943 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1648,6 +1648,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
void *addr;
unsigned long real_size = size;

+ WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT) ||
+ !(gfp_mask & __GFP_IO) ||
+ !(gfp_mask & __GFP_FS));
+
size = PAGE_ALIGN(size);
if (!size || (size >> PAGE_SHIFT) > totalram_pages)
goto fail;
--
1.7.9.5


2012-05-02 19:46:14

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On Wed, 2 May 2012 13:28:09 +0900
Minchan Kim <[email protected]> wrote:

> Now there are several places to use __vmalloc with GFP_ATOMIC,
> GFP_NOIO, GFP_NOFS but unfortunately __vmalloc calls map_vm_area
> which calls alloc_pages with GFP_KERNEL to allocate page tables.
> It means it's possible to happen deadlock.
> I don't know why it doesn't have reported until now.
>
> Firstly, I tried passing gfp_t to lower functions to support __vmalloc
> with such flags but other mm guys don't want and decided that
> all of caller should be fixed.
>
> http://marc.info/?l=linux-kernel&m=133517143616544&w=2
>
> To begin with, let's listen other's opinion whether they can fix it
> by other approach without calling __vmalloc with such flags.
>
> So this patch adds warning in __vmalloc_node_range to detect it and
> to be fixed hopely. __vmalloc_node_range isn't random chocie because
> all caller which has gfp_mask of map_vm_area use it through __vmalloc_area_node.
> And __vmalloc_area_node is current static function and is called by only
> __vmalloc_node_range. So warning in __vmalloc_node_range would cover all
> vmalloc functions which have gfp_t argument.
>
> I Cced related maintainers.
> If I miss someone, please Cced them.
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1648,6 +1648,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> void *addr;
> unsigned long real_size = size;
>
> + WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT) ||
> + !(gfp_mask & __GFP_IO) ||
> + !(gfp_mask & __GFP_FS));
> +
> size = PAGE_ALIGN(size);
> if (!size || (size >> PAGE_SHIFT) > totalram_pages)
> goto fail;

Well. What are we actually doing here? Causing the kernel to spew a
warning due to known-buggy callsites, so that users will report the
warnings, eventually goading maintainers into fixing their stuff.

This isn't very efficient :(

It would be better to fix that stuff first, then add the warning to
prevent reoccurrences. Yes, maintainers are very naughty and probably
do need cattle prods^W^W warnings to motivate them to fix stuff, but we
should first make an effort to get these things fixed without
irritating and alarming our users.

Where are these offending callsites?

2012-05-03 01:03:13

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On 05/03/2012 04:46 AM, Andrew Morton wrote:

> On Wed, 2 May 2012 13:28:09 +0900
> Minchan Kim <[email protected]> wrote:
>
>> Now there are several places to use __vmalloc with GFP_ATOMIC,
>> GFP_NOIO, GFP_NOFS but unfortunately __vmalloc calls map_vm_area
>> which calls alloc_pages with GFP_KERNEL to allocate page tables.
>> It means it's possible to happen deadlock.
>> I don't know why it doesn't have reported until now.
>>
>> Firstly, I tried passing gfp_t to lower functions to support __vmalloc
>> with such flags but other mm guys don't want and decided that
>> all of caller should be fixed.
>>
>> http://marc.info/?l=linux-kernel&m=133517143616544&w=2
>>
>> To begin with, let's listen other's opinion whether they can fix it
>> by other approach without calling __vmalloc with such flags.
>>
>> So this patch adds warning in __vmalloc_node_range to detect it and
>> to be fixed hopely. __vmalloc_node_range isn't random chocie because
>> all caller which has gfp_mask of map_vm_area use it through __vmalloc_area_node.
>> And __vmalloc_area_node is current static function and is called by only
>> __vmalloc_node_range. So warning in __vmalloc_node_range would cover all
>> vmalloc functions which have gfp_t argument.
>>
>> I Cced related maintainers.
>> If I miss someone, please Cced them.
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1648,6 +1648,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>> void *addr;
>> unsigned long real_size = size;
>>
>> + WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT) ||
>> + !(gfp_mask & __GFP_IO) ||
>> + !(gfp_mask & __GFP_FS));
>> +
>> size = PAGE_ALIGN(size);
>> if (!size || (size >> PAGE_SHIFT) > totalram_pages)
>> goto fail;
>
> Well. What are we actually doing here? Causing the kernel to spew a
> warning due to known-buggy callsites, so that users will report the
> warnings, eventually goading maintainers into fixing their stuff.
>
> This isn't very efficient :(


Yes. I hope maintainers fix it before merging this.

>
> It would be better to fix that stuff first, then add the warning to
> prevent reoccurrences. Yes, maintainers are very naughty and probably
> do need cattle prods^W^W warnings to motivate them to fix stuff, but we
> should first make an effort to get these things fixed without
> irritating and alarming our users.
>
> Where are these offending callsites?


dm:
__alloc_buffer_wait_no_callback

ubi:
ubi_dbg_check_write
ubi_dbg_check_all_ff

ext4 :
ext4_kvmalloc

gfs2 :
gfs2_alloc_sort_buffer

ntfs :
__ntfs_malloc

ubifs :
dbg_dump_leb
scan_check_cb
dump_lpt_leb
dbg_check_ltab_lnum
dbg_scan_orphans

mm :
alloc_large_system_hash

ceph :
fill_inode
ceph_setxattr
ceph_removexattr
ceph_x_build_authorizer
ceph_decode_buffer
ceph_alloc_middle



>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>



--
Kind regards,
Minchan Kim

2012-05-03 05:46:15

by Sage Weil

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On Thu, 3 May 2012, Minchan Kim wrote:
> On 05/03/2012 04:46 AM, Andrew Morton wrote:
> > Well. What are we actually doing here? Causing the kernel to spew a
> > warning due to known-buggy callsites, so that users will report the
> > warnings, eventually goading maintainers into fixing their stuff.
> >
> > This isn't very efficient :(
>
>
> Yes. I hope maintainers fix it before merging this.
>
> >
> > It would be better to fix that stuff first, then add the warning to
> > prevent reoccurrences. Yes, maintainers are very naughty and probably
> > do need cattle prods^W^W warnings to motivate them to fix stuff, but we
> > should first make an effort to get these things fixed without
> > irritating and alarming our users.
> >
> > Where are these offending callsites?

Okay, maybe this is a stupid question, but: if an fs can't call vmalloc
with GFP_NOFS without risking deadlock, calling with GFP_KERNEL instead
doesn't fix anything (besides being more honest). This really means that
vmalloc is effectively off-limits for file systems in any
writeback-related path, right?

sage


>
>
> dm:
> __alloc_buffer_wait_no_callback
>
> ubi:
> ubi_dbg_check_write
> ubi_dbg_check_all_ff
>
> ext4 :
> ext4_kvmalloc
>
> gfs2 :
> gfs2_alloc_sort_buffer
>
> ntfs :
> __ntfs_malloc
>
> ubifs :
> dbg_dump_leb
> scan_check_cb
> dump_lpt_leb
> dbg_check_ltab_lnum
> dbg_scan_orphans
>
> mm :
> alloc_large_system_hash
>
> ceph :
> fill_inode
> ceph_setxattr
> ceph_removexattr
> ceph_x_build_authorizer
> ceph_decode_buffer
> ceph_alloc_middle
>
>
>
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to [email protected]. For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> > Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
> >
>
>
>
> --
> Kind regards,
> Minchan Kim
>
>

2012-05-03 05:55:58

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On Wed, 2012-05-02 at 12:46 -0700, Andrew Morton wrote:
> Well. What are we actually doing here? Causing the kernel to spew a
> warning due to known-buggy callsites, so that users will report the
> warnings, eventually goading maintainers into fixing their stuff.
>
> This isn't very efficient :(

I'll look at UBIFS and UBI - they use vmalloc and probably some of them
may be in write-back paths.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-05-03 06:30:58

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On 3 May 2012 15:46, Sage Weil <[email protected]> wrote:
> On Thu, 3 May 2012, Minchan Kim wrote:
>> On 05/03/2012 04:46 AM, Andrew Morton wrote:
>> > Well.  What are we actually doing here?  Causing the kernel to spew a
>> > warning due to known-buggy callsites, so that users will report the
>> > warnings, eventually goading maintainers into fixing their stuff.
>> >
>> > This isn't very efficient :(
>>
>>
>> Yes. I hope maintainers fix it before merging this.
>>
>> >
>> > It would be better to fix that stuff first, then add the warning to
>> > prevent reoccurrences.  Yes, maintainers are very naughty and probably
>> > do need cattle prods^W^W warnings to motivate them to fix stuff, but we
>> > should first make an effort to get these things fixed without
>> > irritating and alarming our users.
>> >
>> > Where are these offending callsites?
>
> Okay, maybe this is a stupid question, but: if an fs can't call vmalloc
> with GFP_NOFS without risking deadlock, calling with GFP_KERNEL instead
> doesn't fix anything (besides being more honest).  This really means that
> vmalloc is effectively off-limits for file systems in any
> writeback-related path, right?

Anywhere it cannot reenter the filesystem, yes. GFP_NOFS is effectively
GFP_KERNEL when calling vmalloc.

Note that in writeback paths, a "good citizen" filesystem should not require
any allocations, or at least it should be able to tolerate allocation failures.
So fixing that would be a good idea anyway.

2012-05-03 07:10:10

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On Thu, 2012-05-03 at 16:30 +1000, Nick Piggin wrote:
> Note that in writeback paths, a "good citizen" filesystem should not require
> any allocations, or at least it should be able to tolerate allocation failures.
> So fixing that would be a good idea anyway.

This is a good point, but UBIFS kmallocs(GFP_NOFS) when doing I/O
because it needs to compress/decompress. But I agree that if kmalloc
fails, we should have a fall-back reserve buffer protected by a mutex
for memory pressure situations.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-05-03 07:14:18

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On 3 May 2012 17:13, Artem Bityutskiy <[email protected]> wrote:
> On Thu, 2012-05-03 at 16:30 +1000, Nick Piggin wrote:
>> Note that in writeback paths, a "good citizen" filesystem should not require
>> any allocations, or at least it should be able to tolerate allocation failures.
>> So fixing that would be a good idea anyway.
>
> This is a good point, but UBIFS kmallocs(GFP_NOFS) when doing I/O
> because it needs to compress/decompress. But I agree that if kmalloc
> fails, we should have a fall-back reserve buffer protected by a mutex
> for memory pressure situations.

AKA, a mempool :)

2012-05-03 11:08:54

by Artem Bityutskiy

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

On Wed, 2012-05-02 at 12:46 -0700, Andrew Morton wrote:
> On Wed, 2 May 2012 13:28:09 +0900
> Minchan Kim <[email protected]> wrote:
>
> > Now there are several places to use __vmalloc with GFP_ATOMIC,
> > GFP_NOIO, GFP_NOFS but unfortunately __vmalloc calls map_vm_area
> > which calls alloc_pages with GFP_KERNEL to allocate page tables.
> > It means it's possible to happen deadlock.
> > I don't know why it doesn't have reported until now.
> >
> > Firstly, I tried passing gfp_t to lower functions to support __vmalloc
> > with such flags but other mm guys don't want and decided that
> > all of caller should be fixed.
> >
> > http://marc.info/?l=linux-kernel&m=133517143616544&w=2
> >
> > To begin with, let's listen other's opinion whether they can fix it
> > by other approach without calling __vmalloc with such flags.
> >
> > So this patch adds warning in __vmalloc_node_range to detect it and
> > to be fixed hopely. __vmalloc_node_range isn't random chocie because
> > all caller which has gfp_mask of map_vm_area use it through __vmalloc_area_node.
> > And __vmalloc_area_node is current static function and is called by only
> > __vmalloc_node_range. So warning in __vmalloc_node_range would cover all
> > vmalloc functions which have gfp_t argument.
> >
> > I Cced related maintainers.
> > If I miss someone, please Cced them.
> >
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1648,6 +1648,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > void *addr;
> > unsigned long real_size = size;
> >
> > + WARN_ON_ONCE(!(gfp_mask & __GFP_WAIT) ||
> > + !(gfp_mask & __GFP_IO) ||
> > + !(gfp_mask & __GFP_FS));
> > +
> > size = PAGE_ALIGN(size);
> > if (!size || (size >> PAGE_SHIFT) > totalram_pages)
> > goto fail;
>
> Well. What are we actually doing here? Causing the kernel to spew a
> warning due to known-buggy callsites, so that users will report the
> warnings, eventually goading maintainers into fixing their stuff.
>
> This isn't very efficient :(
>
> It would be better to fix that stuff first, then add the warning to
> prevent reoccurrences. Yes, maintainers are very naughty and probably
> do need cattle prods^W^W warnings to motivate them to fix stuff, but we
> should first make an effort to get these things fixed without
> irritating and alarming our users.
>
> Where are these offending callsites?

OK, I checked my part - both UBI and UBIFS call __vmalloc() with
GFP_NOFS in several places of the _debugging_ code, and this is why we
do not see any issues - the debugging code is used very rarely for
validating purposes. All the places look fixable, I'll fix them a bit
later.

WARN_ON_ONCE() looks like a good first step. An I think it is better if
maintainers fix their areas rather than if someone who does not know how
the subsystem works starts trying to do that.

--
Best Regards,
Artem Bityutskiy


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2012-05-03 13:50:33

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH] vmalloc: add warning in __vmalloc

Hi,

On Thu, 2012-05-03 at 16:30 +1000, Nick Piggin wrote:
> On 3 May 2012 15:46, Sage Weil <[email protected]> wrote:
> > On Thu, 3 May 2012, Minchan Kim wrote:
> >> On 05/03/2012 04:46 AM, Andrew Morton wrote:
> >> > Well. What are we actually doing here? Causing the kernel to spew a
> >> > warning due to known-buggy callsites, so that users will report the
> >> > warnings, eventually goading maintainers into fixing their stuff.
> >> >
> >> > This isn't very efficient :(
> >>
> >>
> >> Yes. I hope maintainers fix it before merging this.
> >>
> >> >
> >> > It would be better to fix that stuff first, then add the warning to
> >> > prevent reoccurrences. Yes, maintainers are very naughty and probably
> >> > do need cattle prods^W^W warnings to motivate them to fix stuff, but we
> >> > should first make an effort to get these things fixed without
> >> > irritating and alarming our users.
> >> >
> >> > Where are these offending callsites?
> >
> > Okay, maybe this is a stupid question, but: if an fs can't call vmalloc
> > with GFP_NOFS without risking deadlock, calling with GFP_KERNEL instead
> > doesn't fix anything (besides being more honest). This really means that
> > vmalloc is effectively off-limits for file systems in any
> > writeback-related path, right?
>
> Anywhere it cannot reenter the filesystem, yes. GFP_NOFS is effectively
> GFP_KERNEL when calling vmalloc.
>
> Note that in writeback paths, a "good citizen" filesystem should not require
> any allocations, or at least it should be able to tolerate allocation failures.
> So fixing that would be a good idea anyway.

For cluster filesystems, there is an additional issue. When we allocate
memory with GFP_KERNEL we might land up pushing inodes out of cache,
which can also mean deallocating them. That process involves taking
cluster locks, and so it is not valid to do this while holding another
cluster lock (since the locks may be taken in random order).

In the GFS2 use case for vmalloc, this is being done if kmalloc fails
and also if the memory required is too large for kmalloc (very unlikely,
but possible with very large directories). Also, it is being done under
a cluster lock (shared mode).

I recently looked back at the thread which resulted in that particular
vmalloc call being left there:
http://www.redhat.com/archives/cluster-devel/2010-July/msg00021.html
http://www.redhat.com/archives/cluster-devel/2010-July/msg00022.html
http://www.redhat.com/archives/cluster-devel/2010-July/msg00023.html

which reminded me of the problem. So this might not be so easy to
resolve...

Steve.