2014-01-17 08:47:18

by Marek Szyprowski

Subject: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

GFP_ATOMIC is not a single gfp flag, but a macro which expands to other
flags; what makes an allocation atomic is the LACK of the __GFP_WAIT flag.
To check whether the caller wanted to perform an atomic allocation, the
code must test for the presence of the __GFP_WAIT flag.

Signed-off-by: Marek Szyprowski <[email protected]>
---
.../lustre/include/linux/libcfs/libcfs_private.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
index d0d942c..dddccca1 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
@@ -120,7 +120,7 @@ do { \
 do {                                                          \
         LASSERT(!in_interrupt() ||                            \
                 ((size) <= LIBCFS_VMALLOC_SIZE &&             \
-                 ((mask) & GFP_ATOMIC)) != 0);                \
+                 ((mask) & __GFP_WAIT) == 0));                \
 } while (0)
 
 #define LIBCFS_ALLOC_POST(ptr, size)                          \
--
1.7.9.5
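
(For illustration only, not part of the patch: a self-contained sketch
with made-up flag bit values, showing why testing ((mask) & GFP_ATOMIC)
does not answer "may this allocation sleep?", while testing for
__GFP_WAIT does.)

/* flag-check.c: standalone sketch; the bit values below are illustrative
 * only, not the kernel's real definitions. */
#include <stdio.h>

#define __GFP_WAIT   0x10u   /* allocation may sleep */
#define __GFP_HIGH   0x20u   /* may use emergency reserves */
#define __GFP_IO     0x40u
#define __GFP_FS     0x80u

#define GFP_ATOMIC   (__GFP_HIGH)                       /* composite macro */
#define GFP_KERNEL   (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define GFP_NOWAIT   0u

/* old test: keys off the __GFP_HIGH bit, which is about memory reserves */
static int old_check(unsigned int mask)
{
        return (mask & GFP_ATOMIC) != 0;
}

/* fixed test: atomic really means "the caller must not sleep" */
static int new_check(unsigned int mask)
{
        return (mask & __GFP_WAIT) == 0;
}

int main(void)
{
        /* __GFP_HIGH | __GFP_WAIT passes the old test yet may sleep;
         * a GFP_NOWAIT-style mask is safe in interrupt context yet
         * fails the old test. */
        printf("HIGH|WAIT:  old=%d new=%d\n",
               old_check(__GFP_HIGH | __GFP_WAIT),
               new_check(__GFP_HIGH | __GFP_WAIT));        /* old=1 new=0 */
        printf("GFP_NOWAIT: old=%d new=%d\n",
               old_check(GFP_NOWAIT), new_check(GFP_NOWAIT)); /* old=0 new=1 */
        return 0;
}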


2014-01-17 14:33:04

by Greg Kroah-Hartman

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

On Fri, Jan 17, 2014 at 09:46:56AM +0100, Marek Szyprowski wrote:
> GFP_ATOMIC is not a single gfp flag, but a macro which expands to other
> flags; what makes an allocation atomic is the LACK of the __GFP_WAIT flag.
> To check whether the caller wanted to perform an atomic allocation, the
> code must test for the presence of the __GFP_WAIT flag.
>
> Signed-off-by: Marek Szyprowski <[email protected]>
> ---
> .../lustre/include/linux/libcfs/libcfs_private.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> index d0d942c..dddccca1 100644
> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> @@ -120,7 +120,7 @@ do { \
> do { \
> LASSERT(!in_interrupt() || \
> ((size) <= LIBCFS_VMALLOC_SIZE && \
> - ((mask) & GFP_ATOMIC)) != 0); \
> + ((mask) & __GFP_WAIT) == 0)); \
> } while (0)

What a horrible assert; can't we just remove this entirely?
in_interrupt() usually should never be checked; if it is, the code is
doing something wrong. And __GFP flags shouldn't be used on their own.

thanks,

greg k-h

2014-01-17 14:51:43

by Dan Carpenter

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

We will want to get rid of lustre's custom allocator before this gets
out of staging.

But one feature that the lustre allocator has which is pretty neat is
that it lets you debug how much memory the filesystem is using. Is
there a standard way to find this information?

regards,
dan carpenter

2014-01-17 15:17:09

by Greg Kroah-Hartman

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

On Fri, Jan 17, 2014 at 05:51:28PM +0300, Dan Carpenter wrote:
> We will want to get rid of lustre's custom allocator before this gets
> out of staging.
>
> But one feature that the lustre allocator has which is pretty neat is
> that it lets you debug how much memory the filesystem is using. Is
> there a standard way to find this information?

Create your own mempool/slab/whatever_it's_called and look in the
debugfs or proc files for the allocator usage, depending on the memory
allocator the kernel is using.

That's how the rest of the kernel does it, no reason lustre should be
any different.

thanks,

greg k-h
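
(A rough sketch of what Greg describes; the names below are hypothetical,
not actual Lustre code. With a dedicated kmem_cache, the usage shows up
under that cache's name in /proc/slabinfo or slabtop.)

/* foo_cache.c: hypothetical module with its own slab cache */
#include <linux/module.h>
#include <linux/slab.h>

struct foo {                            /* hypothetical structure */
        int id;
        char name[32];
};

static struct kmem_cache *foo_cachep;

static int __init foo_init(void)
{
        /* appears as "foo_cache" in /proc/slabinfo */
        foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
                                       0, SLAB_HWCACHE_ALIGN, NULL);
        if (!foo_cachep)
                return -ENOMEM;
        return 0;
}

static void __exit foo_exit(void)
{
        kmem_cache_destroy(foo_cachep);
}

/* allocation sites would then use:
 *      struct foo *f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);
 *      kmem_cache_free(foo_cachep, f);
 */

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");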

2014-01-20 08:18:16

by Marek Szyprowski

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

Hello,

On 2014-01-17 15:33, Greg Kroah-Hartman wrote:
> On Fri, Jan 17, 2014 at 09:46:56AM +0100, Marek Szyprowski wrote:
> > GFP_ATOMIC is not a single gfp flag, but a macro which expands to other
> > flags; what makes an allocation atomic is the LACK of the __GFP_WAIT flag.
> > To check whether the caller wanted to perform an atomic allocation, the
> > code must test for the presence of the __GFP_WAIT flag.
> >
> > Signed-off-by: Marek Szyprowski <[email protected]>
> > ---
> > .../lustre/include/linux/libcfs/libcfs_private.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> > index d0d942c..dddccca1 100644
> > --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> > +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_private.h
> > @@ -120,7 +120,7 @@ do { \
> > do { \
> > LASSERT(!in_interrupt() || \
> > ((size) <= LIBCFS_VMALLOC_SIZE && \
> > - ((mask) & GFP_ATOMIC)) != 0); \
> > + ((mask) & __GFP_WAIT) == 0)); \
> > } while (0)
>
> What a horrible assert; can't we just remove this entirely?
> in_interrupt() usually should never be checked; if it is, the code is
> doing something wrong. And __GFP flags shouldn't be used on their own.

Well, I prepared this patch while checking the kernel sources for
incorrect GFP_ATOMIC usage. I agree that drivers should use generic
memory allocation methods instead of inventing their own stuff. Feel
free to ignore my patch in favor of removing this custom allocator
altogether.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2014-01-21 20:02:39

by Dilger, Andreas

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

On 2014/01/17, 8:17 AM, "Greg Kroah-Hartman" <[email protected]>
wrote:

>On Fri, Jan 17, 2014 at 05:51:28PM +0300, Dan Carpenter wrote:
>> We will want to get rid of lustre's custom allocator before this gets
>> out of staging.
>>
>> But one feature that the lustre allocator has which is pretty neat is
>> that it lets you debug how much memory the filesystem is using. Is
>> there a standard way to find this information?
>
>Create your own mempool/slab/whatever_it's_called and look in the
>debugfs or proc files for the allocator usage, depending on the memory
>allocator the kernel is using.
>
>That's how the rest of the kernel does it, no reason lustre should be
>any different.

The Lustre allocation macros track the memory usage across the whole
filesystem, not just of a single structure that a mempool/slab/whatever
would do. This is useful to know for debugging purposes (e.g. user
complains about not having enough RAM for their highly-tuned application,
or to check for leaks at unmount).

It can also log the alloc/free calls and post-process them to find leaks
easily, or find pieces of code that are allocating too much memory and are
not using dedicated slabs. This also works if you encounter a system with
a lot of allocated memory, enable "free" logging, and then unmount the
filesystem. The logs will show which structures are being freed (assuming
they are not leaked completely) and point you to whatever is not being
shrunk properly.
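
(A much-simplified, illustrative sketch of the kind of wrapper being
described here; these are not the real LIBCFS_ALLOC/LIBCFS_FREE macros.
A global counter is updated on every allocation and free, and each call
is logged so the history can be post-processed.)

/* illustrative only -- not the actual Lustre code */
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/atomic.h>

static atomic_t fs_mem_used = ATOMIC_INIT(0);  /* bytes currently allocated */

#define FS_ALLOC(ptr, size)                                             \
do {                                                                    \
        (ptr) = kmalloc((size), GFP_NOFS);                              \
        if ((ptr)) {                                                    \
                atomic_add((int)(size), &fs_mem_used);                  \
                pr_debug("alloc %zu bytes at %p (%s:%d), total %d\n",   \
                         (size_t)(size), (ptr), __FILE__, __LINE__,     \
                         atomic_read(&fs_mem_used));                    \
        }                                                               \
} while (0)

#define FS_FREE(ptr, size)                                              \
do {                                                                    \
        atomic_sub((int)(size), &fs_mem_used);                          \
        pr_debug("free %zu bytes at %p (%s:%d), total %d\n",            \
                 (size_t)(size), (ptr), __FILE__, __LINE__,             \
                 atomic_read(&fs_mem_used));                            \
        kfree(ptr);                                                     \
} while (0)

/* at unmount/module unload, a non-zero counter points at a leak */
static inline void fs_mem_check_leaks(void)
{
        if (atomic_read(&fs_mem_used) != 0)
                pr_warn("leaked %d bytes\n", atomic_read(&fs_mem_used));
}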

I don't know if there is any way to track this with regular kmalloc(),
and creating separate slabs for every data structure would be ugly. The
generic /proc/meminfo data doesn't really tell you what is using all the
memory, and the size-NNNN slabs give some information, but are used all
over the kernel.

I'm pretty much resigned to losing all of this functionality, but it
definitely has been very useful for finding problems.

Cheers, Andreas
--
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

2014-01-21 20:16:06

by Dan Carpenter

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

Greg meant mempools, not slabs. You should look at them. They do what
you need for debugging and solve a couple of other problems as well.

We have a leak checker in the kernel but most people disable it. I
forget the config name. There are a bunch of useful debug configs.

regards,
dan carpenter
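
(For reference, a minimal sketch of a mempool backed by a dedicated slab
cache; the names are hypothetical. The backing cache is what shows up in
/proc/slabinfo, and the pool keeps a minimum number of objects
preallocated so allocations can make progress under memory pressure.)

/* illustrative only; error paths trimmed */
#include <linux/slab.h>
#include <linux/mempool.h>

struct bar {                            /* hypothetical structure */
        int id;
};

static struct kmem_cache *bar_cachep;
static mempool_t *bar_pool;

static int bar_pool_init(void)
{
        bar_cachep = kmem_cache_create("bar_cache", sizeof(struct bar),
                                       0, 0, NULL);
        if (!bar_cachep)
                return -ENOMEM;

        /* keep at least 16 objects preallocated */
        bar_pool = mempool_create_slab_pool(16, bar_cachep);
        if (!bar_pool) {
                kmem_cache_destroy(bar_cachep);
                return -ENOMEM;
        }
        return 0;
}

static void bar_pool_exit(void)
{
        mempool_destroy(bar_pool);
        kmem_cache_destroy(bar_cachep);
}

/* callers then use:
 *      struct bar *b = mempool_alloc(bar_pool, GFP_NOFS);
 *      mempool_free(b, bar_pool);
 */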

2014-01-21 21:15:34

by Dave Hansen

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

On 01/21/2014 12:02 PM, Dilger, Andreas wrote:
> The Lustre allocation macros track the memory usage across the whole
> filesystem,
> not just of a single structure that a mempool/slab/whatever would do.
> This is
> useful to know for debugging purposes (e.g. user complains about not having
> enough RAM for their highly-tuned application, or to check for leaks at
> unmount).

Urg, it does this with a global variable. If we did this kind of thing
generically, we'd get eaten alive by all of the cacheline bouncing from
that atomic. It's also a 32-bit atomic. Guess it's never had more than
4GB of memory tracked in there. :)

This also doesn't track overhead from things that *are* in slabs like
the inodes, or the 14 kmem_caches that lustre has, so it's far from a
complete picture of how lustre is using memory.

> It can also log the alloc/free calls and post-process them to find leaks
> easily, or find pieces of code that are allocating too much memory and are not
> using dedicated slabs. This also works if you encounter a system with a
> lot of allocated memory, enable "free" logging, and then unmount the
> filesystem.
> The logs will show which structures are being freed (assuming they are not
> leaked completely) and point you to whatever is not being shrunk properly.

This isn't perfect, but it does cover most of the ext4 call sites in my
kernel. It would work better for a module, I'd imagine:

cd /sys/kernel/debug/tracing/events/kmem
echo -n 'call_site < 0xffffffff81e0af00 && call_site >= 0xffffffff81229cf0' > kmalloc/filter
echo 1 > kmalloc/enable
cat /sys/kernel/debug/tracing/trace_pipe

It will essentially log all the kmalloc() calls from the ext4 code. I
got the call site locations from grepping System.map. It would be
_really_ nice if we were able to do something like:

echo 'call_site =~ *ext4*'

but there's no way to do that which I know of. You could probably rig
something up with the function graph tracer by triggering only on entry
to the lustre code.

> I don't know if there is any way to track this with regular kmalloc(), and
> creating separate slabs for every data structure would be ugly. The
> generic
> /proc/meminfo data doesn't really tell you what is using all the memory,
> and
> the size-NNNN slabs give some information, but are used all over the
> kernel.
>
> I'm pretty much resigned to losing all of this functionality, but it
> definitely
> has been very useful for finding problems.

Yeah, it is hard to find out who is responsible for leaking pages or
kmalloc()s, especially after the fact. But, we seem to limp along just
fine. If lustre is that bad of a memory leaker that it *NEEDS* this
feature, we have bigger problems on our hands. :)


2014-01-22 01:31:39

by Oleg Drokin

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

Hello!

On Jan 21, 2014, at 3:16 PM, Dan Carpenter wrote:

> We have a leak checker in the kernel but most people disable it. I
> forget the config name. There are a bunch of useful debug configs.

I actually use it at times too, and it's useful (e.g. it works even if
you did not wrap the allocation in the lustre macro, and this did happen
before), but it comes with its own problems. E.g. it gobbles up memory
like there's no tomorrow (at least in my case), it has a horrible false
failure ratio with zfs in place (and probably that's why it uses even
more memory then), and it has a quite significant failure ratio otherwise,
too. But yes, it is useful and I am glad it's there. It does not solve
some of the other use cases outlined by Andreas, though.

Bye,
Oleg-

2014-01-22 01:52:12

by Oleg Drokin

Subject: Re: [PATCH] staging: lustre: fix GFP_ATOMIC macro usage

Hello!

On Jan 21, 2014, at 4:15 PM, Dave Hansen wrote:

> On 01/21/2014 12:02 PM, Dilger, Andreas wrote:
>> The Lustre allocation macros track the memory usage across the whole
>> filesystem,
>> not just of a single structure that a mempool/slab/whatever would do.
>> This is
>> useful to know for debugging purposes (e.g. user complains about not having
>> enough RAM for their highly-tuned application, or to check for leaks at
>> unmount).
> Urg, it does this with a global variable. If we did this kind of thing
> generically, we'd get eaten alive by all of the cacheline bouncing from
> that atomic. It's also a 32-bit atomic. Guess it's never had more than
> 4GB of memory tracked in there. :)

No, hopefully we'll never get to be this memory hungry on a single node.
Good point about the cacheline, I guess.

> This also doesn't track overhead from things that *are* in slabs like
> the inodes, or the 14 kmem_caches that lustre has, so it's far from a
> complete picture of how lustre is using memory.

The inodes are per filesystem, so we do have a complete picture there.
dentries and some other structures are shared, but e.g. on a client lustre
is frequently the only active FS in use, and as such it's easy.

>> It can also log the alloc/free calls and post-process them to find leaks
>> easily, or find pieces of code that are allocating too much memory and are not
>> using dedicated slabs. This also works if you encounter a system with a
>> lot of allocated memory, enable "free" logging, and then unmount the
>> filesystem.
>> The logs will show which structures are being freed (assuming they are not
>> leaked completely) and point you to whatever is not being shrunk properly.
>
> This isn't perfect, but it does cover most of the ext4 call sites in my
> kernel. It would work better for a module, I'd imagine:
>
> cd /sys/kernel/debug/tracing/events/kmem
> echo -n 'call_site < 0xffffffff81e0af00 && call_site >= 0xffffffff81229cf0' > kmalloc/filter
> echo 1 > kmalloc/enable
> cat /sys/kernel/debug/tracing/trace_pipe

That's a neat trick.

> It will essentially log all the kmalloc() calls from the ext4 code. I
> got the call site locations from grepping System.map. It would be
> _really_ nice if we were able to do something like:
>
> echo 'call_site =~ *ext4*'

So basically module address plus len from /proc/modules should do for modular features? Should be easy to script.

Is there any neat way to enable this after the module is loaded, but
before it gets any control, so that there's a full record of all
allocations and deallocations? (Obviously that's only going to be used
for debugging.)

> Yeah, it is hard to find out who is responsible for leaking pages or
> kmalloc()s, especially after the fact. But, we seem to limp along just
> fine. If lustre is that bad of a memory leaker that it *NEEDS* this
> feature, we have bigger problems on our hands. :)

It's one of those things that you don't need often, but that comes in
very handy once the need arises ;)
It's also nice that it warns you on module unload that "hey, you left this much stuff behind, bad boy!".
But we can live without it, that's true too.

Bye,
Oleg