Hello,
I've been chasing an OOM-death problem on 2.6.17 that showed up while
running a J2EE application on my recently-built Gentoo box. The crash
was ugly - leaking huge numbers of skbuff_head_cache and size-2048 slab
entries until my java processes died and the system became unusable
and unresponsive.
My environment is:
Gentoo kernel build 2.6.17-gentoo-r8, built with GCC 4.1.1.
I tried Catalin Marinas' kmemleak patches, and had to rebuild with
GCC 3.4.6 because of a 4.1.1 compiler bug that prevents compilation
of the patches.
And... building with 3.4.5 fixed the leak! So I guess I have very little
detail to report - except that there's a nasty leak in 2.6.17 when built
with 4.1.1.
If anyone has a version of kmemleak that I can build with 4.1.1, or
any other suggestions for instrumentation, I'd be happy to gather more
data - the problem is very easy for me to reproduce.
Nathan Meyers
[email protected]
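
For anyone trying to confirm the same symptom (steadily growing skbuff_head_cache and size-2048 counts), here is a minimal sketch that compares two snapshots of /proc/slabinfo. It assumes the usual column order of name, active_objs, num_objs, objsize. The function names and the min_objs threshold are made up for illustration and are not part of any kernel tool.

```python
def parse_slabinfo(text):
    """Return {cache_name: (active_objs, objsize)} from /proc/slabinfo text."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            # Assumed columns: name active_objs num_objs objsize ...
            caches[fields[0]] = (int(fields[1]), int(fields[3]))
        except ValueError:
            continue
    return caches


def growth(before, after, min_objs=1000):
    """List (name, delta_objs, approx_delta_bytes) for caches that grew
    by at least min_objs active objects, largest byte growth first."""
    rows = []
    for name, (active, objsize) in after.items():
        delta = active - before.get(name, (0, objsize))[0]
        if delta >= min_objs:
            rows.append((name, delta, delta * objsize))
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

To use it, read /proc/slabinfo once, run the workload for a while, read it again, and pass both parsed snapshots to growth(). A cache that keeps climbing across repeated snapshots is a leak candidate; a single jump is not.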
On 10/13/06, [email protected] <[email protected]> wrote:
> If anyone has a version of kmemleak that I can build with 4.1.1, or
> any other suggestions for instrumentation, I'd be happy to gather more
> data - the problem is very easy for me to reproduce.
You should cc Catalin for that. Alternatively, you could try
CONFIG_DEBUG_SLAB_LEAK.
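
With CONFIG_DEBUG_SLAB_LEAK enabled, the kernel reports per-caller allocation counts in /proc/slab_allocators. The sketch below ranks callers for a given cache. It assumes each line looks roughly like "cache: count caller+offset/size"; that layout is an assumption and should be checked against the running kernel. top_allocators and its parameters are hypothetical names.

```python
import re

# Assumed line layout: "<cache>: <count> <caller+offset/size>"
LINE_RE = re.compile(r"^(?P<cache>\S+):\s+(?P<count>\d+)\s+(?P<caller>\S+)")


def top_allocators(text, cache=None, limit=5):
    """Return up to `limit` (count, cache, caller) tuples, highest count
    first, optionally restricted to one cache name."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # ignore anything that does not fit the assumed layout
        if cache is not None and m.group("cache") != cache:
            continue
        rows.append((int(m.group("count")), m.group("cache"), m.group("caller")))
    rows.sort(reverse=True)
    return rows[:limit]
```

Calling top_allocators(text, cache="size-2048") on the file contents would show which call sites hold the most live size-2048 objects. That is the kind of data that could narrow down a leak like the one reported here.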
On Thu, 2006-10-12 at 20:49 -0400, [email protected] wrote:
> I tried Catalin Marinas' kmemleak patches, and had to rebuild with
> GCC 3.4.6 because of a 4.1.1 compiler bug that prevents compilation
> of the patches.
Yeah, seems any remotely recent gcc hates it. That puts a rather large
dent in usability.
> And... building with 3.4.5 fixed the leak! So I guess I have very little
> detail to report - except that there's a nasty leak in 2.6.17 when built
> with 4.1.1.
If you build using 3.4.5 _without_ the kmemleak patches, do you see the
leak again? (ie is kmemleak altering timing, or is kernel miscompiled)
> If anyone has a version of kmemleak that I can build with 4.1.1, or
> any other suggestions for instrumentation, I'd be happy to gather more
> data - the problem is very easy for me to reproduce.
I can only suggest trying latest/greatest to see if the issue is still
present, and if so, try to find a way that others may trigger it.
-Mike
On Fri, Oct 13, 2006 at 08:25:12AM +0000, Mike Galbraith wrote:
> On Thu, 2006-10-12 at 20:49 -0400, [email protected] wrote:
>
> > I tried Catalin Marinas' kmemleak patches, and had to rebuild with
> > GCC 3.4.6 because of a 4.1.1 compiler bug that prevents compilation
> > of the patches.
>
> Yeah, seems any remotely recent gcc hates it. That puts a rather large
> dent in usability.
>
> > And... building with 3.4.5 fixed the leak! So I guess I have very little
> > detail to report - except that there's a nasty leak in 2.6.17 when built
> > with 4.1.1.
>
> If you build using 3.4.5 _without_ the kmemleak patches, do you see the
> leak again? (ie is kmemleak altering timing, or is kernel miscompiled)
I wondered the same thing. I went back to the original source and .config
- rebuilding with 3.4.6 (3.4.5 is a typo) fixed the leak.
>
> > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > any other suggestions for instrumentation, I'd be happy to gather more
> > data - the problem is very easy for me to reproduce.
>
> I can only suggest trying latest/greatest to see if the issue is still
> present, and if so, try to find a way that others may trigger it.
I may just do that - apparently 4.1.2 is supposed to fix the kmemleak
compile problem. My (admittedly lazy) inclination is to wait until that
comes out in a Gentoo ebuild.
Nathan
On 13/10/06, Pekka Enberg <[email protected]> wrote:
> On 10/13/06, [email protected] <[email protected]> wrote:
> > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > any other suggestions for instrumentation, I'd be happy to gather more
> > data - the problem is very easy for me to reproduce.
>
> You should cc Catalin for that. Alternatively, you could try
> CONFIG_DEBUG_SLAB_LEAK.
Thanks for cc'ing me (I'm still on holiday and not following the
mailing list). The problem is the __builtin_constant_p gcc function
which doesn't work properly with 4.x versions. It was fixed in latest
gcc versions though. Kmemleak relies on __builtin_constant_p to
determine the pointer aliases and without it you would get plenty of
false positives.
--
Catalin
On Fri, 2006-10-13 at 06:55 -0400, [email protected] wrote:
> On Fri, Oct 13, 2006 at 08:25:12AM +0000, Mike Galbraith wrote:
> > On Thu, 2006-10-12 at 20:49 -0400, [email protected] wrote:
> >
> > > I tried Catalin Marinas' kmemleak patches, and had to rebuild with
> > > GCC 3.4.6 because of a 4.1.1 compiler bug that prevents compilation
> > > of the patches.
> >
> > Yeah, seems any remotely recent gcc hates it. That puts a rather large
> > dent in usability.
> >
> > > And... building with 3.4.5 fixed the leak! So I guess I have very little
> > > detail to report - except that there's a nasty leak in 2.6.17 when built
> > > with 4.1.1.
> >
> > If you build using 3.4.5 _without_ the kmemleak patches, do you see the
> > leak again? (ie is kmemleak altering timing, or is kernel miscompiled)
>
> I wondered the same thing. I went back to the original source and .config
> - rebuilding with 3.4.6 (3.4.5 is a typo) fixed the leak.
Hmm. That leaves us with a 4.1.1 miscompile, maybe.
> > > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > > any other suggestions for instrumentation, I'd be happy to gather more
> > > data - the problem is very easy for me to reproduce.
> >
> > I can only suggest trying latest/greatest to see if the issue is still
> > present, and if so, try to find a way that others may trigger it.
>
> I may just do that - apparently 4.1.2 is supposed to fix the kmemleak
> compile problem. My (admittedly lazy) inclination is to wait until that
> comes out in a Gentoo ebuild.
I think some re-evaluation is needed.
(fwiw, I tried a pre-release 4.1.2 compiler, and it still choked... I
didn't even look, so salt to taste)
-Mike
On Fri, 2006-10-13 at 12:59 +0100, Catalin Marinas wrote:
> On 13/10/06, Pekka Enberg <[email protected]> wrote:
> > On 10/13/06, [email protected] <[email protected]> wrote:
> > > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > > any other suggestions for instrumentation, I'd be happy to gather more
> > > data - the problem is very easy for me to reproduce.
> >
> > You should cc Catalin for that. Alternatively, you could try
> > CONFIG_DEBUG_SLAB_LEAK.
>
> Thanks for cc'ing me (I'm still on holiday and not following the
> mailing list). The problem is the __builtin_constant_p gcc function
> which doesn't work properly with 4.x versions. It was fixed in latest
> gcc versions though. Kmemleak relies on __builtin_constant_p to
> determine the pointer aliases and without it you would get plenty of
> false positives.
SuSE (for one?) doesn't appear to know about it. gcc version 4.1.2
20060920 (month old prerelease) still has the problem. After some
rummaging around, I found the fix (attached in case someone else wants
to try it).
2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
runs a lot faster ;-) I'll try a stripped down config sometime.
-Mike
On Sun, Oct 15, 2006 at 07:59:14AM +0000, Mike Galbraith wrote:
> On Fri, 2006-10-13 at 12:59 +0100, Catalin Marinas wrote:
> > On 13/10/06, Pekka Enberg <[email protected]> wrote:
> > > On 10/13/06, [email protected] <[email protected]> wrote:
> > > > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > > > any other suggestions for instrumentation, I'd be happy to gather more
> > > > data - the problem is very easy for me to reproduce.
>
> 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
> CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
> runs a lot faster ;-) I'll try a stripped down config sometime.
>
> -Mike
Thanks for digging that up - I'm building gcc now and will let you
know if any useful info emerges.
Nathan
On Sun, 2006-10-15 at 10:14 -0400, [email protected] wrote:
> On Sun, Oct 15, 2006 at 07:59:14AM +0000, Mike Galbraith wrote:
> > On Fri, 2006-10-13 at 12:59 +0100, Catalin Marinas wrote:
> > > On 13/10/06, Pekka Enberg <[email protected]> wrote:
> > > > On 10/13/06, [email protected] <[email protected]> wrote:
> > > > > If anyone has a version of kmemleak that I can build with 4.1.1, or
> > > > > any other suggestions for instrumentation, I'd be happy to gather more
> > > > > data - the problem is very easy for me to reproduce.
> >
> > 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
> > CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
> > runs a lot faster ;-) I'll try a stripped down config sometime.
> >
> > -Mike
>
> Thanks for digging that up - I'm building gcc now and will let you
> know if any useful info emerges.
Buyer beware of course ;-)
-Mike
On Sun, 2006-10-15 at 07:59 +0000, Mike Galbraith wrote:
> 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
> CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
> runs a lot faster ;-) I'll try a stripped down config sometime.
My roughly three orders of magnitude (amusing to watch:) boot slowdown
turned out to be stack unwinding. With CONFIG_UNWIND_INFO disabled,
2.6.19-rc2 + patch-2.6.19-rc1-kmemleak-0.11 runs just fine.
-Mike
On 16/10/06, Mike Galbraith <[email protected]> wrote:
> On Sun, 2006-10-15 at 07:59 +0000, Mike Galbraith wrote:
>
> > 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
> > CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
> > runs a lot faster ;-) I'll try a stripped down config sometime.
>
> My roughly three orders of magnitude (amusing to watch:) boot slowdown
> turned out to be stack unwinding. With CONFIG_UNWIND_INFO disabled,
> 2.6.19-rc2 + patch-2.6.19-rc1-kmemleak-0.11 runs just fine.
Kmemleak introduces some overhead but shouldn't be that bad.
DEBUG_SLAB also introduces an overhead by erasing the data in the
allocated blocks.
Note that if the allocated blocks are added to a list and never
removed, kmemleak won't be able to detect the leak as the objects are
still referenced. In this case, you can only use DEBUG_SLAB_LEAK.
--
Catalin
On Mon, 2006-10-16 at 09:07 +0100, Catalin Marinas wrote:
> On 16/10/06, Mike Galbraith <[email protected]> wrote:
> > On Sun, 2006-10-15 at 07:59 +0000, Mike Galbraith wrote:
> >
> > > 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
> > > CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
> > > runs a lot faster ;-) I'll try a stripped down config sometime.
> >
> > My roughly three orders of magnitude (amusing to watch:) boot slowdown
> > turned out to be stack unwinding. With CONFIG_UNWIND_INFO disabled,
> > 2.6.19-rc2 + patch-2.6.19-rc1-kmemleak-0.11 runs just fine.
>
> Kmemleak introduces some overhead but shouldn't be that bad.
> DEBUG_SLAB also introduces an overhead by erasing the data in the
> allocated blocks.
2.6.18 with your rc6 patch booted normally with stack unwind enabled.
-Mike
On 16/10/06, Mike Galbraith <[email protected]> wrote:
> On Mon, 2006-10-16 at 09:07 +0100, Catalin Marinas wrote:
> > Kmemleak introduces some overhead but shouldn't be that bad.
> > DEBUG_SLAB also introduces an overhead by erasing the data in the
> > allocated blocks.
>
> 2.6.18 with your rc6 patch booted normally with stack unwind enabled.
The only difference is that kmemleak now uses save_stack_trace() to
generate the call chain. In the previous versions I implemented a
simple stack backtrace myself, with the disadvantage that it only
worked on ARM and x86.
I think kmemleak should use the common stack trace API and investigate
why it is slower (either save_stack_trace is slower with stack unwind
enabled or kmemleak doesn't use these functions properly).
--
Catalin
On Mon, 2006-10-16 at 09:44 +0100, Catalin Marinas wrote:
> On 16/10/06, Mike Galbraith <[email protected]> wrote:
> > On Mon, 2006-10-16 at 09:07 +0100, Catalin Marinas wrote:
> > > Kmemleak introduces some overhead but shouldn't be that bad.
> > > DEBUG_SLAB also introduces an overhead by erasing the data in the
> > > allocated blocks.
> >
> > 2.6.18 with your rc6 patch booted normally with stack unwind enabled.
>
> The only difference is that kmemleak now uses save_stack_trace() to
> generate the call chain. In the previous versions I implemented a
> simple stack backtrace myself, with the disadvantage that it only
> worked on ARM and x86.
>
> I think kmemleak should use the common stack trace API and investigate
> why it is slower (either save_stack_trace is slower with stack unwind
> enabled or kmemleak doesn't use these functions properly).
The stack traces look fine without unwind, and at a glance looked fine
with unwind as well, so I speculate you must be using save_stack_trace
properly. The only difference I noticed was the incredible speed
difference. I gave up on getting to run level 5 with unwind, getting to
level 2 took ages, and the box was horribly slow at everything.
-Mike
Mike Galbraith wrote:
> On Sun, 2006-10-15 at 10:14 -0400, [email protected] wrote:
>> On Sun, Oct 15, 2006 at 07:59:14AM +0000, Mike Galbraith wrote:
>>> On Fri, 2006-10-13 at 12:59 +0100, Catalin Marinas wrote:
>>>> On 13/10/06, Pekka Enberg <[email protected]> wrote:
>>>>> On 10/13/06, [email protected] <[email protected]> wrote:
>>>>>> If anyone has a version of kmemleak that I can build with 4.1.1, or
>>>>>> any other suggestions for instrumentation, I'd be happy to gather more
>>>>>> data - the problem is very easy for me to reproduce.
>>> 2.6.19-rc1 + patch-2.6.19-rc1-kmemleak-0.11 compiles fine now (unless
>>> CONFIG_DEBUG_KEEP_INIT is set), boots and runs too.. but axle grease
>>> runs a lot faster ;-) I'll try a stripped down config sometime.
>>>
>>> -Mike
>> Thanks for digging that up - I'm building gcc now and will let you
>> know if any useful info emerges.
>
> Buyer beware of course ;-)
>
> -Mike
>
>
So, after all this, what I have to report is: Nothing. Building the same
kernel with which I saw the problem (Gentoo's 2.6.17-r8 ebuild) with the
patched gcc 4.1.1 and the kmemleak patches failed to reproduce the
problem. Either those changes perturbed the kernel enough to "fix" the
problem, or my earlier kernel build was some sort of unrepeatable
miscompile.
I noticed one oddness with the 2.6.17 kmemleak patches when built with
the patched gcc. When I had earlier built with gcc-3.4.6
(CONFIG_DEBUG_MEMLEAK_TRACE_LENGTH=4 and CONFIG_FRAME_POINTER=y),
kmemleak reported good information: every entry included four levels of
stack that clearly mapped to addresses described in System.map. That was
not the case when I built with the patched 4.1.1: every entry included
just one level of stack, with an apparently bogus address that didn't
map into the range of addresses in System.map.
So, in the end, a frustrated experiment. I'll be back if I find anything
interesting. Until then, I'm leaving the list, so please include my
address in any followup conversation. Thanks!
Nathan Meyers
[email protected]
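
The System.map check described in the last message (does a reported stack address fall inside the range of known symbols, and which symbol is it nearest to?) can be sketched as follows. It assumes the usual three-column System.map layout of hex address, type, name. load_system_map and resolve are hypothetical helper names. Addresses inside loadable modules will not resolve this way, since System.map only covers the static kernel image.

```python
import bisect


def load_system_map(text):
    """Parse System.map text into a list of (address, name), sorted by address."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        syms.append((addr, parts[2]))
    syms.sort()
    return syms


def resolve(syms, addr):
    """Return 'symbol+0xoffset' for the nearest symbol at or below addr,
    or None if addr lies outside the address range covered by syms."""
    if not syms or addr < syms[0][0] or addr > syms[-1][0]:
        return None
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    base, name = syms[i]
    return f"{name}+{addr - base:#x}"
```

A None result for every kmemleak stack entry would match the "apparently bogus address" behaviour seen with the patched 4.1.1 build. Results of the form symbol+offset would match the good traces from the 3.4.6 build.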