2009-06-28 20:23:39

by Sitsofe Wheeler

Subject: Re: [TIP] BUG kmalloc-4096: Poison overwritten (ath5k_rx_skb_alloc)

On Tue, May 26, 2009 at 10:10:30PM +0100, Sitsofe Wheeler wrote:
> On Fri, May 22, 2009 at 10:39:31AM +0100, Sitsofe Wheeler wrote:
> > On Mon, May 18, 2009 at 11:05:40AM +0100, Sitsofe Wheeler wrote:
> > > On Fri, May 15, 2009 at 12:09:04AM -0400, Bob Copeland wrote:
> > > >
> > > > This is too ugly to live, but I'd like to know if you can reproduce
> > > > with this patch. If it still happens, then I guess it's back to
> >
> > The poison message has not reappeared but this morning there was a
>
> Yet again, I spoke too soon. This evening the following was in dmesg:
>
> I'm pretty certain this kernel had your previous patch (although the
> directory that held the kernel has since been cleaned away on the server
> on which it was compiled).

OK I'm still trying to follow this. I forwarded your patch to 2.6.31-rc1
but this time I have kmemleak enabled. I left it streaming radio for the
past few hours and the following appeared in the logs:

Jun 28 17:01:34 eeepc kernel: [ 744.083787] kmemleak: unreferenced object 0xf5845770 (size 64):
Jun 28 17:01:34 eeepc kernel: [ 744.083795] kmemleak: comm "swapper", pid 1, jiffies 4294673468
Jun 28 17:01:34 eeepc kernel: [ 744.083799] kmemleak: backtrace:
Jun 28 17:01:34 eeepc kernel: [ 744.083811] kmemleak: [<c01a749d>] kmemleak_alloc+0x11d/0x2a0
Jun 28 17:01:34 eeepc kernel: [ 744.083818] kmemleak: [<c01a4376>] __kmalloc+0x136/0x210
Jun 28 17:01:34 eeepc kernel: [ 744.083828] kmemleak: [<c043f923>] ieee80211_register_hw+0x83/0x4b0
Jun 28 17:01:34 eeepc kernel: [ 744.083837] kmemleak: [<c04630d5>] ath5k_pci_probe+0xee5/0x1100
Jun 28 17:01:34 eeepc kernel: [ 744.083849] kmemleak: [<c0279ab3>] local_pci_probe+0x13/0x20
Jun 28 17:01:34 eeepc kernel: [ 744.083856] kmemleak: [<c027a608>] pci_device_probe+0x68/0x90
Jun 28 17:01:34 eeepc kernel: [ 744.083864] kmemleak: [<c0312670>] driver_probe_device+0x70/0x140
Jun 28 17:01:34 eeepc kernel: [ 744.083872] kmemleak: [<c031293a>] __driver_attach+0x7a/0x80
Jun 28 17:01:34 eeepc kernel: [ 744.083881] kmemleak: [<c0311af9>] bus_for_each_dev+0x49/0x70
Jun 28 17:01:34 eeepc kernel: [ 744.083888] kmemleak: [<c031250e>] driver_attach+0x1e/0x20
Jun 28 17:01:34 eeepc kernel: [ 744.083896] kmemleak: [<c0312188>] bus_add_driver+0xd8/0x290
Jun 28 17:01:34 eeepc kernel: [ 744.083904] kmemleak: [<c0312a6f>] driver_register+0x5f/0x120
Jun 28 17:01:34 eeepc kernel: [ 744.083911] kmemleak: [<c027a443>] __pci_register_driver+0x53/0xc0
Jun 28 17:01:34 eeepc kernel: [ 744.083922] kmemleak: [<c061d82d>] init_ath5k_pci+0x1d/0x40
Jun 28 17:01:34 eeepc kernel: [ 744.083929] kmemleak: [<c0101122>] do_one_initcall+0x32/0x1d0
Jun 28 17:01:34 eeepc kernel: [ 744.083939] kmemleak: [<c05fe8aa>] kernel_init+0xba/0x110
Jun 28 17:01:34 eeepc kernel: [ 744.084074] kmemleak: unreferenced object 0xf5845b60 (size 64):
Jun 28 17:01:34 eeepc kernel: [ 744.084080] kmemleak: comm "swapper", pid 1, jiffies 4294683694
Jun 28 17:01:34 eeepc kernel: [ 744.084085] kmemleak: backtrace:
Jun 28 17:01:34 eeepc kernel: [ 744.084093] kmemleak: [<c01a749d>] kmemleak_alloc+0x11d/0x2a0
Jun 28 17:01:34 eeepc kernel: [ 744.084100] kmemleak: [<c01a4376>] __kmalloc+0x136/0x210
Jun 28 17:01:34 eeepc kernel: [ 744.084111] kmemleak: [<c0239880>] __crypto_alloc_tfm+0x40/0x170
Jun 28 17:01:34 eeepc kernel: [ 744.084120] kmemleak: [<c0239cbc>] crypto_alloc_base+0x3c/0x80
Jun 28 17:01:34 eeepc kernel: [ 744.084128] kmemleak: [<c04420ff>] ieee80211_wep_init+0x2f/0x80
Jun 28 17:01:34 eeepc kernel: [ 744.084136] kmemleak: [<c043fa9d>] ieee80211_register_hw+0x1fd/0x4b0
Jun 28 17:01:34 eeepc kernel: [ 744.084144] kmemleak: [<c04630d5>] ath5k_pci_probe+0xee5/0x1100
Jun 28 17:01:34 eeepc kernel: [ 744.084153] kmemleak: [<c0279ab3>] local_pci_probe+0x13/0x20
Jun 28 17:01:34 eeepc kernel: [ 744.084160] kmemleak: [<c027a608>] pci_device_probe+0x68/0x90
Jun 28 17:01:34 eeepc kernel: [ 744.084168] kmemleak: [<c0312670>] driver_probe_device+0x70/0x140
Jun 28 17:01:34 eeepc kernel: [ 744.084175] kmemleak: [<c031293a>] __driver_attach+0x7a/0x80
Jun 28 17:01:34 eeepc kernel: [ 744.084184] kmemleak: [<c0311af9>] bus_for_each_dev+0x49/0x70
Jun 28 17:01:34 eeepc kernel: [ 744.084191] kmemleak: [<c031250e>] driver_attach+0x1e/0x20
Jun 28 17:01:34 eeepc kernel: [ 744.084199] kmemleak: [<c0312188>] bus_add_driver+0xd8/0x290
Jun 28 17:01:34 eeepc kernel: [ 744.084207] kmemleak: [<c0312a6f>] driver_register+0x5f/0x120
Jun 28 17:01:34 eeepc kernel: [ 744.084214] kmemleak: [<c027a443>] __pci_register_driver+0x53/0xc0

There were quite a few leaks before this. I dunno if kmemleak is
spurious or not. As always, any ideas?
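
With this many reports, it may help to group them by call site before
deciding whether they are real leaks. Below is a minimal sketch in Python,
assuming the syslog-style format shown above. The function name
summarise_leaks and the set of allocator frames to skip are illustrative
assumptions, not part of kmemleak itself.

```python
import re

# Frames that belong to the allocator rather than the caller. The first
# frame NOT in this set is treated as the call site. This set is an
# assumption and may need extending for other allocation paths.
ALLOCATOR_FRAMES = {"kmemleak_alloc", "__kmalloc", "kmem_cache_alloc"}

# Matches: "kmemleak: unreferenced object 0xf5845770 (size 64):"
OBJECT_RE = re.compile(
    r"kmemleak: unreferenced object (0x[0-9a-f]+) \(size (\d+)\)"
)
# Matches: "kmemleak: [<c01a4376>] __kmalloc+0x136/0x210"
FRAME_RE = re.compile(r"kmemleak:\s+\[<[0-9a-f]+>\] ([\w.]+)\+0x")


def summarise_leaks(lines):
    """Return {call_site: [report_count, total_bytes]} for kmemleak reports."""
    summary = {}
    size = None  # size of the report being read, None once attributed
    for line in lines:
        obj = OBJECT_RE.search(line)
        if obj:
            size = int(obj.group(2))
            continue
        frame = FRAME_RE.search(line)
        if frame and size is not None:
            func = frame.group(1)
            if func in ALLOCATOR_FRAMES:
                continue  # still inside the allocator, keep walking
            entry = summary.setdefault(func, [0, 0])
            entry[0] += 1
            entry[1] += size
            size = None  # ignore the remaining frames of this report
    return summary
```

Passing an open log file (or any iterable of lines) to summarise_leaks
gives one entry per call site. For the two reports above it should
attribute one 64-byte object each to ieee80211_register_hw and
__crypto_alloc_tfm.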

--
Sitsofe | http://sucs.org/~sits/


2009-07-14 02:25:47

by Bob Copeland

Subject: Re: [TIP] BUG kmalloc-4096: Poison overwritten (ath5k_rx_skb_alloc)

On Sun, Jun 28, 2009 at 09:23:29PM +0100, Sitsofe Wheeler wrote:
> OK I'm still trying to follow this. I forwarded your patch to 2.6.31-rc1
> but this time I have kmemleak enabled. I left it streaming radio for the
> past few hours and the following appeared in the logs:
>
> Jun 28 17:01:34 eeepc kernel: [ 744.083787] kmemleak: unreferenced object 0xf5845770 (size 64):
> Jun 28 17:01:34 eeepc kernel: [ 744.083795] kmemleak: comm "swapper", pid 1, jiffies 4294673468
> Jun 28 17:01:34 eeepc kernel: [ 744.083799] kmemleak: backtrace:
> Jun 28 17:01:34 eeepc kernel: [ 744.083811] kmemleak: [<c01a749d>] kmemleak_alloc+0x11d/0x2a0
> Jun 28 17:01:34 eeepc kernel: [ 744.083818] kmemleak: [<c01a4376>] __kmalloc+0x136/0x210
> Jun 28 17:01:34 eeepc kernel: [ 744.083828] kmemleak: [<c043f923>] ieee80211_register_hw+0x83/0x4b0
[snip]


> There were quite a few leaks before this. I dunno if kmemleak is
> spurious or not. As always, any ideas?

I can't seem to boot a kmemleak kernel myself. But there were a number
of fixes to kmemleak recently to track memory allocated in
ieee80211_register_hw, IIRC -- so it would be good to know if this
persists with all the latest fixes.

As always, no ideas :( I'm guessing nothing in .31-rc has magically
fixed any problems? I still say it's some kind of race condition
made worse by the eeepc performance, but unless I can find one or
replicate it with introduced load, I'm afraid I'm stumped.

(Well, I guess we can always go off and rewrite the rx loop without
the circular terminator and see how well that works on your HW.)

--
Bob Copeland %% http://www.bobcopeland.com