Thanks, Nico
> just while running, I got the following brcmsmac related error:
>
> [17336.572558] WARNING: at drivers/net/wireless/brcm80211/brcmsmac/main.c:7968 brcms_c_wait_for_tx_completion+0x99/0xb0 [brcmsmac]()
> [17336.572562] Hardware name: MacBookAir4,2
> [17336.572565] Modules linked in: sha256_generic snd_hda_codec_hdmi
> snd_hda_codec_cirrus ext4 jbd2 mbcache arc4 uvcvideo brcmsmac
> videobuf2_vmalloc videobuf2_memops cordic brcmutil btusb bluetooth i915
> videobuf2_core mac80211 videodev crc16 acpi_cpufreq snd_hda_intel media
> i2c_algo_bit cfg80211 mperf drm_kms_helper snd_hda_codec rfkill joydev
> bcm5974 iTCO_wdt iTCO_vendor_support i2c_i801 intel_agp lpc_ich applesmc
> evdev microcode drm coretemp input_polldev pcspkr intel_gtt mfd_core
> i2c_core processor bcma snd_hwdep video button battery apple_bl mei ac
> snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore
> kvm_intel kvm nfs nfs_acl lockd auth_rpcgss sunrpc fscache autofs4 jfs
> xts gf128mul dm_crypt dm_mod sd_mod pata_acpi ata_generic hid_generic
> usb_storage hid_apple usbhid hid uhci_hcd aesni_intel aes_x86_64
> aes_generic ghash_clmulni_intel cryptd crc32c_intel ata_piix libata
> scsi_mod ehci_hcd usbcore usb_common
> [17336.572716] Pid: 27714, comm: kworker/u:1 Not tainted 3.5.4-1-ARCH #1
> [17336.572719] Call Trace:
> [17336.572734] [<ffffffff81051a3f>] warn_slowpath_common+0x7f/0xc0
> [17336.572743] [<ffffffff81051a9a>] warn_slowpath_null+0x1a/0x20
> [17336.572759] [<ffffffffa063bd29>] brcms_c_wait_for_tx_completion+0x99/0xb0 [brcmsmac]
> [17336.572769] [<ffffffffa062e1bb>] brcms_ops_flush+0x3b/0x60 [brcmsmac]
This issue has been bugging me for quite a while; some patches do patch up
the issue, but it is still bleeding. It is a nasty one to get to the root
cause of, but I am testing my pitbull mentality on it ;-)
Gr. AvS
I've just got the hint to do
echo 3 > /proc/sys/vm/drop_caches
via private mail. Tried that, but it does not change a thing.
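(What I actually ran was roughly the following; the slabinfo grep is just
my own way of watching the jfs_ip cache, nothing special:
  sync                               # flush dirty data first
  grep jfs_ip /proc/slabinfo         # active / total objects before
  echo 3 > /proc/sys/vm/drop_caches  # drop page cache plus dentries and inodes
  grep jfs_ip /proc/slabinfo         # ...and after: no visible difference here
)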
I'm now in a new situation: I did *not* do a backup, but still
have ~800M of slab usage. I'm trying to compile a kernel now without
running out of memory (not so easy if you are used to running make -j).
Attached are the latest figures from my (still) running system.
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
On 09/24/2012 05:43 PM, David Rientjes wrote:
> On Fri, 21 Sep 2012, Nico Schottelius wrote:
>
>> Active / Total Objects (% used) : 1165130 / 1198087 (97.2%)
>> Active / Total Slabs (% used) : 81027 / 81027 (100.0%)
>> Active / Total Caches (% used) : 69 / 101 (68.3%)
>> Active / Total Size (% used) : 1237249.81K / 1246521.94K (99.3%)
>> Minimum / Average / Maximum Object : 0.01K / 1.04K / 15.23K
>>
>> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
>> 993607 993607 100% 1.21K 75358 26 2411456K jfs_ip
>
> Well that doesn't look good. 100% of the inode cache for jfs are being
> used which either means
>
> - there's a memory leak, or
maybe a missing iput() somewhere?
Nico, does unmounting the usb drive after killing the backup clean up
the jfs inode cache?
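Something like this would show it (assuming the drive is mounted at
/mnt/backup -- adjust the path to whatever you actually use):
  grep jfs_ip /proc/slabinfo
  umount /mnt/backup
  echo 2 > /proc/sys/vm/drop_caches   # ask the kernel to reclaim unused dentries and inodes
  grep jfs_ip /proc/slabinfo          # the object count should drop back if nothing is pinned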
> - there's some sort of throttling issue in jfs.
>
> And those objects are consuming ~2.3GB of slab on your 4GB machine and
> seems to only have occurred between v3.4.2 to v3.5.3.
Almost nothing in jfs has changed between these releases. Only this:
vfs: Rename end_writeback() to clear_inode()
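(For reference, that is from a quick look at Linus' tree, something like
  git log --oneline v3.4..v3.5 -- fs/jfs/
which shows essentially only that rename.)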
> It would be interesting to see what kmemleak would tell us.
>
> Adding Dave Kleikamp to the cc.
>
Good morning,
Dave Kleikamp [Tue, Sep 25, 2012 at 10:07:14AM -0500]:
> >> Active / Total Objects (% used) : 1165130 / 1198087 (97.2%)
> >> Active / Total Slabs (% used) : 81027 / 81027 (100.0%)
> >> Active / Total Caches (% used) : 69 / 101 (68.3%)
> >> Active / Total Size (% used) : 1237249.81K / 1246521.94K (99.3%)
> >> Minimum / Average / Maximum Object : 0.01K / 1.04K / 15.23K
> >>
> >> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> >> 993607 993607 100% 1.21K 75358 26 2411456K jfs_ip
> >
> > Well that doesn't look good. 100% of the inode cache for jfs are being
> > used which either means
> >
> > - there's a memory leak, or
>
> maybe a missing iput() somewhere?
>
> Nico, does unmounting the usb drive after killing the backup clean up
> the jfs inode cache?
IIRC, it does not. I'll check this evening, because doing the backup
currently forces me to reboot my production system.
> > - there's some sort of throttling issue in jfs.
> >
> > And those objects are consuming ~2.3GB of slab on your 4GB machine and
> > seems to only have occurred between v3.4.2 to v3.5.3.
>
> Almost nothing in jfs has changed between these releases. Only this:
>
> vfs: Rename end_writeback() to clear_inode()
The other oddity is that the root filesystem is also jfs - it's not
only the USB disk.
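Just to be explicit about what is jfs on this box:
  grep jfs /proc/mounts    # shows the root fs, plus the backup disk while it is mounted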
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
Adding David as he suggested looking at slabtop.
Gr. AvS
-------- Original Message --------
Subject: Re: Out of memory on 3.5 kernels
Date: Fri, 21 Sep 2012 21:49:31 +0200
From: Nico Schottelius <[email protected]>
To: Arend van Spriel <[email protected]>
CC: Nico Schottelius <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Latest results:
- it seems I can trigger the problem by doing the backup to my external
usb disk
- the usb disk uses jfs
- jfs seems to account for the major part of the cache memory consumption
I've attached dmesg and slabtop outputs of various stages:
- before and during the backup
I had to abort the backup because memory pressure made the system
almost completely unusable.
Hope this helps & someone has a clue about what went wrong where...
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
Good morning,
Attached are the results of kmemleak at various stages (before, during,
after the backup). I am now running 3.6.0-rc7-00039-g6f0f9b6, which shows
the same behaviour as 3.5.
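Each stage is simply a scan plus a dump of kmemleak's debugfs file, roughly:
  mount -t debugfs none /sys/kernel/debug          # if not mounted already
  echo scan > /sys/kernel/debug/kmemleak           # trigger an immediate scan
  cat /sys/kernel/debug/kmemleak > kmemleak-before-backup.txt
(and the same again during and after the backup, just with different file names)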
According to slabtop it seems that jfs_ip is again the one consuming > 2
GiB memory in the end stage.
Let me know if you need more data, for me it's reboot time now!
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
Latest results:
- it seems I can trigger the problem by doing the backup to my external
usb disk
- the usb disk uses jfs
- jfs seems to account for the major part of the cache memory consumption
I've attached dmesg and slabtop outputs of various stages:
- before and during the backup
I had to abort the backup because memory pressure made the system
almost completely unusable.
Hope this helps & someone has a clue about what went wrong where...
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
David Rientjes [Mon, Sep 24, 2012 at 03:43:27PM -0700]:
> [...]
>
> And those objects are consuming ~2.3GB of slab on your 4GB machine and
> seems to only have occurred between v3.4.2 to v3.5.3.
>
> It would be interesting to see what kmemleak would tell us.
As the Archlinux kernel does not support kmemleak:
[8:04] brief:android-apk% zgrep -i meml /proc/config.gz
CONFIG_INPUT_FF_MEMLESS=m
# CONFIG_DEBUG_KMEMLEAK is not set
I'll give the latest kernel tree from Linus a try this evening, with
kmemleak enabled, and see if the problem shows up there as well. If not,
I'll go back to 3.5.4 in Linus' tree and try again.
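The plan is roughly this, building from a plain Linus checkout on top of
my current Arch config (nothing of this is set in stone):
  cd linux
  zcat /proc/config.gz > .config                 # start from the running kernel's config
  scripts/config -e DEBUG_FS -e DEBUG_KMEMLEAK   # switch kmemleak on
  make oldconfig
  make -j4 && make modules_install               # then install the image + initramfs as usual
After a reboot the kmemleak report should be readable from
/sys/kernel/debug/kmemleak.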
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
On Fri, 21 Sep 2012, Nico Schottelius wrote:
> Active / Total Objects (% used) : 1165130 / 1198087 (97.2%)
> Active / Total Slabs (% used) : 81027 / 81027 (100.0%)
> Active / Total Caches (% used) : 69 / 101 (68.3%)
> Active / Total Size (% used) : 1237249.81K / 1246521.94K (99.3%)
> Minimum / Average / Maximum Object : 0.01K / 1.04K / 15.23K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 993607 993607 100% 1.21K 75358 26 2411456K jfs_ip
Well that doesn't look good. 100% of the inode cache for jfs is being
used, which either means
- there's a memory leak, or
- there's some sort of throttling issue in jfs.
And those objects are consuming ~2.3GB of slab on your 4GB machine, and
this seems to have first appeared somewhere between v3.4.2 and v3.5.3.
It would be interesting to see what kmemleak would tell us.
Adding Dave Kleikamp to the cc.
Hey Valdis,
[email protected] [Fri, Oct 05, 2012 at 11:48:04AM -0400]:
> On Wed, 03 Oct 2012 23:23:11 +0200, Nico Schottelius said:
>
> > does anyone of you have a clue so far what may be causing the huge
> > slab usage?
> >
> > I've just found an interesting detail: umounting and cryptsetup
> > luksClosing frees up the used memory (not sure which one was freeing
> > up)
>
> For what it's worth, I'm seeing a similar problem in linux-next on my laptop -
> trying to run a backup to an external hard drive that has a LUKS partition on
> it will OOM. (For some reason having the external LUKS partition is much more
> problematic than the LVM-on-LUKS on the internal drive)
Indeed, I also have the internal drive encrypted with LUKS, also with jfs
on it, but the problem only occurs on the external drive.
Is there a possibility that the jfs on the external drive got into a
state that causes jfs to behave unexpectedly? If so, it would explain
why it happens only with the external drive, and also across kernel
versions.
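I guess a read-only check with fsck.jfs (from jfsutils) would at least
rule out on-disk damage; the device name below is just an example, in my
case it is really the LUKS mapping:
  fsck.jfs -n /dev/mapper/backup    # -n: report problems only, do not repair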
But "good" to hear I am not the only one affected anymore.
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
Hello,
do any of you have a clue so far about what may be causing the huge
slab usage?
I've just found an interesting detail: umounting and cryptsetup
luksClose free up the used memory (not sure which one of the two does
the freeing).
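Next time I'll check in between the two steps to see which one actually
does it (mount point and mapping name below are just placeholders):
  grep jfs_ip /proc/slabinfo
  umount /mnt/backup
  grep jfs_ip /proc/slabinfo    # freed already by the umount alone?
  cryptsetup luksClose backup
  grep jfs_ip /proc/slabinfo    # ...or only once the LUKS mapping is gone?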
Attached are dmesg, slabtop output during the backup and after umounting.
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
Good morning,
update: this problem still exists on 3.6.2-1-ARCH and it has gotten worse:
I reformatted the external disk to use xfs, but as my
root filesystem is still jfs, it still appears:
Active / Total Objects (% used) : 642732 / 692268 (92.8%)
Active / Total Slabs (% used) : 24801 / 24801 (100.0%)
Active / Total Caches (% used) : 79 / 111 (71.2%)
Active / Total Size (% used) : 603522.30K / 622612.05K (96.9%)
Minimum / Average / Maximum Object : 0.01K / 0.90K / 15.25K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
475548 467649 98% 1.21K 18722 26 599104K jfs_ip
25670 19143 74% 0.05K 302 85 1208K shared_policy_node
24612 16861 68% 0.19K 1172 21 4688K dentry
24426 19524 79% 0.17K 1062 23 4248K vm_area_struct
21636 21180 97% 0.11K 601 36 2404K sysfs_dir_cache
12352 9812 79% 0.06K 193 64 772K kmalloc-64
11684 9145 78% 0.09K 254 46 1016K anon_vma
9855 8734 88% 0.58K 365 27 5840K inode_cache
9728 9281 95% 0.01K 19 512 76K kmalloc-8
8932 4411 49% 0.55K 319 28 5104K radix_tree_node
6336 5760 90% 0.25K 198 32 1584K kmalloc-256
5632 5632 100% 0.02K 22 256 88K kmalloc-16
4998 2627 52% 0.09K 119 42 476K kmalloc-96
4998 3893 77% 0.04K 49 102 196K Acpi-Namespace
4736 3887 82% 0.03K 37 128 148K kmalloc-32
4144 4144 100% 0.07K 74 56 296K Acpi-ParseExt
3740 3740 100% 0.02K 22 170 88K numa_policy
3486 3023 86% 0.19K 166 21 664K kmalloc-192
3200 2047 63% 0.12K 100 32 400K kmalloc-128
2304 2074 90% 0.50K 72 32 1152K kmalloc-512
2136 2019 94% 0.64K 89 24 1424K proc_inode_cache
2080 2080 100% 0.12K 65 32 260K jfs_mp
2024 1890 93% 0.70K 88 23 1408K shmem_inode_cache
1632 1556 95% 1.00K 51 32 1632K kmalloc-1024
I am wondering whether anyone feels responsible for this bug, or whether
the mid-term solution is to move away from jfs.
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
On Wed, 03 Oct 2012 23:23:11 +0200, Nico Schottelius said:
> does anyone of you have a clue so far what may be causing the huge
> slab usage?
>
> I've just found an interesting detail: umounting and cryptsetup
> luksClosing frees up the used memory (not sure which one was freeing
> up)
For what it's worth, I'm seeing a similar problem in linux-next on my laptop -
trying to run a backup to an external hard drive that has a LUKS partition on
it will OOM. (For some reason having the external LUKS partition is much more
problematic than the LVM-on-LUKS on the internal drive)
I've started bisecting, and gotten this far:
% git bisect log
git bisect start
# bad: [1aa44772a621e8547dc4db41b47c747469fe0ea3] Add linux-next specific files for 20121001
git bisect bad 1aa44772a621e8547dc4db41b47c747469fe0ea3
# good: [fea7a08acb13524b47711625eebea40a0ede69a0] Linux 3.6-rc3
git bisect good fea7a08acb13524b47711625eebea40a0ede69a0
# good: [526c4d73327f56f83da8b8088fd0b3c7be38c7ae] Merge remote-tracking branch 'regulator/for-next'
git bisect good 526c4d73327f56f83da8b8088fd0b3c7be38c7ae
# good: [961d70d88557405c5b7302c7d87752566468f035] Merge remote-tracking branch 'tty/tty-next'
git bisect good 961d70d88557405c5b7302c7d87752566468f035
(That's as far as I've gotten that I trust; the next bisect step hits a different
problem where connecting the hard drive and *starting* cryptLuks hangs the machine
hard, so I'll have to finish bisecting that problem first, then return to bisecting
this one.)
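(The plan for juggling the two is just to stash the bisect state and replay it
later, roughly:
  git bisect log > oom-bisect.log     # keep what I have so far
  git bisect reset                    # go bisect the cryptsetup hang first
  git bisect replay oom-bisect.log    # then pick this bisection up where it left off
)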
I admit not being sure why you see it on a 3.5 kernel, but I only see it on stuff
that's after 3.6-rc3.
On 11/21/2012 04:37 PM, Dave Kleikamp wrote:
> On 10/30/2012 05:35 AM, Nico Schottelius wrote:
>> Good morning,
>>
>> update: this problem still exists on 3.6.2-1-ARCH and it got worse:
>>
>> I reformatted the external disk to use xfs, but as the my
>> root filesystem is still jfs, it still appears:
>>
>> Active / Total Objects (% used) : 642732 / 692268 (92.8%)
>> Active / Total Slabs (% used) : 24801 / 24801 (100.0%)
>> Active / Total Caches (% used) : 79 / 111 (71.2%)
>> Active / Total Size (% used) : 603522.30K / 622612.05K (96.9%)
>> Minimum / Average / Maximum Object : 0.01K / 0.90K / 15.25K
>>
>> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
>> 475548 467649 98% 1.21K 18722 26 599104K jfs_ip
...
>>
>> I am wondering if anyone is feeling responsible for this bug or if the mid-term
>> solution is to move away from jfs?
>
> Sorry, I haven't taken too close a look at this, but I did notice
> another conversation that may be related:
>
> https://lkml.org/lkml/2012/11/17/26
>
> The commit in question first showed up in 3.5-rc1, which coincides with
> your problem.
I believe this commit will fix the problem:
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4eff96d
It is targeted for the stable kernels.
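If you want to try it before it shows up in a stable release, cherry-picking
it on top of your tree should be enough (roughly -- I haven't checked whether
it applies cleanly to 3.6.x):
  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
  git cherry-pick 4eff96d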
On 10/30/2012 05:35 AM, Nico Schottelius wrote:
> Good morning,
>
> update: this problem still exists on 3.6.2-1-ARCH and it got worse:
>
> I reformatted the external disk to use xfs, but as the my
> root filesystem is still jfs, it still appears:
>
> Active / Total Objects (% used) : 642732 / 692268 (92.8%)
> Active / Total Slabs (% used) : 24801 / 24801 (100.0%)
> Active / Total Caches (% used) : 79 / 111 (71.2%)
> Active / Total Size (% used) : 603522.30K / 622612.05K (96.9%)
> Minimum / Average / Maximum Object : 0.01K / 0.90K / 15.25K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 475548 467649 98% 1.21K 18722 26 599104K jfs_ip
> 25670 19143 74% 0.05K 302 85 1208K shared_policy_node
> 24612 16861 68% 0.19K 1172 21 4688K dentry
> 24426 19524 79% 0.17K 1062 23 4248K vm_area_struct
> 21636 21180 97% 0.11K 601 36 2404K sysfs_dir_cache
> 12352 9812 79% 0.06K 193 64 772K kmalloc-64
> 11684 9145 78% 0.09K 254 46 1016K anon_vma
> 9855 8734 88% 0.58K 365 27 5840K inode_cache
> 9728 9281 95% 0.01K 19 512 76K kmalloc-8
> 8932 4411 49% 0.55K 319 28 5104K radix_tree_node
> 6336 5760 90% 0.25K 198 32 1584K kmalloc-256
> 5632 5632 100% 0.02K 22 256 88K kmalloc-16
> 4998 2627 52% 0.09K 119 42 476K kmalloc-96
> 4998 3893 77% 0.04K 49 102 196K Acpi-Namespace
> 4736 3887 82% 0.03K 37 128 148K kmalloc-32
> 4144 4144 100% 0.07K 74 56 296K Acpi-ParseExt
> 3740 3740 100% 0.02K 22 170 88K numa_policy
> 3486 3023 86% 0.19K 166 21 664K kmalloc-192
> 3200 2047 63% 0.12K 100 32 400K kmalloc-128
> 2304 2074 90% 0.50K 72 32 1152K kmalloc-512
> 2136 2019 94% 0.64K 89 24 1424K proc_inode_cache
> 2080 2080 100% 0.12K 65 32 260K jfs_mp
> 2024 1890 93% 0.70K 88 23 1408K shmem_inode_cache
> 1632 1556 95% 1.00K 51 32 1632K kmalloc-1024
>
>
> I am wondering if anyone is feeling responsible for this bug or if the mid-term
> solution is to move away from jfs?
Sorry, I haven't taken too close a look at this, but I did notice
another conversation that may be related:
https://lkml.org/lkml/2012/11/17/26
The commit in question first showed up in 3.5-rc1, which coincides with
your problem.
>
> Cheers,
>
> Nico
>
Hey Dave,
Dave Kleikamp [Tue, Nov 27, 2012 at 09:56:58AM -0600]:
> [...]
> I believe this commit will fix the problem:
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4eff96d
>
> It is targeted for the stable kernels.
Thanks, I'll give it a try. I have v3.7-rc7-25-g2844a48 compiled already
and will give the system a reboot this evening.
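To make sure the fix is actually in what I've built, I'll check something like:
  git cat-file -t 4eff96d                        # does my checkout know the commit at all?
  git merge-base --is-ancestor 4eff96d HEAD && echo "already included"
(the second command needs a reasonably recent git)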
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
* Nico Schottelius <[email protected]> wrote:
> Good morning,
>
> update: this problem still exists on 3.6.2-1-ARCH and it got worse:
>
> I reformatted the external disk to use xfs, but as the my
> root filesystem is still jfs, it still appears:
>
> Active / Total Objects (% used) : 642732 / 692268 (92.8%)
> Active / Total Slabs (% used) : 24801 / 24801 (100.0%)
> Active / Total Caches (% used) : 79 / 111 (71.2%)
> Active / Total Size (% used) : 603522.30K / 622612.05K (96.9%)
> Minimum / Average / Maximum Object : 0.01K / 0.90K / 15.25K
>
> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> 475548 467649 98% 1.21K 18722 26 599104K jfs_ip
> 25670 19143 74% 0.05K 302 85 1208K shared_policy_node
> 24612 16861 68% 0.19K 1172 21 4688K dentry
> 24426 19524 79% 0.17K 1062 23 4248K vm_area_struct
> 21636 21180 97% 0.11K 601 36 2404K sysfs_dir_cache
> 12352 9812 79% 0.06K 193 64 772K kmalloc-64
> 11684 9145 78% 0.09K 254 46 1016K anon_vma
> 9855 8734 88% 0.58K 365 27 5840K inode_cache
> 9728 9281 95% 0.01K 19 512 76K kmalloc-8
> 8932 4411 49% 0.55K 319 28 5104K radix_tree_node
> 6336 5760 90% 0.25K 198 32 1584K kmalloc-256
> 5632 5632 100% 0.02K 22 256 88K kmalloc-16
> 4998 2627 52% 0.09K 119 42 476K kmalloc-96
> 4998 3893 77% 0.04K 49 102 196K Acpi-Namespace
> 4736 3887 82% 0.03K 37 128 148K kmalloc-32
> 4144 4144 100% 0.07K 74 56 296K Acpi-ParseExt
> 3740 3740 100% 0.02K 22 170 88K numa_policy
> 3486 3023 86% 0.19K 166 21 664K kmalloc-192
> 3200 2047 63% 0.12K 100 32 400K kmalloc-128
> 2304 2074 90% 0.50K 72 32 1152K kmalloc-512
> 2136 2019 94% 0.64K 89 24 1424K proc_inode_cache
> 2080 2080 100% 0.12K 65 32 260K jfs_mp
> 2024 1890 93% 0.70K 88 23 1408K shmem_inode_cache
> 1632 1556 95% 1.00K 51 32 1632K kmalloc-1024
>
>
> I am wondering if anyone is feeling responsible for this bug or if the mid-term
> solution is to move away from jfs?
I also did some tests when this bug was first reported, but I couldn't
reproduce it... currently I have no idea what is wrong there.
I think moving to ext4 or xfs is the best option for now... :(
--
regards, TR