2015-04-14 08:22:15

by Samuel Thibault

[permalink] [raw]
Subject: I_CLEAR bug while shrinking dcache

Hello,

This morning I got the following with linux 4.0.0:

Apr 14 08:36:28 type kernel: [46936.950171] kernel BUG at fs/inode.c:1475!
Apr 14 08:36:28 type kernel: [46936.950206] invalid opcode: 0000 [#1] SMP
Apr 14 08:36:28 type kernel: [46936.950246] Modules linked in: tun xt_REDIRECT nf_nat_redirect xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables binfmt_misc nfsd cpufreq_powersave auth_rpcgss cpufreq_stats oid_registry nfs_acl cpufreq_userspace lockd cpufreq_conservative bbswitch(O) grace sunrpc joydev hid_generic usbhid hid cdc_mbim cdc_ncm usbnet mii cdc_acm cdc_wdm arc4 brcmsmac cordic brcmutil b43 nls_utf8 nls_cp437 mac80211 vfat fat cfg80211 x86_pkg_temp_thermal i915 kvm_intel kvm snd_hda_codec_hdmi ssb rng_core pcmcia pcmcia_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 dell_wmi lrw sparse_keymap gf128mul drm_kms_helper dell_laptop glue_helper rfkill ablk_helper cryptd drm psmouse dcdbas evdev serio_raw bcma i2c_i801 acpi_cpufreq tpm_tis wmi i2c_algo_bit tpm i2c_core 8250_fintek video ac battery processor button snd_hda_codec_idt snd_hda_codec_generic ledtrig_timer ledtrig_oneshot ledtrig_heartbeat ledtrig_default_on snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore coretemp ohci_hcd ledtrig_backlight pcspkr i8k firewire_sbp2 firewire_core crc_itu_t loop fuse parport_pc ppdev lp parport autofs4 microcode crc32c_intel ehci_pci ehci_hcd sdhci_pci sr_mod sdhci cdrom sg mmc_core usbcore e1000e ptp usb_common pps_core thermal thermal_sys
Apr 14 08:36:28 type kernel: [46936.951610] CPU: 3 PID: 63 Comm: kswapd0 Tainted: G W O 4.0.0 #87
Apr 14 08:36:28 type kernel: [46936.951666] Hardware name: Dell Inc. Latitude E6420/ , BIOS A14 07/11/2012
Apr 14 08:36:28 type kernel: [46936.951723] task: ffff8800b64a68d0 ti: ffff8800b1310000 task.ti: ffff8800b1310000
Apr 14 08:36:28 type kernel: [46936.951781] RIP: 0010:[<ffffffff8113eb4b>] [<ffffffff8113eb4b>] iput+0x23/0x194
Apr 14 08:36:28 type kernel: [46936.951845] RSP: 0018:ffff8800b1313bd8 EFLAGS: 00010202
Apr 14 08:36:28 type kernel: [46936.951887] RAX: 0000000000000000 RBX: ffff880098c92c68 RCX: ffff8800928883e0
Apr 14 08:36:28 type kernel: [46936.951942] RDX: ffff880098c92d88 RSI: 000000000000f724 RDI: ffff880098c92c68
Apr 14 08:36:28 type kernel: [46936.951997] RBP: ffff880098c92cf0 R08: 0000000000000000 R09: ffff8800b2bddd58
Apr 14 08:36:28 type kernel: [46936.952053] R10: 000000000000b53e R11: ffffffff81a04b80 R12: ffff8800b2bdfc70
Apr 14 08:36:28 type kernel: [46936.952108] R13: ffff8800b1313c68 R14: 0000000000000000 R15: 0000000000000000
Apr 14 08:36:28 type kernel: [46936.952163] FS: 0000000000000000(0000) GS:ffff88013dc60000(0000) knlGS:0000000000000000
Apr 14 08:36:28 type kernel: [46936.952226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 14 08:36:28 type kernel: [46936.952270] CR2: 00007f4cc1f43920 CR3: 000000000180d000 CR4: 00000000000406e0
Apr 14 08:36:28 type kernel: [46936.952325] Stack:
Apr 14 08:36:28 type kernel: [46936.952342] ffff8800928883e0 ffff8800b2bdfc18 ffff880098c92c68 ffff8800b2bdfc70
Apr 14 08:36:28 type kernel: [46936.952410] ffff8800b1313c68 ffffffff8113b2c1 ffff8800b2bdfc98 ffff880129310c18
Apr 14 08:36:28 type kernel: [46936.952477] ffff8800b2bdfc18 ffffffff8113b6d5 0000000000000000 ffff8800b1313d38
Apr 14 08:36:28 type kernel: [46936.954381] Call Trace:
Apr 14 08:36:28 type kernel: [46936.956290] [<ffffffff8113b2c1>] ? __dentry_kill+0xfa/0x15c
Apr 14 08:36:28 type kernel: [46936.958219] [<ffffffff8113b6d5>] ? shrink_dentry_list+0xfc/0x18a
Apr 14 08:36:28 type kernel: [46936.960159] [<ffffffff8113c26f>] ? prune_dcache_sb+0x49/0x53
Apr 14 08:36:28 type kernel: [46936.962061] [<ffffffff8112d893>] ? super_cache_scan+0xf2/0x16c
Apr 14 08:36:28 type kernel: [46936.963989] [<ffffffff810ef6ad>] ? shrink_slab.part.53.constprop.71+0x1a6/0x2ac
Apr 14 08:36:28 type kernel: [46936.965882] [<ffffffff810f1a38>] ? shrink_zone+0x6b/0x131
Apr 14 08:36:28 type kernel: [46936.967746] [<ffffffff810f26ca>] ? kswapd+0x57a/0x740
Apr 14 08:36:28 type kernel: [46936.969549] [<ffffffff810f2150>] ? zone_reclaim+0x1cb/0x1cb
Apr 14 08:36:28 type kernel: [46936.971322] [<ffffffff81060503>] ? kthread+0xab/0xb3
Apr 14 08:36:28 type kernel: [46936.973026] [<ffffffff81060458>] ? __kthread_parkme+0x5d/0x5d
Apr 14 08:36:28 type kernel: [46936.974694] [<ffffffff81448d58>] ? ret_from_fork+0x58/0x90
Apr 14 08:36:28 type kernel: [46936.976362] [<ffffffff81060458>] ? __kthread_parkme+0x5d/0x5d
Apr 14 08:36:28 type kernel: [46936.977944] Code: ff 05 f2 1c ed 7e 58 c3 48 85 ff 0f 84 8a 01 00 00 41 55 41 54 55 53 48 8d af 88 00 00 00 51 f6 87 a0 00 00 00 40 48 89 fb 74 47 <0f> 0b 83 7b 48 00 48 8b 83 a0 00 00 00 74 53 f6 c4 08 74 4e 4c
Apr 14 08:36:28 type kernel: [46936.982994] RIP [<ffffffff8113eb4b>] iput+0x23/0x194
Apr 14 08:36:28 type kernel: [46936.984566] RSP <ffff8800b1313bd8>
Apr 14 08:36:28 type kernel: [46936.991466] ---[ end trace de34697426b2c75a ]---

So it seems that shrink_dentry_list was trying to kill an inode which
already has I_CLEAR. I'll be trying the attached patch in the next
days.

Samuel


Attachments:
(No filename) (5.16 kB)
dentry-fix (397.00 B)
Download all attachments

2015-04-14 12:44:40

by Al Viro

[permalink] [raw]
Subject: Re: I_CLEAR bug while shrinking dcache

On Tue, Apr 14, 2015 at 10:12:54AM +0200, Samuel Thibault wrote:

> So it seems that shrink_dentry_list was trying to kill an inode which
> already has I_CLEAR. I'll be trying the attached patch in the next
> days.

That patch is just papering over the problem. What you have is a dentry
with ->d_inode pointing to something with I_CLEAR set.

So either
* dentry has already been through __dentry_kill()
* or you have live dentry with ->d_inode somehow not contributing
to refcount of ->d_inode
* or you have an inode with positive refcount and I_CLEAR set.

The first variant can happen, but in that case it should have had
DCACHE_DENTRY_KILLED set by said __dentry_kill() and skipped. The other
two are clear inode refcounting bugs.

Do you have a reproducer for that?

2015-04-14 14:44:11

by Samuel Thibault

[permalink] [raw]
Subject: Re: I_CLEAR bug while shrinking dcache

Al Viro, le Tue 14 Apr 2015 13:44:29 +0100, a ?crit :
> On Tue, Apr 14, 2015 at 10:12:54AM +0200, Samuel Thibault wrote:
> > So it seems that shrink_dentry_list was trying to kill an inode which
> > already has I_CLEAR. I'll be trying the attached patch in the next
> > days.
>
> That patch is just papering over the problem.

I never said it was a proper fix :)

> The first variant can happen, but in that case it should have had
> DCACHE_DENTRY_KILLED set by said __dentry_kill() and skipped. The other
> two are clear inode refcounting bugs.
>
> Do you have a reproducer for that?

It just happened in the middle of the night.

Samuel

2015-04-20 23:07:06

by Samuel Thibault

[permalink] [raw]
Subject: Re: I_CLEAR bug while shrinking dcache

Samuel Thibault, le Tue 14 Apr 2015 16:44:02 +0200, a ?crit :
> Al Viro, le Tue 14 Apr 2015 13:44:29 +0100, a ?crit :
> > The first variant can happen, but in that case it should have had
> > DCACHE_DENTRY_KILLED set by said __dentry_kill() and skipped. The other
> > two are clear inode refcounting bugs.
> >
> > Do you have a reproducer for that?
>
> It just happened in the middle of the night.

I haven't seen it happening again since then.

Samuel