2016-03-05 10:30:25

by Stefan Assmann

[permalink] [raw]
Subject: RIP [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0 with 4.5-rc6 when resuming from suspend

Occasionally when trying to resume from suspend to RAM I get the
following GPF.

Stefan

[ 882.344245] PM: resume of devices complete after 83682.775 msecs
[ 882.420550] PM: Finishing wakeup.
[ 882.420551] Restarting tasks ...·
[ 882.420664] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[ 882.421141] done.
[ 882.422027] usb 3-9.1.1: USB disconnect, device number 33
[ 882.452719] general protection fault: 0000 [#1] SMP·
[ 882.453244] Modules linked in: btrfs xor raid6_pq ufs hfsplus hfs minix msdos jfs xfs libcrc32c xt_CHECKSUM tun fuse ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype br_netfilter overlay ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_filter ebtable_nat
ebtable_broute bridge ebtables ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_filter ip6_tables iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security bnep arc4
intel_rapl x86_pkg_temp_thermal iwlmvm snd_hda_codec_realtek coretemp snd_hda_codec_hdmi snd_hda_codec_generic mac80211 vfat fat kvm_intel snd_hda_intel iTCO_wdt iTCO_vendor_support kvm snd_hda_codec btusb iwlwifi uvcvideo btrtl
[ 882.456611] snd_hda_core videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 btbcm snd_hwdep irqbypass snd_seq videobuf2_core btintel snd_seq_device cfg80211 i2c_i801 pcspkr videodev joydev lpc_ich bluetooth thinkpad_acpi snd_pcm media ie31200_edac snd_timer snd mei_me mei edac_core shpchp
soundcore rfkill tpm_tis tpm nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace dm_crypt 8021q garp stp llc mrp i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel sdhci_pci e1000e sdhci mmc_core drm serio_raw ptp pps_core wmi fjes video sunrpc
[ 882.459687] CPU: 7 PID: 5299 Comm: dhclient Tainted: G W 4.5.0-0.rc6.git1.1.vanilla.knurd.1.fc23.x86_64 #1
[ 882.460305] Hardware name: LENOVO 20EGS0R60P/20EGS0R60P, BIOS GNET73WW (2.21 ) 03/12/2015
[ 882.460923] task: ffff880420921d80 ti: ffff88046819c000 task.ti: ffff88046819c000
[ 882.461578] RIP: 0010:[<ffffffff812c3b77>] [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0
[ 882.462231] RSP: 0018:ffff88046819fbf8 EFLAGS: 00010246
[ 882.462870] RAX: c0c0c0c000000039 RBX: ffff8804698e9000 RCX: 0000000000000001
[ 882.463511] RDX: ffff8804642035d8 RSI: ffff880464203500 RDI: ffff8804698e9000
[ 882.464135] RBP: ffff88046819fc10 R08: 0000000000000000 R09: 0000000000000000
[ 882.464774] R10: ffff8804698e9000 R11: ffff880464203510 R12: 0000000000000010
[ 882.465384] R13: ffff8804698e9000 R14: ffff880464a076a0 R15: ffff88046afee0c0
[ 882.466020] FS: 00007f95c134c8c0(0000) GS:ffff88047e3c0000(0000) knlGS:0000000000000000
[ 882.466637] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 882.467248] CR2: 000055f5eb54eda0 CR3: 0000000001c0a000 CR4: 00000000001406e0
[ 882.467887] Stack:
[ 882.468495] ffff8804698e9000 ffff880464203500 0000000000000010 ffff88046819fc58
[ 882.469118] ffffffff8123b34f ffff8804698e9000 ffff880464203510 ffffffff81f28ed0
[ 882.469758] ffff880420922478 ffff880420921d80 ffff880464203c00 ffff8804653fe968
[ 882.470441] Call Trace:
[ 882.471076] [<ffffffff8123b34f>] __fput+0xdf/0x1f0
[ 882.471688] [<ffffffff8123b49e>] ____fput+0xe/0x10
[ 882.472304] [<ffffffff810c0d03>] task_work_run+0x73/0x90
[ 882.472952] [<ffffffff810a7185>] do_exit+0x2d5/0xb30
[ 882.473556] [<ffffffff810a7a67>] do_group_exit+0x47/0xb0
[ 882.474187] [<ffffffff810b2f51>] get_signal+0x291/0x610
[ 882.474803] [<ffffffff81017287>] do_signal+0x37/0x6b0
[ 882.475449] [<ffffffff811e3fb6>] ? handle_mm_fault+0x866/0x1ae0
[ 882.476097] [<ffffffff81114dda>] ? ktime_get_ts64+0x4a/0xf0
[ 882.476732] [<ffffffff8124daab>] ? poll_select_copy_remaining+0xfb/0x140
[ 882.477365] [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
[ 882.478002] [<ffffffff81003d41>] syscall_return_slowpath+0xa1/0xb0
[ 882.478641] [<ffffffff817b830c>] int_ret_from_sys_call+0x25/0x8f
[ 882.479267] Code: 48 89 fb 48 83 ec 08 48 8b 87 68 ff ff ff 48 c1 e8 24 a8 01 75 4e f6 46 44 02 74 0b 8b 83 50 01 00 00 83 f8 01 74 54 48 8b 43 28 <48> 8b 80 58 04 00 00 48 8b 40 68 f6 40 5c 20 74 1d 48 8b 83 68·
[ 882.480630] RIP [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0
[ 882.481277] RSP <ffff88046819fbf8>
[ 882.481937] ---[ end trace 32281f35372d5d05 ]---


2016-03-06 19:10:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: RIP [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0 with 4.5-rc6 when resuming from suspend

On Sat, Mar 05, 2016 at 11:24:46AM +0100, Stefan Assmann wrote:
> Occasionally when trying to resume from suspend to RAM I get the
> following GPF.
>
> Stefan
>
> [ 882.344245] PM: resume of devices complete after 83682.775 msecs
> [ 882.420550] PM: Finishing wakeup.
> [ 882.420551] Restarting tasks ...?
> [ 882.420664] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
> [ 882.421141] done.
> [ 882.422027] usb 3-9.1.1: USB disconnect, device number 33

Hmm.... did you have a ext4 file system mounted on some kind of USB
attached storage device (USB thumb drive, USB HDD, etc.)?

> [ 882.461578] RIP: 0010:[<ffffffff812c3b77>] [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0

Also, can you translate this address to a file and line number? Use
the command:

addr2line -e /usr/src/linux/vmlinux -i -a ffffffff812c3b77

Also, if you could give the lines around fs/ext4/file.c (unless it is
exactly 4.5-rc6 without any other changes or commit) that would also
be helpful.

I don't see anything in ext4_release_file() that would obviously be a
potential problem after a suspend/reasume. One possibly is that some
other piece of code is corrupting memory.

> [ 882.459687] CPU: 7 PID: 5299 Comm: dhclient Tainted: G W 4.5.0-0.rc6.git1.1.vanilla.knurd.1.fc23.x86_64 #1

Is it always dhclient which is trying to exit?

If you can give a detailed description of your hardware configuration
(what devices, what file systems / partitions you have mounted, etc.)
that would be helpful.

Thanks,

- Ted

2016-03-07 08:31:48

by Stefan Assmann

[permalink] [raw]
Subject: Re: RIP [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0 with 4.5-rc6 when resuming from suspend

On 06.03.2016 20:10, Theodore Ts'o wrote:
> On Sat, Mar 05, 2016 at 11:24:46AM +0100, Stefan Assmann wrote:
>> Occasionally when trying to resume from suspend to RAM I get the
>> following GPF.
>>
>> Stefan
>>
>> [ 882.344245] PM: resume of devices complete after 83682.775 msecs
>> [ 882.420550] PM: Finishing wakeup.
>> [ 882.420551] Restarting tasks ...?
>> [ 882.420664] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
>> [ 882.421141] done.
>> [ 882.422027] usb 3-9.1.1: USB disconnect, device number 33
>
> Hmm.... did you have a ext4 file system mounted on some kind of USB
> attached storage device (USB thumb drive, USB HDD, etc.)?

Hi Ted,

No, this USB hub only has mouse and keyboard attached, but it
occasionally times out on power-on. Something's not right there so it
could be related.

>
>> [ 882.461578] RIP: 0010:[<ffffffff812c3b77>] [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0
>
> Also, can you translate this address to a file and line number? Use
> the command:
>
> addr2line -e /usr/src/linux/vmlinux -i -a ffffffff812c3b77

The kernel is from the Fedora Vanilla Repository [1] and IIRC they don't
provide debuginfo packages. If this provides vital information I can
build the kernel from source and provide the info.

> Also, if you could give the lines around fs/ext4/file.c (unless it is
> exactly 4.5-rc6 without any other changes or commit) that would also
> be helpful.

It is 4.5-rc6 without any changes.

> I don't see anything in ext4_release_file() that would obviously be a
> potential problem after a suspend/reasume. One possibly is that some
> other piece of code is corrupting memory.
>
>> [ 882.459687] CPU: 7 PID: 5299 Comm: dhclient Tainted: G W 4.5.0-0.rc6.git1.1.vanilla.knurd.1.fc23.x86_64 #1
>
> Is it always dhclient which is trying to exit?

Most of the time the screen just stays black, so it's hard to tell. But
while trying to reproduce just now I hit a null pointer exception in
anon_vma_interval_tree_remove(), which happened right after initializing
the exact same USB hub. Your corruption theory is getting more likely.

>
> If you can give a detailed description of your hardware configuration
> (what devices, what file systems / partitions you have mounted, etc.)
> that would be helpful.

Using a Lenovo w541 notebook with a Samsung 850 Evo SSD (500GB), no USB
storage devices attached.
Filesystem info
root@w541:~ > mount |grep ext
/dev/mapper/fedora_linux-root on / type ext4 (rw,relatime,data=ordered)
/dev/sda2 on /boot type ext4 (rw,relatime,data=ordered)
/dev/mapper/fedora_linux-home on /home type ext4 (rw,relatime,data=ordered)
/dev/mapper/fedora_linux-home on /var/lib/libvirt/images type ext4 (rw,relatime,data=ordered)
/dev/mapper/fedora_linux-home on /var/lib/docker type ext4 (rw,relatime,data=ordered)

Stefan

[1] https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories

2016-03-08 08:33:49

by Stefan Assmann

[permalink] [raw]
Subject: Re: RIP [<ffffffff812c3b77>] ext4_release_file+0x37/0xc0 with 4.5-rc6 when resuming from suspend

On 07.03.2016 09:31, Stefan Assmann wrote:
> On 06.03.2016 20:10, Theodore Ts'o wrote:
>> On Sat, Mar 05, 2016 at 11:24:46AM +0100, Stefan Assmann wrote:
> It is 4.5-rc6 without any changes.
>
>> I don't see anything in ext4_release_file() that would obviously be a
>> potential problem after a suspend/reasume. One possibly is that some
>> other piece of code is corrupting memory.
>>
>>> [ 882.459687] CPU: 7 PID: 5299 Comm: dhclient Tainted: G W 4.5.0-0.rc6.git1.1.vanilla.knurd.1.fc23.x86_64 #1
>>
>> Is it always dhclient which is trying to exit?
>
> Most of the time the screen just stays black, so it's hard to tell. But
> while trying to reproduce just now I hit a null pointer exception in
> anon_vma_interval_tree_remove(), which happened right after initializing
> the exact same USB hub. Your corruption theory is getting more likely.

It's not a USB problem. I disconnected the USB hub and the corruption
remained. However, the backtrace was preceded by
[ 32.778070] snd_hda_codec_hdmi hdaudioC0D0: HDMI: pin nid 4 not registered

After making sure snd_hda_codec_hdmi did not get loaded during boot I've
not experienced the issue anymore. Of course I've only tried this a few
times so far, but it looks promising.
Anyway thanks for taking a look, I'll probably follow up on this with
the sound folks.

Stefan