2015-11-08 22:05:37

by Denis Bychkov

Subject: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

This only started with the 4.3 kernel (seen since at least RC-5);
4.2.x does not have this problem. The kernel panic happens immediately
after the SDHC card is inserted, and it is 100% reproducible. If the
system boots with the card already inserted, it crashes as soon as the
sdhci_pci module is loaded. If the module is unloaded or blacklisted,
nothing happens, because the system does not see the MMC card reader.
The machine is a Lenovo ThinkPad T510 laptop with an Intel Westmere
CPU and 3400 series chipset, running 64-bit kernel 4.3.0.

(somewhat) relevant kernel configuration bits:
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_VFIO_IOMMU_TYPE1=m
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
# Generic IOMMU Pagetable Support
CONFIG_IOMMU_IOVA=y
# CONFIG_AMD_IOMMU is not set
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
# CONFIG_IOMMU_STRESS is not set
CONFIG_KVM_INTEL=m
CONFIG_PCI_MMCONFIG=y
# Supported MMC/SDIO adapters
CONFIG_MMC=m
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_CLKGATE is not set
# MMC/SD/SDIO Card Drivers
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_MINORS=8
CONFIG_MMC_BLOCK_BOUNCE=y
CONFIG_MMC_TEST=m
# MMC/SD/SDIO Host Controller Drivers
CONFIG_MMC_SDHCI=m
CONFIG_MMC_SDHCI_PCI=m
CONFIG_MMC_RICOH_MMC=y
CONFIG_MMC_SDHCI_ACPI=m

Card reader device:
0d:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
Subsystem: Lenovo MMC/SD Host Controller
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at f2100000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Power Management version 3
Capabilities: [80] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [800] Advanced Error Reporting
Kernel driver in use: sdhci-pci
Kernel modules: sdhci_pci

The panic report caught via netconsole:

[22946.904308] ------------[ cut here ]------------
[22946.906564] kernel BUG at drivers/iommu/intel-iommu.c:3485!
[22946.908801] invalid opcode: 0000 [#1] PREEMPT SMP
[22946.911113] Modules linked in: netconsole dm_mod bnep
cpufreq_powersave cpufreq_stats cpufreq_conservative cpufreq_userspace
coretemp intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul
jitterentropy_rng hmac sha256_ssse3 sha256_generic drbg
snd_hda_codec_hdmi ansi_cprng gpio_ich iTCO_wdt iTCO_vendor_support
aesni_intel arc4 aes_x86_64 nouveau mxm_wmi lrw gf128mul glue_helper
ablk_helper iwldvm cryptd psmouse mac80211 uvcvideo serio_raw pcspkr
nd_e820 videobuf2_vmalloc ttm evdev videobuf2_memops i2c_algo_bit
mousedev btusb videobuf2_core btrtl drm_kms_helper v4l2_common mac_hid
btbcm videodev btintel drm snd_hda_codec_conexant bluetooth
snd_hda_codec_generic iwlwifi syscopyarea sysfillrect sysimgblt
fb_sys_fops snd_hda_intel snd_hda_codec cfg80211 snd_hda_core
snd_hwdep i2c_i801 thinkpad_acpi lpc_ich snd_pcm sg mfd_core nvram
i2c_core snd_timer intel_ips rfkill hwmon snd mei_me soundcore
intel_agp mei tpm_tis intel_gtt shpchp tpm agpgart battery rtc_cmos ac
video thermal wmi acpi_cpufreq button processor tp_smapi(O)
thinkpad_ec(O) autofs4 ext4 crc16 mbcache jbd2 btrfs xor hid_generic
usbhid hid raid6_pq sr_mod cdrom sd_mod uas usb_storage firewire_ohci
ahci libahci crc32c_intel libata atkbd sdhci_pci scsi_mod ehci_pci
sdhci ehci_hcd e1000e firewire_core mmc_core crc_itu_t ptp usbcore
usb_common pps_core
[22946.929431] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O
4.3.0-westmere #1
[22946.932551] Hardware name: LENOVO 4313CTO/4313CTO, BIOS 6MET92WW
(1.52 ) 09/26/2012
[22946.935701] task: ffff88023231a580 ti: ffff88023232c000 task.ti:
ffff88023232c000
[22946.938878] RIP: 0010:[<ffffffff813cacd0>] [<ffffffff813cacd0>]
intel_unmap+0x1d0/0x210
[22946.942117] RSP: 0018:ffff88023bd83da8 EFLAGS: 00010046
[22946.945341] RAX: 0000000000000000 RBX: ffff880231ea5580 RCX: 0000000000000002
[22946.948592] RDX: 0000000000000000 RSI: 00000000fffebda0 RDI: ffff880231e7d098
[22946.951855] RBP: ffff88023bd83de0 R08: 0000000000000000 R09: 0000000000000000
[22946.955131] R10: 00000000563f08fc R11: 000000001849050d R12: ffff880231e7d098
[22946.958423] R13: ffff8800bacbbc20 R14: 00000000fffebda0 R15: 0000000000000000
[22946.961723] FS: 0000000000000000(0000) GS:ffff88023bd80000(0000)
knlGS:0000000000000000
[22946.965051] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[22946.968387] CR2: 00000000e4d9c0e0 CR3: 0000000001a0c000 CR4: 00000000000006e0
[22946.971760] Stack:
[22946.975131] ffff8800bacbbc60 0000000000000000 ffff880231ea5580
ffff880231ea5580
[22946.978598] ffff8800bacbbc20 0000000000000010 0000000000000000
ffff88023bd83df0
[22946.982064] ffffffff813cad22 ffff88023bd83e48 ffffffffc01090c2
0000000000000282
[22946.985546] Call Trace:
[22946.988984] <IRQ>
[22946.989016] [<ffffffff813cad22>] intel_unmap_sg+0x12/0x20
[22946.995844] [<ffffffffc01090c2>] sdhci_finish_data+0x142/0x340 [sdhci]
[22946.999296] [<ffffffffc0109f54>] sdhci_irq+0x484/0x9b5 [sdhci]
[22947.002759] [<ffffffff81078dea>] ? notifier_call_chain+0x4a/0x70
[22947.006222] [<ffffffff810affa9>] handle_irq_event_percpu+0x39/0x1b0
[22947.009694] [<ffffffff810b0160>] handle_irq_event+0x40/0x60
[22947.013160] [<ffffffff810b2e82>] handle_fasteoi_irq+0xc2/0x180
[22947.016633] [<ffffffff810070aa>] handle_irq+0x1a/0x30
[22947.020095] [<ffffffff81563ed7>] do_IRQ+0x57/0xf0
[22947.023553] [<ffffffff81562001>] common_interrupt+0x81/0x81
[22947.026992] <EOI>
[22947.027023] [<ffffffff8142736e>] ? cpuidle_enter_state+0x13e/0x2b0
[22947.033852] [<ffffffff81427363>] ? cpuidle_enter_state+0x133/0x2b0
[22947.037286] [<ffffffff81427517>] cpuidle_enter+0x17/0x20
[22947.040717] [<ffffffff81099382>] call_cpuidle+0x32/0x60
[22947.044131] [<ffffffff814274f3>] ? cpuidle_select+0x13/0x20
[22947.047554] [<ffffffff8109964e>] cpu_startup_entry+0x29e/0x360
[22947.050969] [<ffffffff8103539b>] start_secondary+0x15b/0x190
[22947.054379] Code: 01 44 29 f1 e8 12 c6 ff ff 4c 89 ee 4c 89 ff e8
b7 8d ff ff 4c 89 e7 e8 0f c7 ff ff 48 83 c4 10 5b 41 5c 41 5d 41 5e
41 5f 5d c3 <0f> 0b 49 8b 54 24 50 48 85 d2 74 29 4c 8b 45 d0 4c 89 f1
48 c7
[22947.058834] RIP [<ffffffff813cacd0>] intel_unmap+0x1d0/0x210
[22947.062568] RSP <ffff88023bd83da8>
[22947.066285] ---[ end trace 12b22e7424e94db4 ]---
[22947.069999] Kernel panic - not syncing: Fatal exception in interrupt
[22947.073803] Kernel Offset: disabled
[22947.077240] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

--
Denis


2015-12-15 16:01:11

by Ulf Hansson

Subject: Re: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

+Adrian

On 8 November 2015 at 23:05, Denis Bychkov wrote:
> This only started with the 4.3 kernel (seen since at least RC-5);
> 4.2.x does not have this problem. The kernel panic happens immediately
> after the SDHC card is inserted, and it is 100% reproducible. If the
> system boots with the card already inserted, it crashes as soon as the
> sdhci_pci module is loaded. If the module is unloaded or blacklisted,
> nothing happens, because the system does not see the MMC card reader.
> The machine is a Lenovo ThinkPad T510 laptop with an Intel Westmere
> CPU and 3400 series chipset, running 64-bit kernel 4.3.0.

Hi Denis,

Thanks for reporting and sorry for the delay!

Unfortunately, this isn't really my area of expertise and I don't have
the HW. In other words, I don't think I will be able to help much.

Instead, I am looping in Adrian Hunter, who might be able to have a
look at this.

Kind regards
Uffe

2015-12-16 07:54:12

by Adrian Hunter

Subject: Re: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

On 15/12/15 18:01, Ulf Hansson wrote:
> +Adrian
>
> On 8 November 2015 at 23:05, Denis Bychkov wrote:
>> This only started with the 4.3 kernel (seen since at least RC-5);
>> 4.2.x does not have this problem. The kernel panic happens immediately
>> after the SDHC card is inserted, and it is 100% reproducible. If the
>> system boots with the card already inserted, it crashes as soon as the
>> sdhci_pci module is loaded. If the module is unloaded or blacklisted,
>> nothing happens, because the system does not see the MMC card reader.
>> The machine is a Lenovo ThinkPad T510 laptop with an Intel Westmere
>> CPU and 3400 series chipset, running 64-bit kernel 4.3.0.
>
> Hi Denis,
>
> Thanks for reporting and sorry for the delay!
>
> Unfortunately, this isn't really my area of expertise and I don't have
> the HW. In other words, I don't think I will be able to help much.
>
> Instead, I am looping in Adrian Hunter, who might be able to have a
> look at this.

Have you tried bisecting to find the commit that causes this?

2016-12-05 01:05:24

by David F

Subject: Re: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

On 16/12/15 09:50, Adrian Hunter wrote:
>
>
> On 15/12/15 18:01, Ulf Hansson wrote:
>> +Adrian
>>
>> On 8 November 2015 at 23:05, Denis Bychkov wrote:
>>> This only started with the 4.3 kernel (seen since at least RC-5);
>>> 4.2.x does not have this problem. The kernel panic happens immediately
>>> after the SDHC card is inserted, and it is 100% reproducible. If the
>>> system boots with the card already inserted, it crashes as soon as the
>>> sdhci_pci module is loaded. If the module is unloaded or blacklisted,
>>> nothing happens, because the system does not see the MMC card reader.
>>> The machine is a Lenovo ThinkPad T510 laptop with an Intel Westmere
>>> CPU and 3400 series chipset, running 64-bit kernel 4.3.0.
>>>
>>
>> Hi Denis,
>>
>> Thanks for reporting and sorry for the delay!
>>
>> Unfortunately, this isn't really my area of expertise and I don't have
>> the HW. In other words, I don't think I will be able to help much.
>>
>> Instead, I am looping in Adrian Hunter, who might be able to have a
>> look at this.
>
> Have you tried bisecting to find the commit that causes this?
>

Hello,

I have been experiencing the same panics when inserting SDHC cards.

My environment differs from Denis's in one way:
- Thinkpad T420 with i5-2520M

Apart from that, the symptoms are identical. The issue is still present
in 4.9-rc7.

I note that if Intel VT-d is disabled in the BIOS, this issue does not
occur.

I ran the requested git bisect today between 4.2 and 4.3-rc1 and it
narrowed the problem down to the following commit:

# git bisect bad
f303e50766298feac17c8715e29ecd14b2c12680 is the first bad commit
commit f303e50766298feac17c8715e29ecd14b2c12680
Author: Joerg Roedel <[email protected]>
Date: Thu Jul 23 18:37:13 2015 +0200

iommu/vt-d: Avoid duplicate device_domain_info structures

When a 'struct device_domain_info' is created as an alias
for another device, this struct will not be re-used when the
real device is encountered. Fix that to avoid duplicate
device_domain_info structures being added.

Signed-off-by: Joerg Roedel <[email protected]>

:040000 040000 8f5a7521ef1cdbd8e82b625c5348a1210fe1bf5d
5d2a156956e5cadc9ab35c48b661bbd5fe9d5587 M drivers
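
For anyone who wants to repeat the bisection on other hardware, a session
along the following lines should work. This is a minimal sketch, not a
record of the exact commands that were run: the helper names start_bisect
and mark_result are made up for illustration, and it assumes the tree
contains the v4.2 and v4.3-rc1 tags.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): begin a bisection in
# the git tree at $1 between a known-good revision $2 and a known-bad one $3.
start_bisect() {
    tree=$1
    good=$2
    bad=$3
    # "git bisect start <bad> <good>" marks both endpoints in one step and
    # checks out a revision roughly halfway between them.
    git -C "$tree" bisect start "$bad" "$good"
}

# Hypothetical helper: record the verdict for the revision that is currently
# checked out, after building it, booting it and inserting a card.
# $2 must be "good" (no panic) or "bad" (panic).
mark_result() {
    tree=$1
    verdict=$2
    case $verdict in
        good|bad) git -C "$tree" bisect "$verdict" ;;
        *) echo "verdict must be 'good' or 'bad'" >&2; return 2 ;;
    esac
}
```

Typical use would be `start_bisect /path/to/linux v4.2 v4.3-rc1` (the path
is a placeholder), then one `mark_result /path/to/linux good` or
`mark_result /path/to/linux bad` per test boot until git names the first
bad commit. `git bisect reset` returns the tree to where it started.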


Below is my oops; I apologize if netconsole has formatted it badly:

Aug 19 13:32:20 taz [ 156.425627] ------------[ cut here ]------------
Aug 19 13:32:20 taz [ 156.428136] kernel BUG at
drivers/iommu/intel-iommu.c:3682!
Aug 19 13:32:20 taz [ 156.430630] invalid opcode: 0000 [#1] PREEMPT SMP
Aug 19 13:32:20 taz [ 156.433138] Modules linked in: nf_nat_sip
nf_conntrack_sip ebtable_filter ebtables uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_core iwldvm mac80211 iwlwifi
Aug 19 13:32:20 taz [ 156.435979] CPU: 0 PID: 0 Comm: swapper/0 Not
tainted 4.7.0-c2h2 #2
Aug 19 13:32:20 taz [ 156.438694] Hardware name: LENOVO
4236AR1/4236AR1, BIOS 83ET78WW (1.48 ) 01/21/2016
Aug 19 13:32:20 taz [ 156.441423] task: ffffffff8260e540 ti:
ffffffff82600000 task.ti: ffffffff82600000
Aug 19 13:32:20 taz [ 156.444155] RIP: 0010:[<ffffffff816acae3>]
[<ffffffff816acae3>] intel_unmap+0x1f3/0x200
Aug 19 13:32:20 taz [ 156.446949] RSP: 0018:ffff88041e203e38 EFLAGS:
00010046
Aug 19 13:32:20 taz [ 156.449706] RAX: 0000000000000000 RBX:
ffff88040b8fc0a0 RCX: 0000000000000a98
Aug 19 13:32:20 taz [ 156.452535] RDX: 0000000000000000 RSI:
00000000fffec000 RDI: ffff88040b8fc0a0
Aug 19 13:32:20 taz [ 156.455411] RBP: ffff88041e203e68 R08:
0000000000000000 R09: 0000000000000010
Aug 19 13:32:20 taz [ 156.458276] R10: 0000000057b750b4 R11:
003b9aca00000000 R12: 0000000000000001
Aug 19 13:32:20 taz [ 156.461125] R13: 00000000fffec000 R14:
00000000fffec000 R15: 0000000000001000
Aug 19 13:32:20 taz [ 156.464001] FS: 0000000000000000(0000)
GS:ffff88041e200000(0000) knlGS:0000000000000000
Aug 19 13:32:20 taz [ 156.466936] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Aug 19 13:32:20 taz [ 156.469913] CR2: 00007f02dde4a000 CR3:
0000000002607000 CR4: 00000000000406f0
Aug 19 13:32:20 taz [ 156.472917] Stack:
Aug 19 13:32:20 taz [ 156.475883] ffff8800da7308a0 0000000000001000
0000000000000001 ffff88040b8fc0a0
Aug 19 13:32:20 taz [ 156.478991] 00000000fffec000 0000000000000001
ffff88041e203ea0 ffffffff816acb5e
Aug 19 13:32:20 taz [ 156.482132] ffff8800da730540 ffff88040bdf7cc8
ffff8800da730788 0000000000000247
Aug 19 13:32:20 taz [ 156.485338] Call Trace:
Aug 19 13:32:20 taz [ 156.488574] <IRQ>
Aug 19 13:32:20 taz [ 156.488613] [<ffffffff816acb5e>]
intel_unmap_sg+0x6e/0x80
Aug 19 13:32:20 taz [ 156.494835] [<ffffffff81b5493e>]
sdhci_tasklet_finish+0x16e/0x1d0
Aug 19 13:32:20 taz [ 156.498025] [<ffffffff810dd47e>]
tasklet_action+0x16e/0x180
Aug 19 13:32:20 taz [ 156.501239] [<ffffffff810dd744>]
__do_softirq+0x94/0x2d0
Aug 19 13:32:20 taz [ 156.504431] [<ffffffff810ddad6>] irq_exit+0x96/0xa0
Aug 19 13:32:20 taz [ 156.507605] [<ffffffff8108b41b>] do_IRQ+0x5b/0xf0
Aug 19 13:32:20 taz [ 156.510795] [<ffffffff81ec72c2>]
common_interrupt+0x82/0x82
Aug 19 13:32:20 taz [ 156.513991] <EOI>
Aug 19 13:32:20 taz [ 156.514028] [<ffffffff810b6256>] ?
native_safe_halt+0x6/0x10
Aug 19 13:32:20 taz [ 156.520424] [<ffffffff8162e973>]
arch_safe_halt+0x9/0xd
Aug 19 13:32:20 taz [ 156.523499] [<ffffffff8162f149>]
acpi_safe_halt+0x1d/0x26
Aug 19 13:32:20 taz [ 156.526460] [<ffffffff8162f16d>]
acpi_idle_do_entry+0x1b/0x2b
Aug 19 13:32:20 taz [ 156.529297] [<ffffffff8162f4d3>]
acpi_idle_enter+0x1de/0x200
Aug 19 13:32:20 taz [ 156.532129] [<ffffffff81092499>] ?
sched_clock+0x9/0x10
Aug 19 13:32:20 taz [ 156.534950] [<ffffffff81b3b708>]
cpuidle_enter_state+0x88/0x2b0
Aug 19 13:32:20 taz [ 156.537785] [<ffffffff81b3b952>]
cpuidle_enter+0x12/0x20
Aug 19 13:32:20 taz [ 156.540626] [<ffffffff811157e5>]
call_cpuidle+0x25/0x40
Aug 19 13:32:20 taz [ 156.543469] [<ffffffff81115aca>]
cpu_startup_entry+0x1ca/0x350
Aug 19 13:32:20 taz [ 156.546318] [<ffffffff81ebf28f>] rest_init+0x7f/0x90
Aug 19 13:32:20 taz [ 156.549156] [<ffffffff827e3f4a>]
start_kernel+0x40a/0x417
Aug 19 13:32:20 taz [ 156.551972] [<ffffffff827e3120>] ?
early_idt_handler_array+0x120/0x120
Aug 19 13:32:20 taz [ 156.554795] [<ffffffff827e3481>]
x86_64_start_reservations+0x2f/0x31
Aug 19 13:32:20 taz [ 156.557617] [<ffffffff827e35be>]
x86_64_start_kernel+0x13b/0x14a
Aug 19 13:32:20 taz [ 156.560411] Code: cf ff ff 41 8d 57 01 be 08 00 00
00 48 c7 c7 18 60 7b 82 48 63 d2 e8 0d 07 ef ff 3b 05 8b a6 10 01 41 89
c7 7c d5 e9 c6 fe ff ff 0b e8 16 65 95 ff e9 3e ff ff ff 90 55 48 89 f0
48 89 e5 41
Aug 19 13:32:20 taz [ 156.567252] RIP [<ffffffff816acae3>]
intel_unmap+0x1f3/0x200
Aug 19 13:32:20 taz [ 156.570418] RSP <ffff88041e203e38>
Aug 19 13:32:20 taz [ 156.573544] ---[ end trace 3df10ff5fe14bb13 ]---

Please let me know if I can be of further assistance.

Regards,
David

2016-12-05 10:22:13

by Jörg Rödel

Subject: Re: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

Hi David,

On Sun, Dec 04, 2016 at 06:57:57PM -0600, David F wrote:
> Aug 19 13:32:20 taz [ 156.425627] ------------[ cut here ]------------
> Aug 19 13:32:20 taz [ 156.428136] kernel BUG at
> drivers/iommu/intel-iommu.c:3682!

This BUG_ON triggered because the IOMMU driver can't find a domain for
the device passed to intel_unmap. This looks like an IOMMU bug, but I am
not 100% sure yet, because if there is no domain for a device the
intel_map_page path returns 0 and the intel_unmap function should not be
called.

I need a couple of things to track this down. Can you please build a
kernel with CONFIG_DMA_API_DEBUG=y and boot the kernel with IOMMU
disabled? Insert and remove an SD-Card with this kernel and send me a
full dmesg.

Please also send me the output of 'lspci -v' and a full dmesg with IOMMU
enabled and the BUG triggered.
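
The data collection requested above could be scripted roughly as follows.
This is only a sketch under stated assumptions: it assumes the IOMMU is
turned off with the intel_iommu=off boot parameter rather than in the
BIOS, and the helper names iommu_off_requested and collect_report, as
well as the output file names, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line in $1 contains
# the parameter intel_iommu=off. With no argument, the command line of
# the running kernel is read from /proc/cmdline.
iommu_off_requested() {
    cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so that only a whole word matches.
    case " $cmdline " in
        *" intel_iommu=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: save 'lspci -v' and the full dmesg into directory $1.
# dmesg may need root on systems that restrict access to the kernel log.
collect_report() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    lspci -v > "$outdir/lspci.txt"
    dmesg > "$outdir/dmesg.txt"
}
```

After booting the test kernel, `iommu_off_requested && collect_report ./report`
would gather both files for the IOMMU-disabled run. The run with the IOMMU
enabled still needs netconsole or similar, since the machine panics.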


Thanks,

Joerg

2016-12-05 12:33:17

by David F

Subject: Re: 4.3 kernel panics when MMC/SDHC card is inserted on thinkpad

On 12/05/2016 04:20 AM, Joerg Roedel wrote:
> Hi David,
>
> On Sun, Dec 04, 2016 at 06:57:57PM -0600, David F wrote:
>> Aug 19 13:32:20 taz [ 156.425627] ------------[ cut here ]------------
>> Aug 19 13:32:20 taz [ 156.428136] kernel BUG at
>> drivers/iommu/intel-iommu.c:3682!
>
> This BUG_ON triggered because the IOMMU driver can't find a domain for
> the device passed to intel_unmap. This looks like an IOMMU bug, but I am
> not 100% sure yet, because if there is no domain for a device the
> intel_map_page path returns 0 and the intel_unmap function should not be
> called.
>
> I need a couple of things to track this down. Can you please build a
> kernel with CONFIG_DMA_API_DEBUG=y and boot the kernel with IOMMU
> disabled? Insert and remove an SD-Card with this kernel and send me a
> full dmesg.
>
> Please also send me the output of 'lspci -v' and a full dmesg with IOMMU
> enabled and the BUG triggered.
>
>
> Thanks,
>
> Joerg
>

Hello Joerg,

Thanks for looking at this.

I have attached the output you requested.

I did enable DMA_API_DEBUG, but I did not notice any additional
output anywhere with this debug option enabled. I verified that it was
enabled by looking at /proc/config.gz. My apologies if I missed something,
but hopefully the output is what you need.
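
The /proc/config.gz check mentioned above can be done along these lines.
This is a minimal sketch, and the helper name config_is_builtin is made
up for illustration; it only tests for "=y" and does not treat "=m" as
enabled.

```shell
#!/bin/sh
# Hypothetical helper: succeed if option $1 (e.g. CONFIG_DMA_API_DEBUG) is
# set to y in config file $2. A .gz file such as /proc/config.gz is
# decompressed on the fly; any other file is read as plain text.
config_is_builtin() {
    option=$1
    file=${2:-/proc/config.gz}
    case $file in
        *.gz) gzip -dc "$file" ;;
        *)    cat "$file" ;;
    esac | grep -q "^${option}=y\$"
}
```

For example, `config_is_builtin CONFIG_DMA_API_DEBUG && echo enabled`
checks the running kernel, provided it was built with /proc/config.gz
support.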


Thanks,
David


Attachments:
lspci.txt (8.04 kB)
dmesg-dmadebug-iommu_disabled.txt (63.27 kB)
dmesg_full_andnetcons_bug_triggered.txt (71.50 kB)