2023-07-11 16:12:23

by Johan Hovold

Subject: Oops on /proc/interrupt access with 6.5-rc1

Hi,

Konrad reported on IRC that he hit a segfault and a hang when running
watch(1) on /proc/interrupts with 6.5-rc1.

I tried simply running cat on it and immediately hit the oops below on
my X13s (aarch64).

Commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor
management") stood out when skimming the log, and Marc soon suggested
the same possible culprit on IRC.

I have not been able to reproduce it with the maple tree patch reverted,
but I hit it again after adding it back. It did not trigger immediately
after boot, though; the machine had been idling for a few minutes in
between.
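A minimal sketch of the access pattern involved, assuming a plain cat-style read of the whole file repeated at a fixed interval, as watch(1) does. The function names and parameters are hypothetical. On an affected kernel it is the read() itself that oopses (the call trace goes through vfs_read() into show_interrupts()), so this only drives that path repeatedly; it does not detect anything by itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Read the whole file once, as cat(1) would; returns bytes read or -1. */
long read_once(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += (long)n;
	close(fd);
	return n < 0 ? -1 : total;
}

/* Re-read path `iterations` times, sleeping delay_ms between reads. */
int stress_read(const char *path, int iterations, unsigned int delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (long)(delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < iterations; i++) {
		if (read_once(path) < 0)
			return -1;
		if (delay_ms)
			nanosleep(&ts, NULL);
	}
	return 0;
}
```

For example, stress_read("/proc/interrupts", 1000, 100) roughly approximates `watch -n 0.1 cat /proc/interrupts`.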

Marc asked for a dump, so I figured I'd CC the list as well.

Johan


[ 2546.693932] Unable to handle kernel paging request at virtual address ffff80008106bb19
[ 2546.695148] Mem abort info:
[ 2546.695562] ESR = 0x0000000096000007
[ 2546.695976] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2546.696394] SET = 0, FnV = 0
[ 2546.696807] EA = 0, S1PTW = 0
[ 2546.697220] FSC = 0x07: level 3 translation fault
[ 2546.697642] Data abort info:
[ 2546.698066] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 2546.698494] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 2546.698922] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 2546.699355] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000002d7a05000
[ 2546.699792] [ffff80008106bb19] pgd=10000001000a5003, p4d=10000001000a5003, pud=10000001000a6003, pmd=1000000100d5a003, pte=0000000000000000
[ 2546.700387] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 2546.700796] Modules linked in: snd_soc_wsa883x q6prm_clocks q6apm_lpass_dais snd_q6dsp_common q6apm_dai q6prm michael_mic cbc des_generic libdes ecb algif_skcipher md5 algif_hash af_alg ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_tcpudp snd_q6apm xt_conntrack nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter r8152 mii qrtr_mhi panel_edp snd_soc_hdmi_codec venus_enc venus_dec apr videobuf2_dma_contig videobuf2_memops fastrpc qrtr_smd rpmsg_ctrl rpmsg_char qcom_pm8008_regulator qcom_battmgr pmic_glink_altmode ath11k_pci ath11k venus_core snd_soc_wcd938x v4l2_mem2mem hci_uart mac80211 msm videobuf2_v4l2 snd_soc_wcd938x_sdw snd_soc_sc8280xp libarc4 btqca snd_soc_qcom_common regmap_sdw snd_soc_lpass_rx_macro videodev snd_soc_lpass_va_macro soundwire_qcom leds_qcom_lpg snd_soc_lpass_wsa_macro bluetooth snd_soc_lpass_tx_macro qcom_spmi_adc_tm5 snd_soc_wcd_mbhc snd_soc_qcom_sdw snd_soc_lpass_macro_common cfg80211 gpu_sched gpio_sbu_mux videobuf2_common qcom_spmi_temp_alarm snd_soc_core
[ 2546.700875] qcom_spmi_adc5 ecdh_generic drm_display_helper ecc qcom_pon mc snd_compress qcom_q6v5_pas industrialio rtc_pm8xxx reboot_mode phy_qcom_qmp_combo mhi led_class_multicolor nvmem_qcom_spmi_sdam drm_dp_aux_bus rfkill qcom_vadc_common snd_pcm qcom_pil_info drm_kms_helper qcom_common phy_qcom_edp qcom_pm8008 qcom_stats qrtr qcom_glink_smem snd_timer typec videocc_sc8280xp icc_bwmon qcom_q6v5 phy_qcom_qmp_usb pinctrl_sc8280xp_lpass_lpi regmap_i2c snd qcom_sysmon phy_qcom_snps_femto_v2 pmic_glink soundwire_bus pinctrl_lpass_lpi pdr_interface lpasscc_sc8280xp icc_osm_l3 mdt_loader soundcore socinfo qcom_wdt qcom_rng qmi_helpers pwm_bl drm dm_mod ip_tables x_tables ipv6 pcie_qcom crc8 phy_qcom_qmp_pcie nvme nvme_core hid_multitouch i2c_qcom_geni i2c_hid_of i2c_hid i2c_core
[ 2546.705703] CPU: 4 PID: 610 Comm: cat Not tainted 6.5.0-rc1 #45
[ 2546.706287] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
[ 2546.706880] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2546.707476] pc : string+0x4c/0xfc
[ 2546.708080] lr : vsnprintf+0x170/0x748
[ 2546.708674] sp : ffff800083563ac0
[ 2546.709265] x29: ffff800083563ac0 x28: ffff11b942bca791 x27: ffffbb03f92e0974
[ 2546.709866] x26: ffffbb03f92e0974 x25: 0000000000000020 x24: 0000000000000871
[ 2546.710476] x23: 00000000ffffffd8 x22: ffffbb03f9161778 x21: ffff800083563c10
[ 2546.711083] x20: ffff11b942bca78f x19: ffff11b942bcb000 x18: 0000000000000020
[ 2546.711688] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
[ 2546.712297] x14: 0000000000000001 x13: 0000000000000003 x12: ffff11b942bca783
[ 2546.712910] x11: 0000000000000000 x10: 0000000000000020 x9 : 0000000000000000
[ 2546.713522] x8 : 00000000ffffffff x7 : ffff800083563c10 x6 : 0000000000000020
[ 2546.714133] x5 : ffff11b942bcb000 x4 : 0000000000000000 x3 : ffff0a00ffffff04
[ 2546.714752] x2 : ffff80008106bb19 x1 : ffffffffffffffff x0 : ffff11b942bca791
[ 2546.715362] Call trace:
[ 2546.715962] string+0x4c/0xfc
[ 2546.716557] vsnprintf+0x170/0x748
[ 2546.717152] seq_printf+0xb4/0xd0
[ 2546.717746] show_interrupts+0x2f4/0x4e8
[ 2546.718345] seq_read_iter+0x3bc/0x4ac
[ 2546.718940] proc_reg_read_iter+0x84/0xd8
[ 2546.719539] vfs_read+0x1d4/0x294
[ 2546.720137] ksys_read+0x68/0xf4
[ 2546.720735] __arm64_sys_read+0x1c/0x28
[ 2546.721335] invoke_syscall+0x48/0x114
[ 2546.721934] el0_svc_common.constprop.0+0x60/0x10c
[ 2546.722536] do_el0_svc+0x30/0x88
[ 2546.723132] el0_svc+0x40/0xac
[ 2546.723729] el0t_64_sync_handler+0xc0/0xc4
[ 2546.724329] el0t_64_sync+0x190/0x194
[ 2546.724930] Code: 91000400 110004e1 eb08009f 540000e0 (38646846)
[ 2546.725536] ---[ end trace 0000000000000000 ]---
[ 2546.726143] note: cat[610] exited with irqs disabled
[ 2546.726781] note: cat[610] exited with preempt_count 1



2023-07-11 17:11:27

by Shanker Donthineni

Subject: Re: Oops on /proc/interrupt access with 6.5-rc1

Hi,

On 7/11/23 10:51, Johan Hovold wrote:
> Hi,
>
> Konrad reported on IRC that he hit a segfault and hang when watch:ing
> /proc/interrupts with 6.5-rc1.
>
> I tried simply catting it and hit the below oops immediately with my
> X13s (aarch64).
>

I ran "cat /proc/interrupts" on the NVIDIA Grace server platform with
v6.5.0-rc1 without any errors. I tested with 8, 16 and 72 CPUs by
limiting the CPU count with maxcpus=, and could not reproduce the oops
in about 10 attempts.

root@Grace# uname -a
Linux Grace 6.5.0-rc1 #2 SMP Tue Jul 11 11:13:59 CDT 2023 aarch64 GNU/Linux

root@Grace# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
9: 0 0 0 0 0 0 0 0 GICv3 25 Level vgic
10: 0 0 0 0 0 0 0 0 GICv3 30 Level kvm guest ptimer
11: 0 0 0 0 0 0 0 0 GICv3 27 Level kvm guest vtimer
12: 3315 1855 1750 3268 10540 2394 8336 1607 GICv3 26 Level arch_timer
18: 0 0 0 0 0 0 0 0 GICv3 276 Edge arm-smmu-v3-evtq
19: 0 0 0 0 0 0 0 0 GICv3 277 Edge arm-smmu-v3-gerror
20: 0 0 0 0 0 0 0 0 GICv3 285 Edge arm-smmu-v3-evtq
21: 0 0 0 0 0 0 0 0 GICv3 286 Edge arm-smmu-v3-gerror
22: 0 0 0 0 0 0 0 0 GICv3 294 Edge arm-smmu-v3-evtq
23: 0 0 0 0 0 0 0 0 GICv3 295 Edge arm-smmu-v3-gerror
24: 0 0 0 0 0 0 0 0 GICv3 303 Edge arm-smmu-v3-evtq
25: 0 0 0 0 0 0 0 0 GICv3 304 Edge arm-smmu-v3-gerror
26: 3 0 0 0 0 0 0 0 GICv3 312 Edge arm-smmu-v3-evtq
27: 0 0 0 0 0 0 0 0 GICv3 313 Edge arm-smmu-v3-gerror
33: 0 0 0 0 0 0 0 0 GICv3 226 Level ACPI:Ged
34: 0 0 0 0 0 0 0 0 GICv3 227 Level ACPI:Ged
65: 1724 0 0 0 0 0 0 0 GICv3 202 Level uart-pl011
68: 0 0 0 0 0 0 0 0 ITS-MSI 1077444608 Edge ehci_hcd:usb1
69: 0 0 0 0 0 0 0 0 GICv3 23 Level arm-pmu
84: 0 150 0 0 0 0 0 0 ITS-MSI 1075314688 Edge nvme0q0
85: 0 0 0 0 0 0 0 0 ITS-MSI 1075314689 Edge nvme0q1
86: 0 0 0 0 0 0 0 0 ITS-MSI 1075314690 Edge nvme0q2
87: 0 0 0 0 0 0 0 0 ITS-MSI 1075314691 Edge nvme0q3
88: 0 0 0 0 0 0 0 0 ITS-MSI 1075314692 Edge nvme0q4
89: 0 0 0 0 10 0 0 0 ITS-MSI 1075314693 Edge nvme0q5
90: 0 0 0 0 0 0 0 0 ITS-MSI 1075314694 Edge nvme0q6
91: 0 0 0 0 0 0 0 0 ITS-MSI 1075314695 Edge nvme0q7
92: 0 0 0 0 0 0 0 0 ITS-MSI 1075314696 Edge nvme0q8
93: 0 0 0 0 0 0 0 0 ITS-MSI 1075314697 Edge nvme0q9
94: 0 0 0 0 0 0 0 0 ITS-MSI 1075314698 Edge nvme0q10
95: 0 0 0 0 0 0 0 0 ITS-MSI 1075314699 Edge nvme0q11
96: 0 0 0 0 0 0 0 0 ITS-MSI 1075314700 Edge nvme0q12
97: 0 0 0 0 0 0 0 0 ITS-MSI 1075314701 Edge nvme0q13
98: 0 0 0 0 0 0 0 0 ITS-MSI 1075314702 Edge nvme0q14
99: 0 0 0 0 0 0 0 0 ITS-MSI 1075314703 Edge nvme0q15
100: 0 0 0 0 0 0 0 0 ITS-MSI 1075314704 Edge nvme0q16
101: 0 0 0 0 0 0 0 0 ITS-MSI 1075314705 Edge nvme0q17
102: 0 0 0 0 0 0 0 0 ITS-MSI 1075314706 Edge nvme0q18
103: 0 0 0 0 0 0 0 0 ITS-MSI 1075314707 Edge nvme0q19
104: 0 0 0 0 0 0 0 0 ITS-MSI 1075314708 Edge nvme0q20
105: 0 0 0 0 0 0 0 0 ITS-MSI 1075314709 Edge nvme0q21
106: 0 0 0 0 0 0 0 0 ITS-MSI 1075314710 Edge nvme0q22
107: 0 0 0 0 0 0 0 0 ITS-MSI 1075314711 Edge nvme0q23
108: 0 0 0 0 0 0 0 0 ITS-MSI 1075314712 Edge nvme0q24
109: 0 0 0 0 0 0 0 0 ITS-MSI 1075314713 Edge nvme0q25
110: 0 0 0 0 0 0 0 0 ITS-MSI 1075314714 Edge nvme0q26
111: 0 0 0 0 0 0 0 0 ITS-MSI 1075314715 Edge nvme0q27
112: 0 0 0 0 0 0 0 0 ITS-MSI 1075314716 Edge nvme0q28
113: 0 0 0 0 0 0 0 0 ITS-MSI 1075314717 Edge nvme0q29
114: 0 0 0 0 0 0 0 0 ITS-MSI 1075314718 Edge nvme0q30
115: 0 0 0 0 0 0 0 0 ITS-MSI 1075314719 Edge nvme0q31
116: 0 0 0 0 0 0 0 0 ITS-MSI 1075314720 Edge nvme0q32
117: 0 0 0 0 0 0 0 0 ITS-MSI 1075314721 Edge nvme0q33
118: 0 0 0 0 0 0 0 0 ITS-MSI 1075314722 Edge nvme0q34
119: 0 0 0 0 0 0 0 0 ITS-MSI 1075314723 Edge nvme0q35
120: 0 0 0 0 0 0 0 0 ITS-MSI 1075314724 Edge nvme0q36
121: 0 0 0 0 0 0 0 0 ITS-MSI 1075314725 Edge nvme0q37
122: 0 0 0 0 0 0 0 0 ITS-MSI 1075314726 Edge nvme0q38
123: 0 0 0 0 0 0 0 0 ITS-MSI 1075314727 Edge nvme0q39
124: 0 0 0 0 0 0 0 0 ITS-MSI 1075314728 Edge nvme0q40
125: 0 0 0 0 0 0 0 0 ITS-MSI 1075314729 Edge nvme0q41
126: 0 0 0 0 0 0 0 0 ITS-MSI 1075314730 Edge nvme0q42
127: 0 0 0 0 0 0 0 0 ITS-MSI 1075314731 Edge nvme0q43
128: 0 0 0 0 0 0 0 0 ITS-MSI 1075314732 Edge nvme0q44
129: 0 0 0 0 0 0 0 0 ITS-MSI 1075314733 Edge nvme0q45
130: 0 0 0 0 0 0 0 0 ITS-MSI 1075314734 Edge nvme0q46
131: 0 0 0 0 0 0 0 0 ITS-MSI 1075314735 Edge nvme0q47
132: 0 0 0 0 0 0 0 0 ITS-MSI 1075314736 Edge nvme0q48
133: 0 0 0 0 0 0 0 0 ITS-MSI 1075314737 Edge nvme0q49
134: 0 0 0 0 0 0 0 0 ITS-MSI 1075314738 Edge nvme0q50
135: 0 0 0 0 0 0 0 0 ITS-MSI 1075314739 Edge nvme0q51
136: 0 0 0 0 0 0 0 0 ITS-MSI 1075314740 Edge nvme0q52
137: 0 0 0 0 0 0 0 0 ITS-MSI 1075314741 Edge nvme0q53
138: 0 0 0 0 0 0 0 0 ITS-MSI 1075314742 Edge nvme0q54
139: 0 0 0 0 0 0 0 0 ITS-MSI 1075314743 Edge nvme0q55
140: 0 0 0 0 0 0 0 0 ITS-MSI 1075314744 Edge nvme0q56
141: 0 0 0 0 0 0 0 0 ITS-MSI 1075314745 Edge nvme0q57
142: 0 0 0 0 0 0 0 0 ITS-MSI 1075314746 Edge nvme0q58
143: 0 0 0 0 0 0 0 0 ITS-MSI 1075314747 Edge nvme0q59
144: 0 0 0 0 0 0 0 0 ITS-MSI 1075314748 Edge nvme0q60
145: 0 0 0 0 0 0 0 0 ITS-MSI 1075314749 Edge nvme0q61
146: 0 0 0 0 0 0 0 0 ITS-MSI 1075314750 Edge nvme0q62
147: 0 0 0 0 0 0 0 0 ITS-MSI 1075314751 Edge nvme0q63
148: 0 0 0 0 0 0 0 0 ITS-MSI 1075314752 Edge nvme0q64
149: 0 0 0 0 0 0 0 0 ITS-MSI 1075314753 Edge nvme0q65
150: 0 0 0 0 0 0 0 0 ITS-MSI 1075314754 Edge nvme0q66
151: 0 0 0 0 0 0 0 0 ITS-MSI 1075314755 Edge nvme0q67
152: 0 0 0 0 0 0 0 0 ITS-MSI 1075314756 Edge nvme0q68
153: 0 0 0 0 0 0 0 0 ITS-MSI 1075314757 Edge nvme0q69
154: 0 0 0 0 0 0 0 0 ITS-MSI 1075314758 Edge nvme0q70
155: 0 0 0 0 0 0 0 0 ITS-MSI 1075314759 Edge nvme0q71
156: 0 0 0 0 0 0 0 0 ITS-MSI 1075314760 Edge nvme0q72
IPI0: 15 3 7 18 16 23 19 22 Rescheduling interrupts
IPI1: 4429 473 294 1307 3926 1535 1897 216 Function call interrupts
IPI2: 0 0 0 0 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 0 0 0 0 Timer broadcast interrupts
IPI5: 0 0 0 0 0 0 0 0 IRQ work interrupts
IPI6: 0 0 0 0 0 0 0 0 CPU wake-up interrupts
Err: 0

> Commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor
> management") stood out when skimming the log, and Marc soon suggested
> the same possible culprit on IRC.
>
> I have not been able to reproduce it with the maple tree patch reverted,
> but I hit it again after adding it back. Did not trigger immediately
> after boot though, I had had the machine idling for a few minutes in
> between.
>
> Marc asked for a dump so figured I'd CC the list as well.
>
> Johan
> [...]

2023-07-11 18:55:32

by Marc Zyngier

Subject: Re: Oops on /proc/interrupt access with 6.5-rc1

On Tue, 11 Jul 2023 16:51:10 +0100,
Johan Hovold <[email protected]> wrote:
>
> Hi,
>
> Konrad reported on IRC that he hit a segfault and hang when watch:ing
> /proc/interrupts with 6.5-rc1.
>
> I tried simply catting it and hit the below oops immediately with my
> X13s (aarch64).
>
> Commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor
> management") stood out when skimming the log, and Marc soon suggested
> the same possible culprit on IRC.
>
> I have not been able to reproduce it with the maple tree patch reverted,
> but I hit it again after adding it back. Did not trigger immediately
> after boot though, I had had the machine idling for a few minutes in
> between.
>
> Marc asked for a dump so figured I'd CC the list as well.

Thanks for that. I've been trying to reproduce this locally, but have
failed so far. I'll try a different part of the zoo to see if I have
more luck.

I wonder if you have a driver that periodically allocates an interrupt
and then frees it...

[...]

> [ 2546.693932] Unable to handle kernel paging request at virtual address ffff80008106bb19

The VA seems legitimate, and not unusual for a string.

> [ 2546.695148] Mem abort info:
> [ 2546.695562] ESR = 0x0000000096000007
> [ 2546.695976] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 2546.696394] SET = 0, FnV = 0
> [ 2546.696807] EA = 0, S1PTW = 0
> [ 2546.697220] FSC = 0x07: level 3 translation fault
> [ 2546.697642] Data abort info:
> [ 2546.698066] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> [ 2546.698494] CM = 0, WnR = 0, TnD = 0, TagAccess = 0

This is a read, but we don't have any valid syndrome information.
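For readers decoding the splat, a minimal sketch of how the ESR value above breaks down, assuming the Armv8-A ESR_ELx layout for a data abort: EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with ISV at bit 24, WnR at bit 6 and DFSC at bits [5:0]. The struct and function names are hypothetical, not kernel API. WnR = 0 is what makes this a read, and ISV = 0 is why there is no valid instruction syndrome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type holding the data-abort fields used above. */
struct dabt_info {
	unsigned int ec;   /* Exception Class, ESR[31:26] */
	bool il;           /* Instruction Length, ESR[25] */
	uint32_t iss;      /* Instruction Specific Syndrome, ESR[24:0] */
	bool isv;          /* ISS[24]: instruction syndrome valid */
	bool wnr;          /* ISS[6]: 1 = write, 0 = read */
	unsigned int dfsc; /* ISS[5:0]: data fault status code */
};

/* Split a data-abort ESR value into its fields (assumed layout above). */
struct dabt_info decode_dabt(uint64_t esr)
{
	struct dabt_info d;

	d.ec   = (unsigned int)((esr >> 26) & 0x3f);
	d.il   = (esr >> 25) & 1;
	d.iss  = (uint32_t)(esr & 0x1ffffff);
	d.isv  = (esr >> 24) & 1;
	d.wnr  = (esr >> 6) & 1;
	d.dfsc = (unsigned int)(esr & 0x3f);
	return d;
}
```

For ESR = 0x96000007 this yields EC = 0x25 (data abort, current EL), IL = 1, ISS = 0x7, ISV = 0, WnR = 0 and DFSC = 0x07 (level 3 translation fault), matching the kernel's own decode in the report.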

Could you try and enable KASAN?

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2023-07-12 09:18:20

by Johan Hovold

Subject: Re: Oops on /proc/interrupt access with 6.5-rc1

On Tue, Jul 11, 2023 at 07:14:02PM +0100, Marc Zyngier wrote:
> On Tue, 11 Jul 2023 16:51:10 +0100,
> Johan Hovold <[email protected]> wrote:

> > Konrad reported on IRC that he hit a segfault and hang when watch:ing
> > /proc/interrupts with 6.5-rc1.
> >
> > I tried simply catting it and hit the below oops immediately with my
> > X13s (aarch64).

> I wonder if you have a driver that periodically allocates an interrupt
> and then frees it...

I checked by instrumenting the descriptor allocator, but that does not
appear to be the case.

> > [ 2546.693932] Unable to handle kernel paging request at virtual address ffff80008106bb19
>
> The VA seems legitimate, and not unusual for a string.
>
> > [ 2546.695148] Mem abort info:
> > [ 2546.695562] ESR = 0x0000000096000007
> > [ 2546.695976] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 2546.696394] SET = 0, FnV = 0
> > [ 2546.696807] EA = 0, S1PTW = 0
> > [ 2546.697220] FSC = 0x07: level 3 translation fault
> > [ 2546.697642] Data abort info:
> > [ 2546.698066] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> > [ 2546.698494] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>
> This is a read, but we don't have any valid syndrome information.
>
> Could you try and enable KASAN?

Just reproduced it with KASAN enabled. See splat below.

Johan

[ 537.007382] ==================================================================
[ 537.007536] BUG: KASAN: vmalloc-out-of-bounds in string+0xec/0x1ec
[ 537.007635] Read of size 1 at addr ffff8000813478d0 by task cat/533

[ 537.007752] CPU: 6 PID: 533 Comm: cat Not tainted 6.5.0-rc1 #4
[ 537.007836] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
[ 537.007947] Call trace:
[ 537.007984] dump_backtrace+0x9c/0x11c
[ 537.008042] show_stack+0x18/0x24
[ 537.008092] dump_stack_lvl+0x60/0xac
[ 537.008147] print_address_description.constprop.0+0x84/0x394
[ 537.008231] kasan_report+0x110/0x144
[ 537.008287] __asan_load1+0x60/0x6c
[ 537.008338] string+0xec/0x1ec
[ 537.008386] vsnprintf+0x224/0x8b8
[ 537.008438] seq_printf+0x164/0x194
[ 537.008491] show_interrupts+0x40c/0x5e8
[ 537.008551] seq_read_iter+0x5d0/0x738
[ 537.008605] proc_reg_read_iter+0xe8/0x140
[ 537.008668] vfs_read+0x33c/0x444
[ 537.008720] ksys_read+0xc4/0x168
[ 537.008770] __arm64_sys_read+0x44/0x58
[ 537.008827] invoke_syscall+0x60/0x190
[ 537.008884] el0_svc_common.constprop.0+0x80/0x154
[ 537.008955] do_el0_svc+0x38/0xa0
[ 537.009006] el0_svc+0x44/0x90
[ 537.009053] el0t_64_sync_handler+0xc0/0xc4
[ 537.009115] el0t_64_sync+0x190/0x194

[ 537.009201] Memory state around the buggy address:
[ 537.009269] ffff800081347780: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 537.009365] ffff800081347800: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 537.009462] >ffff800081347880: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 537.009557] ^
[ 537.009637] ffff800081347900: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 537.009733] ffff800081347980: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 537.009829] ==================================================================
[ 537.009925] Disabling lock debugging due to kernel taint
[ 537.009998] Unable to handle kernel paging request at virtual address ffff8000813478d0
[ 537.010100] Mem abort info:
[ 537.010139] ESR = 0x0000000096000007
[ 537.010191] EC = 0x25: DABT (current EL), IL = 32 bits
[ 537.010261] SET = 0, FnV = 0
[ 537.010304] EA = 0, S1PTW = 0
[ 537.010347] FSC = 0x07: level 3 translation fault
[ 537.013925] Data abort info:
[ 537.017460] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
[ 537.021044] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 537.024609] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 537.028157] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000032ef05000
[ 537.031731] [ffff8000813478d0] pgd=10000001000de003, p4d=10000001000de003, pud=10000001000df003, pmd=1000000100f0f003, pte=0000000000000000
[ 537.035463] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
[ 537.039178] Modules linked in: snd_soc_wsa883x q6prm_clocks q6apm_lpass_dais snd_q6dsp_common q6apm_dai q6prm michael_mic cbc des_generic libdes ecb algif_skcipher md5 algif_hash af_alg ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack snd_q6apm nf_conntrack libcrc32c nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter r8152 qrtr_mhi mii panel_edp snd_soc_hdmi_codec fastrpc qrtr_smd venus_dec venus_enc videobuf2_dma_contig rpmsg_ctrl apr videobuf2_memops rpmsg_char qcom_pm8008_regulator pmic_glink_altmode qcom_battmgr qcom_spmi_adc5 leds_qcom_lpg qcom_spmi_temp_alarm qcom_pon qcom_spmi_adc_tm5 led_class_multicolor reboot_mode industrialio rtc_pm8xxx qcom_vadc_common nvmem_qcom_spmi_sdam hci_uart msm btqca snd_soc_sc8280xp ath11k_pci snd_soc_qcom_common gpu_sched ath11k bluetooth qcom_pm8008 snd_soc_qcom_sdw regmap_i2c gpio_sbu_mux venus_core ecdh_generic ecc drm_display_helper snd_soc_wcd938x mac80211 v4l2_mem2mem videobuf2_v4l2 snd_soc_wcd938x_sdw snd_soc_lpass_tx_macro qcom_stats
[ 537.039382] snd_soc_lpass_va_macro snd_soc_lpass_rx_macro regmap_sdw snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd_mbhc snd_soc_lpass_macro_common videodev drm_dp_aux_bus libarc4 qcom_q6v5_pas phy_qcom_edp videobuf2_common snd_soc_core mc qcom_pil_info videocc_sc8280xp snd_compress qcom_common snd_pcm icc_bwmon cfg80211 qcom_glink_smem phy_qcom_qmp_combo qcom_q6v5 drm_kms_helper snd_timer rfkill phy_qcom_qmp_usb qcom_sysmon qrtr pmic_glink typec mhi snd pinctrl_sc8280xp_lpass_lpi pdr_interface mdt_loader soundwire_bus phy_qcom_snps_femto_v2 pinctrl_lpass_lpi lpasscc_sc8280xp qmi_helpers soundcore pwm_bl socinfo icc_osm_l3 qcom_wdt qcom_rng drm dm_mod ip_tables x_tables ipv6 pcie_qcom crc8 phy_qcom_qmp_pcie nvme nvme_core hid_multitouch i2c_qcom_geni i2c_hid_of i2c_hid i2c_core
[ 537.081081] CPU: 6 PID: 533 Comm: cat Tainted: G B 6.5.0-rc1 #4
[ 537.086148] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
[ 537.091295] pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 537.096402] pc : string+0xec/0x1ec
[ 537.101518] lr : string+0xec/0x1ec
[ 537.106554] sp : ffff8000802876f0
[ 537.111608] x29: ffff8000802876f0 x28: ffffcfdaf4ef5144 x27: 1ffff00010050f06
[ 537.116753] x26: ffff0a00ffffff04 x25: ffff37069f4a4790 x24: 0000000000000000
[ 537.121933] x23: 00000000ffffffff x22: 1ffff00010050eea x21: ffff37059f4a5000
[ 537.127075] x20: ffff8000813478d0 x19: ffff37059f4a4791 x18: 0000000000000000
[ 537.132149] x17: 0000000000000000 x16: ffffcfdaf413a5d4 x15: 0000000000000000
[ 537.137241] x14: 1ffff00010050dec x13: 0000000041b58ab3 x12: ffff79fb5ec6191d
[ 537.142267] x11: 1ffff9fb5ec6191c x10: ffff79fb5ec6191c x9 : dfff800000000000
[ 537.147258] x8 : 00008604a139e6e4 x7 : ffffcfdaf630c8e7 x6 : 0000000000000001
[ 537.152222] x5 : ffffcfdaf630c8e0 x4 : ffff79fb5ec6191d x3 : ffffcfdaf41015cc
[ 537.157174] x2 : 0000000000000001 x1 : ffff3705815a4e00 x0 : 0000000000000001
[ 537.162123] Call trace:
[ 537.167016] string+0xec/0x1ec
[ 537.171904] vsnprintf+0x224/0x8b8
[ 537.176750] seq_printf+0x164/0x194
[ 537.181551] show_interrupts+0x40c/0x5e8
[ 537.186359] seq_read_iter+0x5d0/0x738
[ 537.191160] proc_reg_read_iter+0xe8/0x140
[ 537.195983] vfs_read+0x33c/0x444
[ 537.200785] ksys_read+0xc4/0x168
[ 537.205557] __arm64_sys_read+0x44/0x58
[ 537.210342] invoke_syscall+0x60/0x190
[ 537.215131] el0_svc_common.constprop.0+0x80/0x154
[ 537.219946] do_el0_svc+0x38/0xa0
[ 537.224761] el0_svc+0x44/0x90
[ 537.229569] el0t_64_sync_handler+0xc0/0xc4
[ 537.234393] el0t_64_sync+0x190/0x194
[ 537.239204] Code: eb19027f 540000a0 aa1403e0 97d8fc79 (38401697)
[ 537.244064] ---[ end trace 0000000000000000 ]---
[ 537.248926] note: cat[533] exited with irqs disabled
[ 537.254196] note: cat[533] exited with preempt_count 1
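A note on reading the "Memory state around the buggy address" dump above, assuming generic KASAN: each shadow byte describes an 8-byte granule and the report prints 16 shadow bytes per row, so each row covers 0x80 bytes of memory, which is why the row addresses step by 0x80. Every byte shown is 0xf8, which under generic KASAN should be KASAN_VMALLOC_INVALID (vmalloc space that is not currently mapped), consistent with the "vmalloc-out-of-bounds" classification and with pte=0000000000000000 in the page-table walk. The faulting address also lies in the same region as the stack pointer (sp : ffff8000802876f0), which, assuming vmap'd task stacks (CONFIG_VMAP_STACK), fits a pointer into a stack that is no longer mapped. The hypothetical helpers below locate the shadow byte for a given address.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte per 8-byte granule of memory. */
#define KASAN_GRANULE_SIZE 8u
/* The report prints 16 shadow bytes per row. */
#define SHADOW_BYTES_PER_ROW 16u
/* Bytes of memory described by one printed row (0x80). */
#define ROW_SPAN (KASAN_GRANULE_SIZE * SHADOW_BYTES_PER_ROW)

/* Start address of the dump row that contains addr. */
uint64_t kasan_row_base(uint64_t addr)
{
	return addr & ~(uint64_t)(ROW_SPAN - 1);
}

/* Index (0..15) of the shadow byte within that row that covers addr. */
unsigned int kasan_row_index(uint64_t addr)
{
	return (unsigned int)((addr - kasan_row_base(addr)) / KASAN_GRANULE_SIZE);
}
```

For ffff8000813478d0 this gives row ffff800081347880, which is the row marked with ">", and shadow byte index 10 within it, the byte the "^" marker is meant to point at.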

2023-07-12 11:40:11

by Johan Hovold

Subject: Re: Oops on /proc/interrupt access with 6.5-rc1

On Tue, Jul 11, 2023 at 07:14:02PM +0100, Marc Zyngier wrote:
> On Tue, 11 Jul 2023 16:51:10 +0100,
> Johan Hovold <[email protected]> wrote:

> > Konrad reported on IRC that he hit a segfault and hang when watch:ing
> > /proc/interrupts with 6.5-rc1.
> >
> > I tried simply catting it and hit the below oops immediately with my
> > X13s (aarch64).
> >
> > Commit 721255b9826b ("genirq: Use a maple tree for interrupt descriptor
> > management") stood out when skimming the log, and Marc soon suggested
> > the same possible culprit on IRC.

It turns out we had a buggy patch that requested an IRQ using a
stack-allocated name, so this was a false alarm.
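The bug class described here can be illustrated outside the kernel. request_irq() stores the devname pointer it is given rather than copying the string, and show_interrupts() later prints that name when /proc/interrupts is read, so the name must outlive the registration. The userspace sketch below models this with a hypothetical registry; struct fake_irqaction, register_irq_name(), setup_fixed() and the sizes are illustrative assumptions, not kernel API. The commented-out pattern hands over a pointer into a stack buffer, which dangles once the function returns; the fix allocates the name so it lives as long as the registration. In a kernel driver the analogous fix would typically be to allocate the name, for example with devm_kasprintf(), instead of formatting it into an on-stack array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 32
#define MAX_ACTIONS 4

/* Hypothetical stand-in for struct irqaction: only the name is modelled. */
struct fake_irqaction {
	const char *name; /* stored as-is, like request_irq()'s devname */
};

static struct fake_irqaction table[MAX_ACTIONS];
static size_t nr_actions;

/* Like request_irq(): keeps the caller's pointer, does NOT copy the string. */
int register_irq_name(const char *name)
{
	if (nr_actions >= MAX_ACTIONS)
		return -1;
	table[nr_actions++].name = name;
	return 0;
}

/*
 * BUGGY pattern (do not use): buf lives on this function's stack, so the
 * stored pointer dangles as soon as the function returns.
 *
 *	int setup_buggy(int id)
 *	{
 *		char buf[NAME_LEN];
 *		snprintf(buf, sizeof(buf), "mydev-%d", id);
 *		return register_irq_name(buf);
 *	}
 */

/* Fixed pattern: format the name into heap memory that outlives the call. */
int setup_fixed(int id)
{
	char *name = malloc(NAME_LEN);

	if (!name)
		return -1;
	snprintf(name, NAME_LEN, "mydev-%d", id);
	if (register_irq_name(name) != 0) {
		free(name);
		return -1;
	}
	return 0; /* the registry now owns name */
}

/* Like show_interrupts(): dereferences the stored names much later. */
void show_names(FILE *out)
{
	for (size_t i = 0; i < nr_actions; i++)
		fprintf(out, "%zu: %s\n", i, table[i].name);
}
```

With the buggy variant, show_names() would read whatever memory the dead stack frame left behind, or fault if that stack is no longer mapped, which matches the string()/vsnprintf() crash in the traces above.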

Sorry about the noise.

Johan