2020-08-27 08:33:52

by Naresh Kamboju

Subject: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

The arm64 DragonBoard 410c (db410c) failed to boot while running the linux-next 20200827 kernel.

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git commit: 88abac0b753dfdd85362a26d2da8277cb1e0842b
git describe: next-20200827
make_kernelversion: 5.9.0-rc2
kernel-config:
https://builds.tuxbuild.com/vThV35pOF_GMlWdiTs3Bdw/kernel.config

Boot log,

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
[ 0.000000] Linux version 5.9.0-rc2-next-20200827 (TuxBuild@12963d21faa5) (aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0, GNU ld (GNU Binutils for Debian) 2.34) #1 SMP PREEMPT Thu Aug 27 05:19:00 UTC 2020
[ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
[ 0.000000] efi: UEFI not found.
[ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
<trim>
[ 3.451425] i2c_qup 78ba000.i2c: using default clock-frequency 100000
[ 3.451491] i2c_qup 78ba000.i2c:
[ 3.451491] tx channel not available
[ 3.493455] sdhci: Secure Digital Host Controller Interface driver
[ 3.493508] sdhci: Copyright(c) Pierre Ossman
[ 3.500902] Synopsys Designware Multimedia Card Interface Driver
[ 3.507441] sdhci-pltfm: SDHCI platform and OF driver helper
[ 3.514308] Unable to handle kernel paging request at virtual address dead000000000108
[ 3.514695] Mem abort info:
[ 3.522421] ESR = 0x96000044
[ 3.525096] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.528236] SET = 0, FnV = 0
[ 3.533703] EA = 0, S1PTW = 0
[ 3.536561] Data abort info:
[ 3.539601] ISV = 0, ISS = 0x00000044
[ 3.542727] CM = 0, WnR = 1
[ 3.546287] [dead000000000108] address between user and kernel address ranges
[ 3.549414] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 3.556520] Modules linked in:
[ 3.561901] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc2-next-20200827 #1
[ 3.565034] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 3.572584] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[ 3.579271] pc : __clk_put+0x40/0x140
[ 3.584556] lr : __clk_put+0x2c/0x140
[ 3.588373] sp : ffff80001002bb00
[ 3.592016] x29: ffff80001002bb00 x28: 000000000000002e
[ 3.595320] x27: ffff000009f7ba68 x26: ffff80001146d878
[ 3.600703] x25: ffff00003fcfd8f8 x24: ffff00003d0bc410
[ 3.605999] x23: ffff80001146d0e0 x22: ffff000009f7ba40
[ 3.611293] x21: ffff00003d0bc400 x20: ffff000009f7b580
[ 3.616588] x19: ffff00003bccc780 x18: 0000000007824000
[ 3.621883] x17: ffff000009f7ba00 x16: ffff000009f7b5d0
[ 3.627177] x15: ffff800011966cf8 x14: ffffffffffffffff
[ 3.632472] x13: ffff800012917000 x12: ffff800012917000
[ 3.637769] x11: 0000000000000020 x10: 0101010101010101
[ 3.643063] x9 : ffff8000107a984c x8 : 7f7f7f7f7f7f7f7f
[ 3.648358] x7 : ffff000009fd8000 x6 : ffff80001237a000
[ 3.653653] x5 : 0000000000000000 x4 : ffff000009fd8000
[ 3.658949] x3 : ffff8000124e6768 x2 : ffff000009fd8000
[ 3.664243] x1 : ffff00003bccca80 x0 : dead000000000100
[ 3.669539] Call trace:
[ 3.674830] __clk_put+0x40/0x140
[ 3.677003] clk_put+0x18/0x28
[ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
[ 3.683431] sdhci_msm_probe+0x284/0x9a0
[ 3.687857] platform_drv_probe+0x5c/0xb0
[ 3.691847] really_probe+0xf0/0x4d8
[ 3.695753] driver_probe_device+0xfc/0x168
[ 3.699399] device_driver_attach+0x7c/0x88
[ 3.703306] __driver_attach+0xac/0x178
[ 3.707472] bus_for_each_dev+0x78/0xc8
[ 3.711291] driver_attach+0x2c/0x38
[ 3.715110] bus_add_driver+0x14c/0x230
[ 3.718929] driver_register+0x6c/0x128
[ 3.722489] __platform_driver_register+0x50/0x60
[ 3.726312] sdhci_msm_driver_init+0x24/0x30
[ 3.731173] do_one_initcall+0x4c/0x2c0
[ 3.735511] kernel_init_freeable+0x21c/0x284
[ 3.739072] kernel_init+0x1c/0x120
[ 3.743582] ret_from_fork+0x10/0x30
[ 3.746885] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
[ 3.750720] ---[ end trace a8d4100497387a2e ]---
[ 3.756736] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.761392] SMP: stopping secondary CPUs
[ 3.768877] Kernel Offset: 0x80000 from 0xffff800010000000
[ 3.772924] PHYS_OFFSET: 0x80000000
[ 3.778216] CPU features: 0x0240002,24802005
[ 3.781602] Memory Limit: none

full test log,
https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20200827/testrun/3123101/suite/linux-log-parser/test/check-kernel-oops-1714695/log

--
Linaro LKFT
https://lkft.linaro.org


2020-08-27 09:11:37

by Viresh Kumar

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

+Rajendra

On 27-08-20, 14:02, Naresh Kamboju wrote:
> arm64 dragonboard db410c boot failed while running linux next 20200827 kernel.
>
> metadata:
> git branch: master
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> git commit: 88abac0b753dfdd85362a26d2da8277cb1e0842b
> git describe: next-20200827
> make_kernelversion: 5.9.0-rc2
> kernel-config:
> https://builds.tuxbuild.com/vThV35pOF_GMlWdiTs3Bdw/kernel.config
>
> Boot log,
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> [ 0.000000] Linux version 5.9.0-rc2-next-20200827
> (TuxBuild@12963d21faa5) (aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0,
> GNU ld (GNU Binutils for Debian) 2.34) #1 SMP PREEMPT Thu Aug 27
> 05:19:00 UTC 2020
> [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please
> fix your bootloader!
> <trim>
> [ 3.451425] i2c_qup 78ba000.i2c: using default clock-frequency 100000
> [ 3.451491] i2c_qup 78ba000.i2c:
> [ 3.451491] tx channel not available
> [ 3.493455] sdhci: Secure Digital Host Controller Interface driver
> [ 3.493508] sdhci: Copyright(c) Pierre Ossman
> [ 3.500902] Synopsys Designware Multimedia Card Interface Driver
> [ 3.507441] sdhci-pltfm: SDHCI platform and OF driver helper
> [ 3.514308] Unable to handle kernel paging request at virtual
> address dead000000000108
> [ 3.514695] Mem abort info:
> [ 3.522421] ESR = 0x96000044
> [ 3.525096] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 3.528236] SET = 0, FnV = 0
> [ 3.533703] EA = 0, S1PTW = 0
> [ 3.536561] Data abort info:
> [ 3.539601] ISV = 0, ISS = 0x00000044
> [ 3.542727] CM = 0, WnR = 1
> [ 3.546287] [dead000000000108] address between user and kernel address ranges
> [ 3.549414] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [ 3.556520] Modules linked in:
> [ 3.561901] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.9.0-rc2-next-20200827 #1
> [ 3.565034] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 3.572584] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [ 3.579271] pc : __clk_put+0x40/0x140
> [ 3.584556] lr : __clk_put+0x2c/0x140
> [ 3.588373] sp : ffff80001002bb00
> [ 3.592016] x29: ffff80001002bb00 x28: 000000000000002e
> [ 3.595320] x27: ffff000009f7ba68 x26: ffff80001146d878
> [ 3.600703] x25: ffff00003fcfd8f8 x24: ffff00003d0bc410
> [ 3.605999] x23: ffff80001146d0e0 x22: ffff000009f7ba40
> [ 3.611293] x21: ffff00003d0bc400 x20: ffff000009f7b580
> [ 3.616588] x19: ffff00003bccc780 x18: 0000000007824000
> [ 3.621883] x17: ffff000009f7ba00 x16: ffff000009f7b5d0
> [ 3.627177] x15: ffff800011966cf8 x14: ffffffffffffffff
> [ 3.632472] x13: ffff800012917000 x12: ffff800012917000
> [ 3.637769] x11: 0000000000000020 x10: 0101010101010101
> [ 3.643063] x9 : ffff8000107a984c x8 : 7f7f7f7f7f7f7f7f
> [ 3.648358] x7 : ffff000009fd8000 x6 : ffff80001237a000
> [ 3.653653] x5 : 0000000000000000 x4 : ffff000009fd8000
> [ 3.658949] x3 : ffff8000124e6768 x2 : ffff000009fd8000
> [ 3.664243] x1 : ffff00003bccca80 x0 : dead000000000100
> [ 3.669539] Call trace:
> [ 3.674830] __clk_put+0x40/0x140
> [ 3.677003] clk_put+0x18/0x28
> [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> [ 3.687857] platform_drv_probe+0x5c/0xb0
> [ 3.691847] really_probe+0xf0/0x4d8
> [ 3.695753] driver_probe_device+0xfc/0x168
> [ 3.699399] device_driver_attach+0x7c/0x88
> [ 3.703306] __driver_attach+0xac/0x178
> [ 3.707472] bus_for_each_dev+0x78/0xc8
> [ 3.711291] driver_attach+0x2c/0x38
> [ 3.715110] bus_add_driver+0x14c/0x230
> [ 3.718929] driver_register+0x6c/0x128
> [ 3.722489] __platform_driver_register+0x50/0x60
> [ 3.726312] sdhci_msm_driver_init+0x24/0x30
> [ 3.731173] do_one_initcall+0x4c/0x2c0
> [ 3.735511] kernel_init_freeable+0x21c/0x284
> [ 3.739072] kernel_init+0x1c/0x120
> [ 3.743582] ret_from_fork+0x10/0x30
> [ 3.746885] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
> [ 3.750720] ---[ end trace a8d4100497387a2e ]---
> [ 3.756736] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [ 3.761392] SMP: stopping secondary CPUs
> [ 3.768877] Kernel Offset: 0x80000 from 0xffff800010000000
> [ 3.772924] PHYS_OFFSET: 0x80000000
> [ 3.778216] CPU features: 0x0240002,24802005
> [ 3.781602] Memory Limit: none
>
> full test log,
> https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20200827/testrun/3123101/suite/linux-log-parser/test/check-kernel-oops-1714695/log
>
> --
> Linaro LKFT
> https://lkft.linaro.org

--
viresh

2020-08-27 09:13:41

by Naresh Kamboju

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Thu, 27 Aug 2020 at 14:02, Naresh Kamboju <[email protected]> wrote:
>
> arm64 dragonboard db410c boot failed while running linux next 20200827 kernel.
>
> metadata:
> git branch: master
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> git commit: 88abac0b753dfdd85362a26d2da8277cb1e0842b
> git describe: next-20200827
> make_kernelversion: 5.9.0-rc2
> kernel-config:
> https://builds.tuxbuild.com/vThV35pOF_GMlWdiTs3Bdw/kernel.config

The reported issue first appeared in linux-next tag next-20200825.

BAD: next-20200825
GOOD: next-20200824

We are running a git bisect with boot testing on db410c and will get back to you.

>
> Boot log,
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> [ 0.000000] Linux version 5.9.0-rc2-next-20200827
> (TuxBuild@12963d21faa5) (aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0,
> GNU ld (GNU Binutils for Debian) 2.34) #1 SMP PREEMPT Thu Aug 27
> 05:19:00 UTC 2020
> [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please
> fix your bootloader!
> <trim>
> [ 3.451425] i2c_qup 78ba000.i2c: using default clock-frequency 100000
> [ 3.451491] i2c_qup 78ba000.i2c:
> [ 3.451491] tx channel not available
> [ 3.493455] sdhci: Secure Digital Host Controller Interface driver
> [ 3.493508] sdhci: Copyright(c) Pierre Ossman
> [ 3.500902] Synopsys Designware Multimedia Card Interface Driver
> [ 3.507441] sdhci-pltfm: SDHCI platform and OF driver helper
> [ 3.514308] Unable to handle kernel paging request at virtual
> address dead000000000108
> [ 3.514695] Mem abort info:
> [ 3.522421] ESR = 0x96000044
> [ 3.525096] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 3.528236] SET = 0, FnV = 0
> [ 3.533703] EA = 0, S1PTW = 0
> [ 3.536561] Data abort info:
> [ 3.539601] ISV = 0, ISS = 0x00000044
> [ 3.542727] CM = 0, WnR = 1
> [ 3.546287] [dead000000000108] address between user and kernel address ranges
> [ 3.549414] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [ 3.556520] Modules linked in:
> [ 3.561901] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.9.0-rc2-next-20200827 #1
> [ 3.565034] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 3.572584] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> [ 3.579271] pc : __clk_put+0x40/0x140
> [ 3.584556] lr : __clk_put+0x2c/0x140
> [ 3.588373] sp : ffff80001002bb00
> [ 3.592016] x29: ffff80001002bb00 x28: 000000000000002e
> [ 3.595320] x27: ffff000009f7ba68 x26: ffff80001146d878
> [ 3.600703] x25: ffff00003fcfd8f8 x24: ffff00003d0bc410
> [ 3.605999] x23: ffff80001146d0e0 x22: ffff000009f7ba40
> [ 3.611293] x21: ffff00003d0bc400 x20: ffff000009f7b580
> [ 3.616588] x19: ffff00003bccc780 x18: 0000000007824000
> [ 3.621883] x17: ffff000009f7ba00 x16: ffff000009f7b5d0
> [ 3.627177] x15: ffff800011966cf8 x14: ffffffffffffffff
> [ 3.632472] x13: ffff800012917000 x12: ffff800012917000
> [ 3.637769] x11: 0000000000000020 x10: 0101010101010101
> [ 3.643063] x9 : ffff8000107a984c x8 : 7f7f7f7f7f7f7f7f
> [ 3.648358] x7 : ffff000009fd8000 x6 : ffff80001237a000
> [ 3.653653] x5 : 0000000000000000 x4 : ffff000009fd8000
> [ 3.658949] x3 : ffff8000124e6768 x2 : ffff000009fd8000
> [ 3.664243] x1 : ffff00003bccca80 x0 : dead000000000100
> [ 3.669539] Call trace:
> [ 3.674830] __clk_put+0x40/0x140
> [ 3.677003] clk_put+0x18/0x28
> [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> [ 3.687857] platform_drv_probe+0x5c/0xb0
> [ 3.691847] really_probe+0xf0/0x4d8
> [ 3.695753] driver_probe_device+0xfc/0x168
> [ 3.699399] device_driver_attach+0x7c/0x88
> [ 3.703306] __driver_attach+0xac/0x178
> [ 3.707472] bus_for_each_dev+0x78/0xc8
> [ 3.711291] driver_attach+0x2c/0x38
> [ 3.715110] bus_add_driver+0x14c/0x230
> [ 3.718929] driver_register+0x6c/0x128
> [ 3.722489] __platform_driver_register+0x50/0x60
> [ 3.726312] sdhci_msm_driver_init+0x24/0x30
> [ 3.731173] do_one_initcall+0x4c/0x2c0
> [ 3.735511] kernel_init_freeable+0x21c/0x284
> [ 3.739072] kernel_init+0x1c/0x120
> [ 3.743582] ret_from_fork+0x10/0x30
> [ 3.746885] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
> [ 3.750720] ---[ end trace a8d4100497387a2e ]---
> [ 3.756736] Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> [ 3.761392] SMP: stopping secondary CPUs
> [ 3.768877] Kernel Offset: 0x80000 from 0xffff800010000000
> [ 3.772924] PHYS_OFFSET: 0x80000000
> [ 3.778216] CPU features: 0x0240002,24802005
> [ 3.781602] Memory Limit: none
>
> full test log,
> https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20200827/testrun/3123101/suite/linux-log-parser/test/check-kernel-oops-1714695/log
>
> --
> Linaro LKFT
> https://lkft.linaro.org

2020-08-27 09:49:33

by Arnd Bergmann

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Thu, Aug 27, 2020 at 11:08 AM Viresh Kumar <[email protected]> wrote:
>
> +Rajendra
>
> On 27-08-20, 14:02, Naresh Kamboju wrote:
> > arm64 dragonboard db410c boot failed while running linux next 20200827 kernel.
> >
> > metadata:
> > git branch: master
> > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > git commit: 88abac0b753dfdd85362a26d2da8277cb1e0842b
> > git describe: next-20200827
> > make_kernelversion: 5.9.0-rc2
> > kernel-config:
> > https://builds.tuxbuild.com/vThV35pOF_GMlWdiTs3Bdw/kernel.config
> >
> > Boot log,
> >
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd030]
> > [ 0.000000] Linux version 5.9.0-rc2-next-20200827
> > (TuxBuild@12963d21faa5) (aarch64-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0,
> > GNU ld (GNU Binutils for Debian) 2.34) #1 SMP PREEMPT Thu Aug 27
> > 05:19:00 UTC 2020
> > [ 0.000000] Machine model: Qualcomm Technologies, Inc. APQ 8016 SBC
> > [ 0.000000] efi: UEFI not found.
> > [ 0.000000] [Firmware Bug]: Kernel image misaligned at boot, please
> > fix your bootloader!
> > <trim>
> > [ 3.451425] i2c_qup 78ba000.i2c: using default clock-frequency 100000
> > [ 3.451491] i2c_qup 78ba000.i2c:
> > [ 3.451491] tx channel not available
> > [ 3.493455] sdhci: Secure Digital Host Controller Interface driver
> > [ 3.493508] sdhci: Copyright(c) Pierre Ossman
> > [ 3.500902] Synopsys Designware Multimedia Card Interface Driver
> > [ 3.507441] sdhci-pltfm: SDHCI platform and OF driver helper
> > [ 3.514308] Unable to handle kernel paging request at virtual
> > address dead000000000108

This is where the address comes from:

#define POISON_POINTER_DELTA _AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)

static inline void hlist_del(struct hlist_node *n)
{
	__hlist_del(n);
	n->next = LIST_POISON1;
	n->pprev = LIST_POISON2;
}
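
For illustration, the arithmetic behind the faulting address can be sketched in user space. This is not kernel code: it assumes the arm64 default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000 and a 64-bit build, the struct is a local copy of the hlist node layout, and poisoned_pprev_store_addr() is a made-up helper name. If next equals LIST_POISON1 (the value of x0 in the quoted register dump), a store to next->pprev lands 8 bytes past the poison value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: arm64 default for CONFIG_ILLEGAL_POINTER_VALUE. */
#define ILLEGAL_POINTER_VALUE 0xdead000000000000ULL
#define LIST_POISON1_ADDR     (0x100ULL + ILLEGAL_POINTER_VALUE)

/* User-space copy of the kernel's hlist node layout. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

/*
 * Address that "next->pprev = pprev" would write to when
 * next == LIST_POISON1. Hypothetical helper, for illustration only.
 */
static uint64_t poisoned_pprev_store_addr(void)
{
	/* The sketch only makes sense with 8-byte pointers. */
	assert(sizeof(void *) == 8);
	return LIST_POISON1_ADDR + offsetof(struct hlist_node, pprev);
}
```

Under these assumptions the helper returns 0xdead000000000108, which is the address reported in the "Unable to handle kernel paging request" line.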

> > [ 3.514695] Mem abort info:
> > [ 3.522421] ESR = 0x96000044
> > [ 3.525096] EC = 0x25: DABT (current EL), IL = 32 bits
> > [ 3.528236] SET = 0, FnV = 0
> > [ 3.533703] EA = 0, S1PTW = 0
> > [ 3.536561] Data abort info:
> > [ 3.539601] ISV = 0, ISS = 0x00000044
> > [ 3.542727] CM = 0, WnR = 1
> > [ 3.546287] [dead000000000108] address between user and kernel address ranges
> > [ 3.549414] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> > [ 3.556520] Modules linked in:
> > [ 3.561901] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > 5.9.0-rc2-next-20200827 #1
> > [ 3.565034] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > [ 3.572584] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
> > [ 3.579271] pc : __clk_put+0x40/0x140
> > [ 3.584556] lr : __clk_put+0x2c/0x140

Fairly sure this is from the hlist_del(), meaning we try to remove the
same list object a second time, after it was already removed.
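
A rough user-space model of that failure mode, assuming 64-bit pointers: hnode_del() mimics what hlist_del() does (unlink, then poison), while hnode_del_once() is a hypothetical guard that refuses to touch a node that already carries the poison values. All names here are invented for the sketch; it does not claim to be how the kernel or the eventual fix handles this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

/* Poison values as they would appear on arm64 (assumed delta 0xdead000000000000). */
#define POISON1 ((struct hnode *)(uintptr_t)0xdead000000000100ULL)
#define POISON2 ((struct hnode **)(uintptr_t)0xdead000000000122ULL)

/* Insert n at the front of the list. */
static void hnode_add_head(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	assert(n != NULL && h != NULL);
	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink n and poison its pointers, mirroring hlist_del(). */
static void hnode_del(struct hnode *n)
{
	struct hnode *next = n->next;
	struct hnode **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;	/* this store faults if next is poison */
	n->next = POISON1;
	n->pprev = POISON2;
}

/* Hypothetical guard: report a repeated removal instead of dereferencing poison. */
static bool hnode_del_once(struct hnode *n)
{
	if (n->next == POISON1 || n->pprev == POISON2)
		return false;
	hnode_del(n);
	return true;
}
```

Calling hnode_del_once() twice on the same node returns true the first time and false the second time; calling plain hnode_del() twice would instead attempt the stores through the poison pointers.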

> > [ 3.588373] sp : ffff80001002bb00
> > [ 3.592016] x29: ffff80001002bb00 x28: 000000000000002e
> > [ 3.595320] x27: ffff000009f7ba68 x26: ffff80001146d878
> > [ 3.600703] x25: ffff00003fcfd8f8 x24: ffff00003d0bc410
> > [ 3.605999] x23: ffff80001146d0e0 x22: ffff000009f7ba40
> > [ 3.611293] x21: ffff00003d0bc400 x20: ffff000009f7b580
> > [ 3.616588] x19: ffff00003bccc780 x18: 0000000007824000
> > [ 3.621883] x17: ffff000009f7ba00 x16: ffff000009f7b5d0
> > [ 3.627177] x15: ffff800011966cf8 x14: ffffffffffffffff
> > [ 3.632472] x13: ffff800012917000 x12: ffff800012917000
> > [ 3.637769] x11: 0000000000000020 x10: 0101010101010101
> > [ 3.643063] x9 : ffff8000107a984c x8 : 7f7f7f7f7f7f7f7f
> > [ 3.648358] x7 : ffff000009fd8000 x6 : ffff80001237a000
> > [ 3.653653] x5 : 0000000000000000 x4 : ffff000009fd8000
> > [ 3.658949] x3 : ffff8000124e6768 x2 : ffff000009fd8000
> > [ 3.664243] x1 : ffff00003bccca80 x0 : dead000000000100
> > [ 3.669539] Call trace:
> > [ 3.674830] __clk_put+0x40/0x140
> > [ 3.677003] clk_put+0x18/0x28
> > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > [ 3.683431] sdhci_msm_probe+0x284/0x9a0

dev_pm_opp_put_clkname() is part of the error handling in the
probe function, so I would deduce there are two problems:

- something failed during the probe and the driver is trying
  to unwind
- the error handling itself is buggy and tries to undo something
  that has already been undone.
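
As a generic sketch of the unwind pattern being discussed (not the sdhci-msm code): each failure point jumps to the label that releases only what has been acquired so far, so nothing is released twice. demo_probe(), struct probe_trace and the two failure flags are hypothetical stand-ins; the counters replace real release calls so the behaviour can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts how often each (simulated) release step ran. */
struct probe_trace {
	int table_removes;
	int clk_puts;
};

/*
 * Simulated probe with three steps: take a clock reference, add a table,
 * do later setup. Error labels are ordered in reverse acquisition order.
 */
static int demo_probe(bool fail_add_table, bool fail_later,
		      struct probe_trace *t)
{
	int ret;

	assert(t != NULL);

	/* Step 1: clock reference taken (always succeeds in this sketch). */

	/* Step 2: adding the table failed, so only the clock is released. */
	if (fail_add_table) {
		ret = -EINVAL;
		goto put_clkname;
	}

	/* Step 3: later setup failed, so table and clock are both released. */
	if (fail_later) {
		ret = -EIO;
		goto remove_table;
	}

	return 0;

remove_table:
	t->table_removes++;
put_clkname:
	t->clk_puts++;
	return ret;
}
```

In this model every error path runs each release step at most once; a path that jumped to remove_table before the table had been added, or that released the clock both inline and via the label, would correspond to the kind of double undo suspected above.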

> > [ 3.687857] platform_drv_probe+0x5c/0xb0
> > [ 3.691847] really_probe+0xf0/0x4d8
> > [ 3.695753] driver_probe_device+0xfc/0x168
> > [ 3.699399] device_driver_attach+0x7c/0x88
> > [ 3.703306] __driver_attach+0xac/0x178
> > [ 3.707472] bus_for_each_dev+0x78/0xc8
> > [ 3.711291] driver_attach+0x2c/0x38
> > [ 3.715110] bus_add_driver+0x14c/0x230
> > [ 3.718929] driver_register+0x6c/0x128
> > [ 3.722489] __platform_driver_register+0x50/0x60
> > [ 3.726312] sdhci_msm_driver_init+0x24/0x30
> > [ 3.731173] do_one_initcall+0x4c/0x2c0
> > [ 3.735511] kernel_init_freeable+0x21c/0x284
> > [ 3.739072] kernel_init+0x1c/0x120
> > [ 3.743582] ret_from_fork+0x10/0x30
> > [ 3.746885] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
> > [ 3.750720] ---[ end trace a8d4100497387a2e ]---
> > [ 3.756736] Kernel panic - not syncing: Attempted to kill init!
> > exitcode=0x0000000b
> > [ 3.761392] SMP: stopping secondary CPUs
> > [ 3.768877] Kernel Offset: 0x80000 from 0xffff800010000000
> > [ 3.772924] PHYS_OFFSET: 0x80000000
> > [ 3.778216] CPU features: 0x0240002,24802005
> > [ 3.781602] Memory Limit: none
> >
> > full test log,
> > https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20200827/testrun/3123101/suite/linux-log-parser/test/check-kernel-oops-1714695/log

Naresh writes later:
> The reported issue is started from linux next tag next-20200825.
> BAD: next-20200825
> GOOD: next-20200824

This points to Viresh's
d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()

Most likely this is not the entire problem but it uncovered a preexisting
bug.

Arnd

2020-08-27 10:14:23

by Viresh Kumar

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
>
> dev_pm_opp_put_clkname() is part of the error handling in the
> probe function, so I would deduct there are two problems:
>
> - something failed during the probe and the driver is trying
> to unwind
> - the error handling it self is buggy and tries to undo something
> again that has already been undone.

Right.

> This points to Viresh's
> d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()

I completely forgot that Ulf had already pushed this patch, and I was
wondering which of the OPP core changes I wrote had caused this :(

> Most likely this is not the entire problem but it uncovered a preexisting
> bug.

I think this is the entire problem.

Naresh: Can you please test with this diff?

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index b7e47107a31a..401839a97b57 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -2286,7 +2286,7 @@ static int sdhci_msm_probe(struct platform_device *pdev)
 	ret = dev_pm_opp_of_add_table(&pdev->dev);
 	if (ret != -ENODEV) {
 		dev_err(&pdev->dev, "Invalid OPP table in Device tree\n");
-		goto opp_cleanup;
+		goto opp_put_clkname;
 	}
 
 	/* Vote for maximum clock rate for maximum performance */
@@ -2451,6 +2451,7 @@ static int sdhci_msm_probe(struct platform_device *pdev)
 				   msm_host->bulk_clks);
 opp_cleanup:
 	dev_pm_opp_of_remove_table(&pdev->dev);
+opp_put_clkname:
 	dev_pm_opp_put_clkname(msm_host->opp_table);
 bus_clk_disable:
 	if (!IS_ERR(msm_host->bus_clk))

--
viresh

2020-08-27 15:18:10

by Naresh Kamboju

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
>
> On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> >
> > dev_pm_opp_put_clkname() is part of the error handling in the
> > probe function, so I would deduct there are two problems:
> >
> > - something failed during the probe and the driver is trying
> > to unwind
> > - the error handling it self is buggy and tries to undo something
> > again that has already been undone.
>
> Right.
>
> > This points to Viresh's
> > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
>
> I completely forgot that Ulf already pushed this patch and I was
> wondering on which of the OPP core changes I wrote have done this :(
>
> > Most likely this is not the entire problem but it uncovered a preexisting
> > bug.
>
> I think this is.
>
> Naresh: Can you please test with this diff ?

I have applied your patch and tested but still see the reported problem.
Link to test job,
https://lkft.validation.linaro.org/scheduler/job/1715677#L1886

- Naresh

2020-08-28 09:24:13

by Naresh Kamboju

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Thu, 27 Aug 2020 at 17:06, Naresh Kamboju <[email protected]> wrote:
>
> On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
> >
> > On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> > >
> > > dev_pm_opp_put_clkname() is part of the error handling in the
> > > probe function, so I would deduct there are two problems:
> > >
> > > - something failed during the probe and the driver is trying
> > > to unwind
> > > - the error handling it self is buggy and tries to undo something
> > > again that has already been undone.
> >
> > Right.
> >
> > > This points to Viresh's
> > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> >
> > I completely forgot that Ulf already pushed this patch and I was
> > wondering on which of the OPP core changes I wrote have done this :(
> >
> > > Most likely this is not the entire problem but it uncovered a preexisting
> > > bug.
> >
> > I think this is.
> >
> > Naresh: Can you please test with this diff ?
>
> I have applied your patch and tested but still see the reported problem.

The git bisect shows that the first bad commit is,
d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()

Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Anders Roxell <[email protected]>

>
> - Naresh

2020-08-28 09:38:40

by Ulf Hansson

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Fri, 28 Aug 2020 at 11:22, Naresh Kamboju <[email protected]> wrote:
>
> On Thu, 27 Aug 2020 at 17:06, Naresh Kamboju <[email protected]> wrote:
> >
> > On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
> > >
> > > On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> > > >
> > > > dev_pm_opp_put_clkname() is part of the error handling in the
> > > > probe function, so I would deduct there are two problems:
> > > >
> > > > - something failed during the probe and the driver is trying
> > > > to unwind
> > > > - the error handling it self is buggy and tries to undo something
> > > > again that has already been undone.
> > >
> > > Right.
> > >
> > > > This points to Viresh's
> > > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> > >
> > > I completely forgot that Ulf already pushed this patch and I was
> > > wondering on which of the OPP core changes I wrote have done this :(
> > >
> > > > Most likely this is not the entire problem but it uncovered a preexisting
> > > > bug.
> > >
> > > I think this is.
> > >
> > > Naresh: Can you please test with this diff ?
> >
> > I have applied your patch and tested but still see the reported problem.
>
> The git bisect shows that the first bad commit is,
> d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
>
> Reported-by: Naresh Kamboju <[email protected]>
> Reported-by: Anders Roxell <[email protected]>

I am not sure what version of the patch you tested. However, I have
dropped Viresh's v1 and replaced it with v2 [1]. It's available for
testing at:

https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next

Can you please check whether it still causes problems? If it does, I will drop it again.

Kind regards
Uffe

[1] https://lkml.org/lkml/2020/8/28/43

2020-08-28 10:12:56

by Naresh Kamboju

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Fri, 28 Aug 2020 at 15:05, Ulf Hansson <[email protected]> wrote:
>
> On Fri, 28 Aug 2020 at 11:22, Naresh Kamboju <[email protected]> wrote:
> >
> > On Thu, 27 Aug 2020 at 17:06, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
> > > >
> > > > On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> > > > >
> > > > > dev_pm_opp_put_clkname() is part of the error handling in the
> > > > > probe function, so I would deduct there are two problems:
> > > > >
> > > > > - something failed during the probe and the driver is trying
> > > > > to unwind
> > > > > - the error handling it self is buggy and tries to undo something
> > > > > again that has already been undone.
> > > >
> > > > Right.
> > > >
> > > > > This points to Viresh's
> > > > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> > > >
> > > > I completely forgot that Ulf already pushed this patch and I was
> > > > wondering on which of the OPP core changes I wrote have done this :(
> > > >
> > > > > Most likely this is not the entire problem but it uncovered a preexisting
> > > > > bug.
> > > >
> > > > I think this is.
> > > >
> > > > Naresh: Can you please test with this diff ?
> > >
> > > I have applied your patch and tested but still see the reported problem.
> >
> > The git bisect shows that the first bad commit is,
> > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> >
> > Reported-by: Naresh Kamboju <[email protected]>
> > Reported-by: Anders Roxell <[email protected]>
>
> I am not sure what version of the patch you tested.

I applied the v2 patch series on top of linux-next 20200824 and tested
again; the reported kernel panic is still there on db410c [1].

[1] https://lkft.validation.linaro.org/scheduler/job/1717611#L1874

- Naresh

2020-08-28 10:32:47

by Anders Roxell

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Fri, 28 Aug 2020 at 11:35, Ulf Hansson <[email protected]> wrote:
>
> On Fri, 28 Aug 2020 at 11:22, Naresh Kamboju <[email protected]> wrote:
> >
> > On Thu, 27 Aug 2020 at 17:06, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
> > > >
> > > > On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> > > > >
> > > > > dev_pm_opp_put_clkname() is part of the error handling in the
> > > > > probe function, so I would deduct there are two problems:
> > > > >
> > > > > - something failed during the probe and the driver is trying
> > > > > to unwind
> > > > > - the error handling it self is buggy and tries to undo something
> > > > > again that has already been undone.
> > > >
> > > > Right.
> > > >
> > > > > This points to Viresh's
> > > > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> > > >
> > > > I completely forgot that Ulf already pushed this patch and I was
> > > > wondering on which of the OPP core changes I wrote have done this :(
> > > >
> > > > > Most likely this is not the entire problem but it uncovered a preexisting
> > > > > bug.
> > > >
> > > > I think this is.
> > > >
> > > > Naresh: Can you please test with this diff ?
> > >
> > > I have applied your patch and tested but still see the reported problem.
> >
> > The git bisect shows that the first bad commit is,
> > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> >
> > Reported-by: Naresh Kamboju <[email protected]>
> > Reported-by: Anders Roxell <[email protected]>
>
> I am not sure what version of the patch you tested. However, I have
> dropped Viresh's v1 and replaced it with v2 [1]. It's available for
> testing at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next
>
> Can you please check if it still causes problems, then I will drop it, again.

I tried to run with a kernel from your tree and I could see the same
kernel panic on db410c [1].

Cheers,
Anders
[1] https://lkft.validation.linaro.org/scheduler/job/1717770#L1912

2020-08-28 12:27:06

by Ulf Hansson

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On Fri, 28 Aug 2020 at 12:29, Anders Roxell <[email protected]> wrote:
>
> On Fri, 28 Aug 2020 at 11:35, Ulf Hansson <[email protected]> wrote:
> >
> > On Fri, 28 Aug 2020 at 11:22, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Thu, 27 Aug 2020 at 17:06, Naresh Kamboju <[email protected]> wrote:
> > > >
> > > > On Thu, 27 Aug 2020 at 15:42, Viresh Kumar <[email protected]> wrote:
> > > > >
> > > > > On 27-08-20, 11:48, Arnd Bergmann wrote:
> > > > > > > > [ 3.680477] dev_pm_opp_put_clkname+0x30/0x58
> > > > > > > > [ 3.683431] sdhci_msm_probe+0x284/0x9a0
> > > > > >
> > > > > > dev_pm_opp_put_clkname() is part of the error handling in the
> > > > > > probe function, so I would deduct there are two problems:
> > > > > >
> > > > > > - something failed during the probe and the driver is trying
> > > > > > to unwind
> > > > > > - the error handling it self is buggy and tries to undo something
> > > > > > again that has already been undone.
> > > > >
> > > > > Right.
> > > > >
> > > > > > This points to Viresh's
> > > > > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> > > > >
> > > > > I completely forgot that Ulf already pushed this patch and I was
> > > > > wondering on which of the OPP core changes I wrote have done this :(
> > > > >
> > > > > > Most likely this is not the entire problem but it uncovered a preexisting
> > > > > > bug.
> > > > >
> > > > > I think this is.
> > > > >
> > > > > Naresh: Can you please test with this diff ?
> > > >
> > > > I have applied your patch and tested but still see the reported problem.
> > >
> > > The git bisect shows that the first bad commit is,
> > > d05a7238fe1c mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()
> > >
> > > Reported-by: Naresh Kamboju <[email protected]>
> > > Reported-by: Anders Roxell <[email protected]>
> >
> > I am not sure what version of the patch you tested. However, I have
> > dropped Viresh's v1 and replaced it with v2 [1]. It's available for
> > testing at:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git next
> >
> > Can you please check if it still causes problems, then I will drop it, again.
>
> I tried to run with a kernel from your tree and I could see the same
> kernel panic on db410c [1].

Anders, Naresh - thanks for testing and reporting. I am dropping the
patch from my tree.

Viresh, I suggest keeping Anders/Naresh in Cc for the next version.
Then I can wait for their Tested-by tag before I apply it again.

Kind regards
Uffe

2020-08-31 04:47:20

by Viresh Kumar

Subject: Re: Kernel panic : Unable to handle kernel paging request at virtual address - dead address between user and kernel address ranges

On 28-08-20, 14:23, Ulf Hansson wrote:
> Anders, Naresh - thanks for testing and reporting. I am dropping the
> patch from my tree.
>
> Viresh, I suggest to keep Anders/Naresh in the cc, for the next
> version. Then I can wait for their tested-by tag before I apply again.

Sorry for the trouble. I thought you would wait a bit before applying
the patch, to see test results from Naresh, but you were quick as
well :)

--
viresh