2019-04-21 08:21:13

by Qu Wenruo

Subject: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240

Hi,

I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.

The workload was a git checkout.
(Yes, I'm compiling the aarch64 kernel natively, and have already compiled
mesa with panfrost successfully.)

IIRC this is not the first crash: it also crashed earlier while checking
out mesa, but I didn't capture the crash log that time.
The fs passes fsck.ext4 -f.

It doesn't look like ext4 is to blame, but rather the ARM side.

Any ideas about this bug?

Thanks,
Qu

[ 1575.697824] SError Interrupt on CPU1, code 0xbf000000 -- SEr
[ 1575.697827] CPU: 1 PID: 8593 Comm: git Tainted: G 75.697828]
Hardware)
[ 1575.697829] pstate: 20000005 (nzCv daif -PAN -UAO)
[ 1575.697831] pc : __arch_copy_from_user+0x1bc/0x240
[ 1575.697832] lr : copyin+0x54/0x68
[ 1575.697833] sp : ffff000019bbbb20
[ 1575.697834] x29: ffff000019bbbb20 x28: 0000000000000000
[ 1575.697837] x27: 0000000000001000 x26: ffff000010eac0f0
[ 1575.697839] x25: ffff000019bbbd60 x24: 0000000000001000
[ 1575.697842] x23: 0000000000001000 x22: ffff800009c01000
[ 1575.697844] x21: ffff000019bbbd50 x20: 0000000000001000
[ 1575.697846] x19: 0000000000000000 x18: 0000000000000000
[ 1575.697849] x17: 0000000000000000 x16: 0000000000000000
[ 1575.697851] x15: 0000000000000000 x14: 656c62616e653e2d
[ 1575.697854] x13: 7069686328206669 x12: 09090a3b31203d2b
[ 1575.697856] x11: 20746e756f635f65 x10: 6c62616e653e2d70
[ 1575.697858] x9 : 69686309090a7b20 x8 : 702026262031203d
[ 1575.697861] x7 : 3d20746e756f635f x6 : ffff800009c00098
[ 1575.697863] x5 : ffff800009c01000 x4 : 0000000000000008
[ 1575.697865] x3 : 5748206573616309 x2 : 0000000000000ef8
[ 1575.697868] x1 : 0000fffffbe5e850 x0 : ffff800009c00000
[ 1575.697871] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 1575.697872] CPU: 1 PID: 8593 Comm: git Tainted: G C 5.1.0-rc51
[ 1575.697873] Hardware name: Pine64 RockPro64 (DT)
[ 1575.697874] Call trace:
[ 1575.697876] dump_backtrace+0x0/0x168
[ 1575.697876] show_stack+0x24/0x30
[ 1575.697877] dump_stack+0xac/0xd4
[ 1575.697878] panic+0x150/0x2e8
[ 1575.697879] __stack_chk_fail+0x0/0x28
[ 1575.697880] arm64_serror_panic+0x80/0x8c
[ 1575.697881] do_serror+0x11c/0x120
[ 1575.697883] el1_error+0x84/0xf8
[ 1575.697884] __arch_copy_from_user+0x1bc/0x240
[ 1575.697885] iov_iter_copy_from_user_atomic+0xe4/0x3b8
[ 1575.697886] generic_perform_write+0xe8/0x1a8
[ 1575.697887] __generic_file_write_iter+0x134/0x1a0
[ 1575.697888] ext4_file_write_iter+0x1fc/0x380
[ 1575.697889] new_sync_write+0x108/0x158
[ 1575.697890] __vfs_write+0x74/0x90
[ 1575.697891] vfs_write+0xac/0x1b8
[ 1575.697892] ksys_write+0x74/0xe8
[ 1575.697893] __arm64_sys_write+0x24/0x30
[ 1575.697894] el0_svc_handler+0x90/0x118
[ 1575.697895] el0_svc+0x8/0xc
[ 1575.698031] SMP: stopping secondary CPUs
[ 1575.698032] Kernel Offset: disabled
[ 1575.698033] CPU features: 0x002,21006008
[ 1575.698034] Memory Limit: none


2019-04-21 09:13:09

by Qu Wenruo

Subject: Re: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240



On 2019/4/21 4:20 PM, Qu Wenruo wrote:
> Hi,
>
> I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.

Well, I also hit the same crash on a v5.0.8 kernel.

Exactly the same backtrace.

I'm really not sure which part is to blame: ARM or ext4?

Thanks,
Qu

[ 46.252636] rk_gmac-dwmac fe300000.ethernet eth0: Link is Ux
[ 46.253425] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

[ 2784.731946] SError Interrupt on CPU1, code 0xbf000000 -- SError
[ 2784.731949] CPU: 1 PID: 6121 Comm: git Tainted: G C 5.0.8-1-A1
[ 2784.731951] Hardware name: Pine64 RockPro64 (DT)
[ 2784.731952] pstate: 20000005 (nzCv daif -PAN -UAO)
[ 2784.731953] pc : __arch_copy_from_user+0x1dc/0x240
[ 2784.731954] lr : copyin+0x54/0x68
[ 2784.731955] sp : ffff000016e8bb70
[ 2784.731957] x29: ffff000016e8bb70 x28: 0000000000000000
[ 2784.731959] x27: 0000000000001000 x26: ffff000010e59e28
[ 2784.731962] x25: ffff000016e8bdc0 x24: 0000000000001000
[ 2784.731964] x23: 0000000000001000 x22: ffff800009c01000
[ 2784.731967] x21: ffff000016e8bd88 x20: 0000000000001000
[ 2784.731969] x19: 0000000000000000 x18: 0000000000000000
[ 2784.731971] x17: 0000000000000000 x16: 0000000000000000
[ 2784.731974] x15: 0000000000000000 x14: 3a2266664f544872
[ 2784.731976] x13: 65746e756f432220 x12: 0a2c223122203a22
[ 2784.731979] x11: 65726f6366664f22 x10: 2020202020202020
[ 2784.731981] x9 : 0a7b202020200a2c x8 : 7d202020200a2233
[ 2784.731983] x7 : 2c322c312c302220 x6 : ffff800009c00138
[ 2784.731986] x5 : ffff800009c01000 x4 : 0000000000000008
[ 2784.731988] x3 : 3178302c36613178 x2 : 0000000000000e78
[ 2784.731990] x1 : 0000ffffdfecf2e0 x0 : ffff800009c00000
[ 2784.731993] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 2784.731995] CPU: 1 PID: 6121 Comm: git Tainted: G C 5.0.8-1-A1
[ 2784.731996] Hardware name: Pine64 RockPro64 (DT)
[ 2784.731997] Call trace:
[ 2784.731998] dump_backtrace+0x0/0x1c8
[ 2784.731999] show_stack+0x24/0x30
[ 2784.732000] dump_stack+0x98/0xbc
[ 2784.732001] panic+0x14c/0x2e0
[ 2784.732002] __stack_chk_fail+0x0/0x28
[ 2784.732003] arm64_serror_panic+0x80/0x8c
[ 2784.732004] is_valid_bugaddr+0x0/0x1c
[ 2784.732005] el1_error+0x7c/0xe4
[ 2784.732006] __arch_copy_from_user+0x1dc/0x240
[ 2784.732007] iov_iter_copy_from_user_atomic+0xe4/0x368
[ 2784.732008] generic_perform_write+0xe8/0x1a8
[ 2784.732009] __generic_file_write_iter+0x134/0x1a0
[ 2784.732011] ext4_file_write_iter+0x1fc/0x380
[ 2784.732011] __vfs_write+0x150/0x188
[ 2784.732013] vfs_write+0xac/0x1b8
[ 2784.732014] ksys_write+0x6c/0xd0
[ 2784.732014] __arm64_sys_write+0x24/0x30
[ 2784.732016] el0_svc_handler+0x94/0x118
[ 2784.732016] el0_svc+0x8/0xc
[ 2784.732152] SMP: stopping secondary CPUs
[ 2784.732153] Kernel Offset: disabled
[ 2784.732154] CPU features: 0x002,21006008
[ 2784.732155] Memory Limit: none
[ 2784.751574] ---[ end Kernel panic - not syncing: Asynchronous SError Interru-

>
> The workload was a git checkout.
> (Yes, I'm compiling the aarch64 kernel natively, and have already compiled
> mesa with panfrost successfully.)
>
> IIRC this is not the first crash: it also crashed earlier while checking
> out mesa, but I didn't capture the crash log that time.
> The fs passes fsck.ext4 -f.
>
> It doesn't look like ext4 is to blame, but rather the ARM side.
>
> Any ideas about this bug?
>
> Thanks,
> Qu



2019-04-21 13:28:42

by Matthew Wilcox

Subject: Re: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240

On Sun, Apr 21, 2019 at 05:12:50PM +0800, Qu Wenruo wrote:
>
>
> On 2019/4/21 4:20 PM, Qu Wenruo wrote:
> > Hi,
> >
> > I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.
>
> Well, I also hit the same crash on a v5.0.8 kernel.
>
> Exactly the same backtrace.
>
> I'm really not sure which part is to blame: ARM or ext4?

You probably have faulty hardware:

https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/3205/re-what-is-serror-detailed-explanation-is-required


2019-04-22 01:14:46

by Qu Wenruo

Subject: Re: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240



On 2019/4/21 9:28 PM, Matthew Wilcox wrote:
> On Sun, Apr 21, 2019 at 05:12:50PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/4/21 4:20 PM, Qu Wenruo wrote:
>>> Hi,
>>>
>>> I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.
>>
>> Well, I also hit the same crash on a v5.0.8 kernel.
>>
>> Exactly the same backtrace.
>>
>> I'm really not sure which part is to blame: ARM or ext4?
>
> You probably have faulty hardware:
>
> https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/3205/re-what-is-serror-detailed-explanation-is-required
>
You're right.

I tried memtester, and the kernel crashed as well.

Maybe it really is faulty memory, or I'm using the wrong memory speed.

Anyway, thanks for the info,
Qu



2019-04-24 13:50:11

by James Morse

Subject: Re: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240

Hi Qu,

On 22/04/2019 02:14, Qu Wenruo wrote:
> On 2019/4/21 9:28 PM, Matthew Wilcox wrote:
>> On Sun, Apr 21, 2019 at 05:12:50PM +0800, Qu Wenruo wrote:
>>> On 2019/4/21 4:20 PM, Qu Wenruo wrote:
>>>> I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.
>>>
>>> Well, I also hit the same crash on a v5.0.8 kernel.
>>>
>>> Exactly the same backtrace.
>>>
>>> I'm really not sure which part is to blame: ARM or ext4?
>>
>> You probably have faulty hardware:

> I tried memtester, and the kernel crashed as well.
>
> Maybe it really is faulty memory, or I'm using the wrong memory speed.

Another possibility: there may be no memory at this physical address.
If your board only has 1G of memory, but the bootloader/DT is reporting 2G, you could see
an SError like this (assuming this is the first access to that page).
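
To make the 1G-versus-2G case concrete, here is a minimal Python sketch of the address arithmetic. The base address of 0 and the function names are assumptions for illustration only; they are not taken from the RockPro64 device tree.

```python
GiB = 1 << 30


def unbacked_span(base, reported_size, installed_size):
    """Return (start, end) of the range reported as RAM but not backed by
    real memory, or None when the report does not exceed what is installed."""
    if reported_size <= installed_size:
        return None
    return (base + installed_size, base + reported_size)


def is_backed(addr, base, installed_size):
    """True if a physical address falls inside the memory actually fitted."""
    return base <= addr < base + installed_size


# Hypothetical board: 1 GiB fitted, bootloader/DT claims 2 GiB from address 0.
span = unbacked_span(0, 2 * GiB, 1 * GiB)
print([hex(x) for x in span])
print(is_backed(0x60000000, 0, 1 * GiB))
```

Any page the kernel allocates inside that span looks like ordinary RAM to the allocator, so the problem would only show up when the page is first touched.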

An SError can be a fatal interrupt from the hardware; it's also how the CPU tells us about
an 'asynchronous external abort'. In this case it could be an attempt to access a physical
address where nothing exists.
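
The "code 0xbf000000" in the panic message can be split into its fields. A minimal Python sketch, assuming the printed code is the ESR_EL1 value with the architectural layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for SError the IDS flag in ISS bit 24); the function name decode_esr is made up for illustration:

```python
EC_SERROR = 0x2F  # exception class value for an SError interrupt


def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR value into its top-level fields."""
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "ids": (iss >> 24) & 0x1,   # 1 = implementation-defined syndrome
        "syndrome": iss & 0xFFFFFF, # remaining ISS bits [23:0]
    }


fields = decode_esr(0xBF000000)
print(fields)
print(fields["ec"] == EC_SERROR)
```

For this value EC is 0x2f (SError), IDS is set and the remaining syndrome bits are all zero, so the code itself carries no detail about what went wrong beyond "an SError was signalled".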


Thanks,

James

2019-04-24 13:55:09

by Qu Wenruo

Subject: Re: [aarch64] Kernel crash on v5.1-rc5, __arch_copy_from_user+0x1bc/0x240



On 2019/4/24 9:50 PM, James Morse wrote:
> Hi Qu,
>
> On 22/04/2019 02:14, Qu Wenruo wrote:
>> On 2019/4/21 9:28 PM, Matthew Wilcox wrote:
>>> On Sun, Apr 21, 2019 at 05:12:50PM +0800, Qu Wenruo wrote:
>>>> On 2019/4/21 4:20 PM, Qu Wenruo wrote:
>>>>> I just hit a crash on a v5.1-rc5 kernel, on an ext4 filesystem.
>>>>
>>>> Well, I also hit the same crash on a v5.0.8 kernel.
>>>>
>>>> Exactly the same backtrace.
>>>>
>>>> I'm really not sure which part is to blame: ARM or ext4?
>>>
>>> You probably have faulty hardware:
>
>> I tried memtester, and the kernel crashed as well.
>>
>> Maybe it really is faulty memory, or I'm using the wrong memory speed.
>
> Another possibility: there may be no memory at this physical address.
> If your board only has 1G of memory, but the bootloader/DT is reporting 2G, you could see
> an SError like this (assuming this is the first access to that page).

The device tree is correct; it matches the board's memory.

The cause is that the memory controller was not initialized correctly by U-Boot.

After reverting U-Boot from the upstream one (with basic support) to the legacy
one, the problem no longer occurs.

To my surprise, it's U-Boot that initializes the memory controller, not the
Rockchip idbloader or the trusted image.

>
> An SError can be a fatal interrupt from the hardware; it's also how the CPU tells us about
> an 'asynchronous external abort'. In this case it could be an attempt to access a physical
> address where nothing exists.

So it's the equivalent of an x86 machine check exception?

Anyway, at least I learned something about the new ARM world.

Thanks,
Qu

>
>
> Thanks,
>
> James
>

