2019-04-08 17:10:35

by Daniel Mack

Subject: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

Hi,

I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an ARM PXA3xx
platform. v5.0 did not show this effect with the same cmdline.

Relevant bits from the config are:

CONFIG_NFS_FS=y
CONFIG_NFS_V2=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_SWAP is not set
CONFIG_ROOT_NFS=y
CONFIG_NFS_FSCACHE=y
# CONFIG_NFSD is not set
CONFIG_NFS_COMMON=y

Anything else I can provide? I could bisect the issue if that helps.


Best regards,
Daniel


> [ 10.226301] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 10.234626] pgd = (ptrval)
> [ 10.237325] [00000000] *pgd=00000000
> [ 10.241157] Internal error: Oops: 80000005 [#1] PREEMPT ARM
> [ 10.246708] Modules linked in:
> [ 10.249750] CPU: 0 PID: 1 Comm: swapper Not tainted 5.1.0-rc4+ #585
> [ 10.255967] Hardware name: Marvell PXA3xx (Device Tree Support)
> [ 10.261840] PC is at (null)
> [ 10.264809] LR is at __rpc_execute+0x84/0x314
> [ 10.269134] pc : [<00000000>] lr : [<c05d32cc>] psr: a8000013
> [ 10.275352] sp : c603dbf8 ip : 00000002 fp : c0a1b4f4
> [ 10.280531] r10: c663be40 r9 : c6721200 r8 : c663be40
> [ 10.285710] r7 : 00000000 r6 : 00000000 r5 : c0a07008 r4 : c6630aa0
> [ 10.292190] r3 : 00000000 r2 : ffff8ecc r1 : 00001770 r0 : c6630aa0
> [ 10.298671] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 10.305750] Control: 0000397f Table: a0004018 DAC: 00000071
> [ 10.311453] Process swapper (pid: 1, stack limit = 0x(ptrval))
> [ 10.317243] Stack: (0xc603dbf8 to 0xc603e000)
> [ 10.321570] dbe0: c6630aa0 00000000
> [ 10.329694] dc00: 8a50acc0 8a50acc0 c6630aa0 c674f010 c603dc30 c674f000 c663be40 c05ca0b4
> [ 10.337817] dc20: c0a07008 c0a07008 c603dc74 c05ca218 00000000 c674f000 00000000 00000000
> [ 10.345941] dc40: c603dc74 c074c414 00000000 00000000 00000610 8a50acc0 00000000 c0a07008
> [ 10.354067] dc60: c674f000 c6658e40 c603ddec c05ca2b8 c6754400 c074c428 00000000 00000000
> [ 10.362191] dc80: 00000000 8a50acc0 c6721200 c674f000 c603dd30 c05cbfa8 c603dd30 c0a07008
> [ 10.370315] dca0: c0a07008 c05cc1d0 00000006 c0a34820 00000000 c6742a7c 00000010 c6658d60
> [ 10.378438] dcc0: 00000000 00000000 00000000 00000000 c6721100 c600a600 c600a610 00000026
> [ 10.386563] dce0: c6742670 0001cb00 0001c200 c0a9147c c6742670 00000013 c0a07008 8a50acc0
> [ 10.394688] dd00: c6658d80 c603dd88 c0a07008 c02b4e3c 00000000 c663be40 c603ddec c603de1c
> [ 10.402811] dd20: 00000000 c6658d80 c603dd10 00000000 c0a34820 00000006 c6742a7c 00000010
> [ 10.410934] dd40: 00000000 00000000 c6658d60 00000000 c0710a98 00000000 00000003 00000001
> [ 10.419060] dd60: 00000000 00000000 00000000 8a50acc0 c603de1c c6742a00 c0a07008 c603de1c
> [ 10.427185] dd80: c603ddec c02ab388 c6742a7c 00000010 c6658d60 c6658d80 00000003 00000006
> [ 10.435310] dda0: c663be40 00000000 c603ddec c603de1c c0a34820 8a50acc0 c0a1b6cc c0a1b6cc
> [ 10.443434] ddc0: c6742a00 c603de80 c603de1c c02ab400 c0a07008 00000000 00008000 c6721200
> [ 10.451558] dde0: c0a07008 c6742a00 c6665036 0000000c 00000010 c6742a00 c6665036 c02ab5c0
> [ 10.459682] de00: 00000080 c01c811c c6658d60 00000000 c6004780 00001000 c6060010 c666503b
> [ 10.467807] de20: c6665047 c6060000 c603def4 c0a07008 000003e8 c01c9934 c6721222 c01b6c60
> [ 10.475931] de40: c6721222 c672120d 00000015 8a50acc0 00000cc0 c0a1b6cc c672120d c6721200
> [ 10.484055] de60: c0a07008 00008000 c6721200 c663be40 c0a1b4f4 c02ac7cc 00000000 c67211c0
> [ 10.492180] de80: c02a9f28 c02a9d84 c6742a00 00000000 c663be40 8a50acc0 c0a07008 c67211c0
> [ 10.500304] dea0: c6721180 00000020 c0a07008 c6665000 c67211c0 c6658d20 c0a1b4f4 c01ec264
> [ 10.508429] dec0: c67211c0 c01be90c c67211c0 c6665000 00000001 00000000 c6721180 c01ddf68
> [ 10.516552] dee0: 00000000 00000015 00000000 c6721180 c0924e01 c6020010 c50156e8 c6721180
> [ 10.524677] df00: c08115c0 00008000 c6658d20 8a50acc0 c09248f9 00008000 c6658d20 c6721180
> [ 10.532801] df20: c6665000 c07b7578 000003e8 00000000 00000000 c01de48c c6665000 c07b7578
> [ 10.540925] df40: 00000005 c0a07008 c0a09408 c0a3e6d0 00000006 c090134c c09248f9 00000000
> [ 10.549049] df60: 00000000 c0924e01 c09248f9 8a50acc0 00000000 c0a3e6d0 c0921858 00000000
> [ 10.557172] df80: 00000000 00000000 00000000 c0901550 fffffffe 00002710 00000000 00000000
> [ 10.565297] dfa0: c06748d8 c06748e0 00000000 c01010e0 00000000 00000000 00000000 00000000
> [ 10.573420] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 10.581544] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> [ 10.589702] [<c05d32cc>] (__rpc_execute) from [<c05ca0b4>] (rpc_run_task+0x16c/0x178)
> [ 10.597492] [<c05ca0b4>] (rpc_run_task) from [<c05ca218>] (rpc_call_sync+0x80/0xe0)
> [ 10.605102] [<c05ca218>] (rpc_call_sync) from [<c05ca2b8>] (rpc_ping+0x40/0x64)
> [ 10.612372] [<c05ca2b8>] (rpc_ping) from [<c05cbfa8>] (rpc_create_xprt+0xd4/0x16c)
> [ 10.619897] [<c05cbfa8>] (rpc_create_xprt) from [<c05cc1d0>] (rpc_create+0x190/0x1c8)
> [ 10.627680] [<c05cc1d0>] (rpc_create) from [<c02b4e3c>] (nfs_mount+0xd8/0x180)
> [ 10.634873] [<c02b4e3c>] (nfs_mount) from [<c02ab388>] (nfs_request_mount.constprop.3+0x120/0x140)
> [ 10.643779] [<c02ab388>] (nfs_request_mount.constprop.3) from [<c02ab400>] (nfs_try_mount+0x58/0x1d0)
> [ 10.652935] [<c02ab400>] (nfs_try_mount) from [<c02ac7cc>] (nfs_fs_mount+0x3f4/0x4ec)
> [ 10.660741] [<c02ac7cc>] (nfs_fs_mount) from [<c01ec264>] (legacy_get_tree+0x20/0x44)
> [ 10.668545] [<c01ec264>] (legacy_get_tree) from [<c01be90c>] (vfs_get_tree+0x50/0x118)
> [ 10.676427] [<c01be90c>] (vfs_get_tree) from [<c01ddf68>] (do_mount+0x990/0xc04)
> [ 10.683782] [<c01ddf68>] (do_mount) from [<c01de48c>] (ksys_mount+0x70/0x98)
> [ 10.690788] [<c01de48c>] (ksys_mount) from [<c090134c>] (mount_root+0x64/0x150)
> [ 10.698055] [<c090134c>] (mount_root) from [<c0901550>] (prepare_namespace+0x118/0x178)
> [ 10.706013] [<c0901550>] (prepare_namespace) from [<c06748e0>] (kernel_init+0x8/0x110)
> [ 10.713884] [<c06748e0>] (kernel_init) from [<c01010e0>] (ret_from_fork+0x14/0x34)
> [ 10.721397] Exception stack(0xc603dfb0 to 0xc603dff8)
> [ 10.726407] dfa0: 00000000 00000000 00000000 00000000
> [ 10.734528] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 10.742649] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 10.749218] Code: bad PC value
> [ 10.752812] ---[ end trace f3e6d8ea2a09d643 ]---


2019-04-08 20:45:31

by Trond Myklebust

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
> Hi,
>
> I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an ARM
> PXA3xx
> platform. v5.0 did not show this effect with the same cmdline.
>
> Relevant bits from the config are:
>
> CONFIG_NFS_FS=y
> CONFIG_NFS_V2=y
> CONFIG_NFS_V3=y
> # CONFIG_NFS_V3_ACL is not set
> # CONFIG_NFS_V4 is not set
> # CONFIG_NFS_SWAP is not set
> CONFIG_ROOT_NFS=y
> CONFIG_NFS_FSCACHE=y
> # CONFIG_NFSD is not set
> CONFIG_NFS_COMMON=y
>
> Anything else I can provide? I could bisect the issue if that helps.
>

Please do bisect if that is at all practical. I'm having trouble
interpreting this Oops.

Thanks!
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-09 16:26:38

by Daniel Mack

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

On 8/4/2019 8:51 PM, Trond Myklebust wrote:
> On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
>> Hi,
>>
>> I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an ARM
>> PXA3xx
>> platform. v5.0 did not show this effect with the same cmdline.
>>

> Please do bisect if that is at all practical. I'm having trouble
> interpreting this Oops.

Here you go:

009a82f6437490c262584d65a14094a818bcb747 is the first bad commit
commit 009a82f6437490c262584d65a14094a818bcb747
Author: Trond Myklebust <[email protected]>
Date: Sat Mar 9 12:07:17 2019 -0500

SUNRPC: Micro-optimise when the task is known not to be sleeping

In cases where we know the task is not sleeping, try to optimise
away the indirect call to task->tk_action() by replacing it with
a direct call.
Only change tail calls, to allow gcc to perform tail call
elimination.

Signed-off-by: Trond Myklebust <[email protected]>

:040000 040000 9803b15cf8f094c6c9cf92cef34232b8173aa915 a13d38cf4e3fd4724c9b83ec837714407eda4fb6 M include
:040000 040000 89666ad30932dc247554c9aff79d6b4f210be573 6d41d1d3e47ddcf2b9bd87a2ea43743ab18844be M net


Any idea?

Thanks,
Daniel
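
[Editor's sketch] The commit message quoted above describes replacing an indirect call through task->tk_action() with a direct tail call. The following is a minimal, self-contained C illustration of that general pattern, not the SUNRPC code itself: struct demo_task, demo_execute() and the demo_* state functions are hypothetical names made up for this example, and only the tk_action field name is borrowed from the commit message.

```c
#include <stddef.h>

struct demo_task;
typedef void (*demo_action_t)(struct demo_task *);

/* Hypothetical stand-in for a task driven by a function-pointer state machine. */
struct demo_task {
    demo_action_t tk_action; /* next state to run, NULL when the task is done */
    int steps;               /* number of state functions that have run */
    int dispatches;          /* number of indirect calls made by the loop */
};

/* Final state: clear the action so the dispatch loop stops. */
static void demo_finish(struct demo_task *task)
{
    task->steps++;
    task->tk_action = NULL;
}

/* Indirect style: record the next state and return to the dispatch loop. */
static void demo_start_indirect(struct demo_task *task)
{
    task->steps++;
    task->tk_action = demo_finish;
}

/*
 * Direct style: record the next state, then call it directly as the last
 * statement. A compiler may turn this tail call into a plain jump.
 */
static void demo_start_direct(struct demo_task *task)
{
    task->steps++;
    task->tk_action = demo_finish;
    demo_finish(task);
}

/* Dispatch loop: keep calling through the pointer until it becomes NULL. */
static void demo_execute(struct demo_task *task)
{
    while (task->tk_action != NULL) {
        task->dispatches++;
        task->tk_action(task);
    }
}
```

Running demo_execute() on a task that starts in demo_start_indirect goes through the function pointer twice, while a task that starts in demo_start_direct runs the same two states with a single indirect call. Whether the direct call costs an extra stack frame depends on the compiler actually eliminating the tail call.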

2019-04-09 16:56:44

by Trond Myklebust

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

On Tue, 2019-04-09 at 18:25 +0200, Daniel Mack wrote:
> On 8/4/2019 8:51 PM, Trond Myklebust wrote:
> > On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
> > > Hi,
> > >
> > > I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an ARM
> > > PXA3xx
> > > platform. v5.0 did not show this effect with the same cmdline.
> > >
> > Please do bisect if that is at all practical. I'm having trouble
> > interpreting this Oops.
>
> Here you go:
>
> 009a82f6437490c262584d65a14094a818bcb747 is the first bad commit
> commit 009a82f6437490c262584d65a14094a818bcb747
> Author: Trond Myklebust <[email protected]>
> Date: Sat Mar 9 12:07:17 2019 -0500
>
> SUNRPC: Micro-optimise when the task is known not to be sleeping
>
> In cases where we know the task is not sleeping, try to optimise
> away the indirect call to task->tk_action() by replacing it with
> a direct call.
> Only change tail calls, to allow gcc to perform tail call
> elimination.

Ah... It looks like we explicitly turn off tail call optimisation in
some ARM configs, so this might be a stack overflow.

Does your config file have THUMB2_AVOID_R_ARM_THM_JUMP11 set?
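
[Editor's sketch] To illustrate the stack concern raised above: if the compiler does not eliminate tail calls (GCC has -fno-optimize-sibling-calls for this), a chain of states that call each other directly nests one frame per state, whereas a dispatch loop returns to the caller between states. The C sketch below only simulates that with a depth counter; all demo_* names and DEMO_STATES are hypothetical, and the explicit demo_leave() bookkeeping means the chained call here is deliberately not a real tail call.

```c
#define DEMO_STATES 5 /* hypothetical number of chained states */

static int demo_depth;     /* current simulated nesting depth */
static int demo_max_depth; /* deepest nesting seen so far */

static void demo_enter(void)
{
    if (++demo_depth > demo_max_depth)
        demo_max_depth = demo_depth;
}

static void demo_leave(void)
{
    demo_depth--;
}

/* Chained style: each state calls the next one before returning. */
static void demo_chain(int state)
{
    demo_enter();
    if (state + 1 < DEMO_STATES)
        demo_chain(state + 1);
    demo_leave();
}

/* Loop style: each state returns the next state (or -1) to the dispatcher. */
static int demo_step(int state)
{
    int next;

    demo_enter();
    next = (state + 1 < DEMO_STATES) ? state + 1 : -1;
    demo_leave();
    return next;
}

/* Returns the deepest nesting reached when states call each other. */
static int demo_max_depth_chained(void)
{
    demo_depth = demo_max_depth = 0;
    demo_chain(0);
    return demo_max_depth;
}

/* Returns the deepest nesting reached when a loop dispatches each state. */
static int demo_max_depth_looped(void)
{
    int state = 0;

    demo_depth = demo_max_depth = 0;
    while (state >= 0)
        state = demo_step(state);
    return demo_max_depth;
}
```

With five states, demo_max_depth_chained() reports a depth of 5 and demo_max_depth_looped() reports 1. In a real build the chained depth collapses only if the compiler turns each final call into a jump.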

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-09 17:57:10

by Daniel Mack

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

On 9/4/2019 6:55 PM, Trond Myklebust wrote:
> On Tue, 2019-04-09 at 18:25 +0200, Daniel Mack wrote:
>> On 8/4/2019 8:51 PM, Trond Myklebust wrote:
>>> On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
>>>> Hi,
>>>>
>>>> I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an ARM
>>>> PXA3xx
>>>> platform. v5.0 did not show this effect with the same cmdline.
>>>>
>>> Please do bisect if that is at all practical. I'm having trouble
>>> interpreting this Oops.
>>
>> Here you go:
>>
>> 009a82f6437490c262584d65a14094a818bcb747 is the first bad commit
>> commit 009a82f6437490c262584d65a14094a818bcb747
>> Author: Trond Myklebust <[email protected]>
>> Date: Sat Mar 9 12:07:17 2019 -0500
>>
>> SUNRPC: Micro-optimise when the task is known not to be sleeping
>>
>> In cases where we know the task is not sleeping, try to optimise
>> away the indirect call to task->tk_action() by replacing it with
>> a direct call.
>> Only change tail calls, to allow gcc to perform tail call
>> elimination.
>
> Ah... It looks like we explicitly turn off tail call optimisation in
> some ARM configs, so this might be a stack overflow.
>
> Does your config file have THUMB2_AVOID_R_ARM_THM_JUMP11 set?

Nope. I don't even have THUMB2_KERNEL.

In the meantime, I tried to trace that with some printks, but the bug
appears evasive, and the backtrace changes as soon as I modify the
timing. Hmm.

Happy to test patches if you have any idea.


Thanks,
Daniel

2019-04-11 19:51:50

by Trond Myklebust

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

Hi Daniel,

On Tue, 2019-04-09 at 19:54 +0200, Daniel Mack wrote:
> On 9/4/2019 6:55 PM, Trond Myklebust wrote:
> > On Tue, 2019-04-09 at 18:25 +0200, Daniel Mack wrote:
> > > On 8/4/2019 8:51 PM, Trond Myklebust wrote:
> > > > On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
> > > > > Hi,
> > > > >
> > > > > I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an
> > > > > ARM
> > > > > PXA3xx
> > > > > platform. v5.0 did not show this effect with the same
> > > > > cmdline.
> > > > >
> > > > Please do bisect if that is at all practical. I'm having
> > > > trouble
> > > > interpreting this Oops.
> > >
> > > Here you go:
> > >
> > > 009a82f6437490c262584d65a14094a818bcb747 is the first bad commit
> > > commit 009a82f6437490c262584d65a14094a818bcb747
> > > Author: Trond Myklebust <[email protected]>
> > > Date: Sat Mar 9 12:07:17 2019 -0500
> > >
> > > SUNRPC: Micro-optimise when the task is known not to be
> > > sleeping
> > >
> > > In cases where we know the task is not sleeping, try to
> > > optimise
> > > away the indirect call to task->tk_action() by replacing it
> > > with
> > > a direct call.
> > > Only change tail calls, to allow gcc to perform tail call
> > > elimination.
> >
> > Ah... It looks like we explicitly turn off tail call optimisation
> > in
> > some ARM configs, so this might be a stack overflow.
> >
> > Does your config file have THUMB2_AVOID_R_ARM_THM_JUMP11 set?
>
> Nope. I don't even have THUMB2_KERNEL.
>
> In the meantime, I tried to trace that with some printks, but the bug
> appears evasive, and the backtrace changes as soon as I modify the
> timing. Hmm.
>
> Happy to test patches if you have any idea.
>

Could you please try pulling the 'testing' branch from
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/testing

i.e. 'git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git
testing'

to see if that suffices to fix the issue you're reporting?

Thanks
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-12 05:58:05

by Daniel Mack

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

Hi Trond,

On 11/4/2019 9:50 PM, Trond Myklebust wrote:
> On Tue, 2019-04-09 at 19:54 +0200, Daniel Mack wrote:
>> On 9/4/2019 6:55 PM, Trond Myklebust wrote:
>>> On Tue, 2019-04-09 at 18:25 +0200, Daniel Mack wrote:
>>>> On 8/4/2019 8:51 PM, Trond Myklebust wrote:
>>>>> On Mon, 2019-04-08 at 19:01 +0200, Daniel Mack wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm seeing the Oops below when trying to boot 5.1.0-rc4 on an
>>>>>> ARM
>>>>>> PXA3xx
>>>>>> platform. v5.0 did not show this effect with the same
>>>>>> cmdline.
>>>>>>
>>>>> Please do bisect if that is at all practical. I'm having
>>>>> trouble
>>>>> interpreting this Oops.
>>>>
>>>> Here you go:
>>>>
>>>> 009a82f6437490c262584d65a14094a818bcb747 is the first bad commit
>>>> commit 009a82f6437490c262584d65a14094a818bcb747
>>>> Author: Trond Myklebust <[email protected]>
>>>> Date: Sat Mar 9 12:07:17 2019 -0500
>>>>
>>>> SUNRPC: Micro-optimise when the task is known not to be
>>>> sleeping
>>>>
>>>> In cases where we know the task is not sleeping, try to
>>>> optimise
>>>> away the indirect call to task->tk_action() by replacing it
>>>> with
>>>> a direct call.
>>>> Only change tail calls, to allow gcc to perform tail call
>>>> elimination.
>>>
>>> Ah... It looks like we explicitly turn off tail call optimisation
>>> in
>>> some ARM configs, so this might be a stack overflow.
>>>
>>> Does your config file have THUMB2_AVOID_R_ARM_THM_JUMP11 set?
>>
>> Nope. I don't even have THUMB2_KERNEL.
>>
>> In the meantime, I tried to trace that with some printks, but the bug
>> appears evasive, and the backtrace changes as soon as I modify the
>> timing. Hmm.
>>
>> Happy to test patches if you have any idea.
>>
>
> Could you please try pulling the 'testing' branch from
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/testing
>
> i.e. 'git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git
> testing'
>
> to see if that suffices to fix the issue you're reporting?

Yes, that revert in your branch fixes the problem. All fine again!


Thanks,
Daniel

2019-04-12 13:37:24

by Trond Myklebust

Subject: Re: 5.1.0-rc4: Oops in __rpc_execute() when trying to boot from NFS

On Fri, 2019-04-12 at 07:57 +0200, Daniel Mack wrote:
> Hi Trond,
>
> On 11/4/2019 9:50 PM, Trond Myklebust wrote:
> > Could you please try pulling the 'testing' branch from
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=shortlog;h=refs/heads/testing
> >
> > i.e. 'git pull git://git.linux-nfs.org/projects/trondmy/linux-nfs.git
> > testing'
> >
> > to see if that suffices to fix the issue you're reporting?
>
> Yes, that revert in your branch fixes the problem. All fine again!

OK. Thanks for testing!

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]