2014-10-23 17:01:03

by Andre Przywara

Subject: BPF crash with 3.18-rc1 on arm64 Juno hardware

Hi,

I see a crash with 3.18-rc1 on a Juno board related to bpf_jit (see dump
below). Userland tries to carry on afterwards, but eventually hangs in
RCU stalls.
The kernel has just CONFIG_BPF_JIT enabled, I guess Ubuntu enables this
automatically if detected.

The backtrace doesn't make too much sense to me:

void bpf_jit_free(struct bpf_prog *prog)
{
	if (prog->jited)
		module_free(NULL, prog->bpf_func);

	kfree(prog);
}

It crashes in kfree, but has survived the dereference before.

I have no clue about BPF, so if anyone could help me debug this, I'd be
grateful.

Cheers,
Andre.


* Starting Signal sysvinit that local filesystems are mounted [ OK ]
* Starting configure network device security [ OK ]
Unable to handle kernel paging request at virtual address 37fffbd21c02290
pgd = ffffffc976538000
[37fffbd21c02290] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 737 Comm: kworker/3:1 Not tainted 3.18.0-rc1+ #1666
Workqueue: events bpf_prog_free_deferred
task: ffffffc977a89580 ti: ffffffc976494000 task.ti: ffffffc976494000
PC is at kfree+0x70/0x260
LR is at bpf_jit_free+0x34/0x40
pc : [<ffffffc0001b0634>] lr : [<ffffffc000099290>] pstate: a0000145
sp : ffffffc976497ca0
x29: ffffffc976497ca0 x28: 0000000000000000
x27: ffffffc97feff400 x26: ffffffc0009b0000
x25: 0000000000000000 x24: 0000000000000000
x23: ffffffc97ff03900 x22: ffffffc97feff400
x21: ffffffc000099290 x20: ffffff800009e000
x19: ffffff800009e000 x18: 0000007feb492820
x17: 0000007fb71c6980 x16: ffffffc0001fcc14
x15: 003b9aca00000000 x14: 0027947614000000
x13: ffffffffabb6d0e3 x12: 0000000000000018
x11: 0000000033c2a168 x10: 0000000000000006
x9 : ffffffc976497bd0 x8 : ffffffc977a89a90
x7 : ffffffc97736c4d0 x6 : 00000000000009be
x5 : 0000000000000000 x4 : 0000000000000001
x3 : ffffffc97feff7c0 x2 : 03ffffff02002780
x1 : 037fffff21c02290 x0 : ffffffbe00000000

Process kworker/3:1 (pid: 737, stack limit = 0xffffffc976494058)
Stack: (0xffffffc976497ca0 to 0xffffffc976498000)
....


2014-10-23 17:23:52

by Z Lim

Subject: Re: BPF crash with 3.18-rc1 on arm64 Juno hardware

Hi Andre,

On Thu, Oct 23, 2014 at 10:00 AM, Andre Przywara wrote:
> Hi,
>
> I see a crash with 3.18-rc1 on a Juno board related to bpf_jit (see dump
> below). Userland tries to carry on afterwards, but eventually hangs in
> RCU stalls.
> The kernel has just CONFIG_BPF_JIT enabled, I guess Ubuntu enables this
> automatically if detected.
>

When net-next and arm64-next were merged into mainline, a silent failure
was introduced by new enhancements in net/bpf.
This was actually uncovered before the 3.18 merge window, and Daniel's
patch to fix it was discussed here [1].
I see that Catalin has queued up this patch in fixes/core [2].

[1] https://lkml.org/lkml/2014/9/16/73
[2] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/commit/?h=fixes/core&id=b569c1c622c5e60c960a6ae5bd0880e0cdbd56b1
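
As a rough illustration of why kfree() can fault even though prog->jited was
readable, the sketch below classifies addresses taken from the dump. It
assumes an arm64 layout in which the kernel linear mapping starts at
0xffffffc000000000, which is consistent with the task, stack, and text
addresses in the oops but is not stated anywhere in this thread. The names
ASSUMED_PAGE_OFFSET and in_linear_map() are hypothetical helpers, not kernel
APIs. Under that assumption, x19/x20 (presumably the prog pointer) fall
outside the linear mapping, so the memory is mapped and can be dereferenced,
but it does not look like something kmalloc() returned, and kfree() is only
valid for kmalloc()-style allocations.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: start of the kernel linear mapping on this arm64 build.
 * Inferred from the dump (task, sp, and pc all sit at 0xffffffc0...),
 * not taken from the kernel configuration.
 */
#define ASSUMED_PAGE_OFFSET 0xffffffc000000000ULL

/*
 * Hypothetical helper: true if addr lies in the assumed linear mapping,
 * i.e. the range where kmalloc()ed objects would be expected to live.
 */
static bool in_linear_map(uint64_t addr)
{
	return addr >= ASSUMED_PAGE_OFFSET;
}
```

With these definitions, in_linear_map(0xffffff800009e000ULL) (the value in
x19/x20) is false, while in_linear_map(0xffffffc977a89580ULL) (the task
pointer) is true. That fits a prog that was allocated by something other
than kmalloc() and then handed to kfree().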

> The backtrace doesn't make too much sense to me:
>
> void bpf_jit_free(struct bpf_prog *prog)
> {
> 	if (prog->jited)
> 		module_free(NULL, prog->bpf_func);
>
> 	kfree(prog);
> }
>
> It crashes in kfree, but has survived the dereference before.
>
> I have no clue about BPF, so if anyone could help me debug this, I'd be
> grateful.
>
> Cheers,
> Andre.
>
>
> * Starting Signal sysvinit that local filesystems are mounted [ OK ]
> * Starting configure network device security [ OK ]
> Unable to handle kernel paging request at virtual address 37fffbd21c02290
> pgd = ffffffc976538000
> [37fffbd21c02290] *pgd=0000000000000000, *pud=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 737 Comm: kworker/3:1 Not tainted 3.18.0-rc1+ #1666
> Workqueue: events bpf_prog_free_deferred
> task: ffffffc977a89580 ti: ffffffc976494000 task.ti: ffffffc976494000
> PC is at kfree+0x70/0x260
> LR is at bpf_jit_free+0x34/0x40
> pc : [<ffffffc0001b0634>] lr : [<ffffffc000099290>] pstate: a0000145
> sp : ffffffc976497ca0
> x29: ffffffc976497ca0 x28: 0000000000000000
> x27: ffffffc97feff400 x26: ffffffc0009b0000
> x25: 0000000000000000 x24: 0000000000000000
> x23: ffffffc97ff03900 x22: ffffffc97feff400
> x21: ffffffc000099290 x20: ffffff800009e000
> x19: ffffff800009e000 x18: 0000007feb492820
> x17: 0000007fb71c6980 x16: ffffffc0001fcc14
> x15: 003b9aca00000000 x14: 0027947614000000
> x13: ffffffffabb6d0e3 x12: 0000000000000018
> x11: 0000000033c2a168 x10: 0000000000000006
> x9 : ffffffc976497bd0 x8 : ffffffc977a89a90
> x7 : ffffffc97736c4d0 x6 : 00000000000009be
> x5 : 0000000000000000 x4 : 0000000000000001
> x3 : ffffffc97feff7c0 x2 : 03ffffff02002780
> x1 : 037fffff21c02290 x0 : ffffffbe00000000
>
> Process kworker/3:1 (pid: 737, stack limit = 0xffffffc976494058)
> Stack: (0xffffffc976497ca0 to 0xffffffc976498000)
> ....

2014-10-24 10:32:08

by Catalin Marinas

Subject: Re: BPF crash with 3.18-rc1 on arm64 Juno hardware

On Thu, Oct 23, 2014 at 06:23:49PM +0100, Z Lim wrote:
> On Thu, Oct 23, 2014 at 10:00 AM, Andre Przywara wrote:
> > I see a crash with 3.18-rc1 on a Juno board related to bpf_jit (see dump
> > below). Userland tries to carry on afterwards, but eventually hangs in
> > RCU stalls.
> > The kernel has just CONFIG_BPF_JIT enabled, I guess Ubuntu enables this
> > automatically if detected.
>
> When net-next and arm64-next were merged into mainline, a silent failure
> was introduced by new enhancements in net/bpf.
> This was actually uncovered before the 3.18 merge window, and Daniel's
> patch to fix it was discussed here [1].
> I see that Catalin has queued up this patch in fixes/core [2].

Indeed. Pull request to Linus will go out later today.

--
Catalin

2014-10-24 10:34:56

by Andre Przywara

Subject: Re: BPF crash with 3.18-rc1 on arm64 Juno hardware

Hi,

On 24/10/14 11:31, Catalin Marinas wrote:
> On Thu, Oct 23, 2014 at 06:23:49PM +0100, Z Lim wrote:
>> On Thu, Oct 23, 2014 at 10:00 AM, Andre Przywara wrote:
>>> I see a crash with 3.18-rc1 on a Juno board related to bpf_jit (see dump
>>> below). Userland tries to carry on afterwards, but eventually hangs in
>>> RCU stalls.
>>> The kernel has just CONFIG_BPF_JIT enabled, I guess Ubuntu enables this
>>> automatically if detected.
>>
>> When net-next and arm64-next were merged into mainline, a silent failure
>> was introduced by new enhancements in net/bpf.
>> This was actually uncovered before the 3.18 merge window, and Daniel's
>> patch to fix it was discussed here [1].
>> I see that Catalin has queued up this patch in fixes/core [2].
>
> Indeed. Pull request to Linus will go out later today.

Indeed this patch fixes it for me.
Thanks to both of you!

Cheers,
Andre.
