Hi,
So recently syzkaller ran into the big deadline/period issue (again), and I
figured I should at least propose a patch that puts limits on that -- see Patch 1.
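For illustration only, here is a minimal sketch of the kind of sanity check
such a limit implies. It is not the code from Patch 1: the helper name and the
bounds are hypothetical, picked just to show the shape of the check. It assumes
the usual SCHED_DEADLINE constraint runtime <= deadline <= period and treats a
zero period as "same as the deadline".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds in nanoseconds, chosen for illustration only. */
#define EXAMPLE_PERIOD_MIN_NS (100ULL * 1000)               /* 100 us */
#define EXAMPLE_PERIOD_MAX_NS (4ULL * 1000 * 1000 * 1000)   /* 4 s */

/*
 * Hypothetical helper (not a kernel function): returns true when the
 * parameters are self-consistent and the period is within the bounds.
 */
static bool example_dl_params_ok(uint64_t runtime_ns, uint64_t deadline_ns,
				 uint64_t period_ns)
{
	/* Assumption: a zero period means "use the deadline as the period". */
	if (period_ns == 0)
		period_ns = deadline_ns;

	/* Basic ordering: 0 < runtime <= deadline <= period. */
	if (runtime_ns == 0 || runtime_ns > deadline_ns ||
	    deadline_ns > period_ns)
		return false;

	/* The new part: reject periods that are absurdly small or large. */
	if (period_ns < EXAMPLE_PERIOD_MIN_NS ||
	    period_ns > EXAMPLE_PERIOD_MAX_NS)
		return false;

	return true;
}
```

With these made-up bounds, a 10ms/30ms/100ms (runtime/deadline/period) task
passes, while the same task with a period of 2^62 ns is rejected.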
During that discussion, SCHED_OTHER servers got mentioned (also again), and I
figured I should have a poke at that too. So I took some inspiration from
patches Alessio Balsini sent a while back and cobbled something together for
that too.
Also included are a bunch of patches I did for core scheduling (2-8),
which I'm probably going to just merge as they're generic cleanups.
They're included here because they make pick_next_task() simpler, which
made thinking about the nested pick_next_task() logic inherent in
servers less of a headache. (I think it can work fine without them,
but it's easier with them applied.)
Anyway, after it all compiled it booted a KVM image all the way to userspace on
the first run, so clearly this code isn't to be trusted at all.
There are still lots of missing bits and pieces: there are no changelogs, and
the fair server isn't configurable or hooked into the bandwidth accounting.
But the foundation is there, I think.
Enjoy!
Hi Peter,
On Fri, 26 Jul 2019 16:54:09 +0200
Peter Zijlstra <[email protected]> wrote:
> Hi,
>
> So recently syzcaller ran into the big deadline/period issue (again),
> and I figured I should at least propose a patch that puts limits on
> that -- see Patch 1.
>
> During that discussion; SCHED_OTHER servers got mentioned (also
> again), and I figured I should have a poke at that too. So I took
> some inspiration from patches Alessio Balsini send a while back and
> cobbled something together for that too.
I think Patch 1 is a very good idea!
The server patches look interesting (and they seem to be much simpler
than our patchset :). I need to have a better look at them, but this
seems to be very promising.
Thanks,
Luca
>
> Included are also a bunch of patches I did for core scheduling (2-8),
> which I'm probably going to just merge as they're generic cleanups.
> They're included here because they make pick_next_task() simpler and
> thereby thinking about the nested pick_next_task() logic inherent in
> servers was less of a head-ache. (I think it can work fine without
> them, but its easier with them on)
>
> Anyway; after it all compiled it booted a kvm image all the way to
> userspace on the first run, so clearly this code isn't to be trusted
> at all.
>
> There's still lots of missing bits and pieces -- like changelogs and
> the fair server isn't configurable or hooked into the bandwidth
> accounting, but the foundation is there I think.
>
> Enjoy!
>
Hi Peter,
While testing your series (peterz/sched/wip-deadline 7a9e91d3fe951), I ended up
with a panic at boot on an x86_64 KVM guest. Would you please have a look? The
backtrace is attached below.
Happy to test any suggestion that fixes the issue.
Thanks,
Alessio
---
------>8------
[ 0.798326] ------------[ cut here ]------------
[ 0.798328] kernel BUG at kernel/sched/deadline.c:1542!
[ 0.798335] invalid opcode: 0000 [#1] SMP PTI
[ 0.798339] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.3.0-rc6+ #28
[ 0.798340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 0.798349] RIP: 0010:enqueue_dl_entity+0x3f8/0x440
[ 0.798351] Code: ff 48 8b 85 60 0a 00 00 8b 48 28 85 c9 0f 85 99 fd ff ff c7 40 28 01 00 00 00 e9 8d fd ff ff 85 c0 75 20 f
[ 0.798353] RSP: 0000:ffffb68e40154f10 EFLAGS: 00010096
[ 0.798356] RAX: 0000000000000020 RBX: ffff974bc74d0c00 RCX: ffff974bc751b200
[ 0.798358] RDX: 0000000000000001 RSI: ffff974bc7929410 RDI: ffff974bc7929410
[ 0.798359] RBP: 0000000000000009 R08: 00000000a73eb274 R09: 00000000000f4887
[ 0.798361] R10: 0000000000000000 R11: 0000000000000000 R12: ffff974bc74d0c80
[ 0.798362] R13: 0000000000000000 R14: ffff974bc74d0d00 R15: 0000000000000000
[ 0.798365] FS: 0000000000000000(0000) GS:ffff974bc7900000(0000) knlGS:0000000000000000
[ 0.798371] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.798372] CR2: 00000000ffffffff CR3: 000000000480a000 CR4: 00000000000006e0
[ 0.798374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.798375] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.798376] Call Trace:
[ 0.798397] <IRQ>
[ 0.798402] enqueue_task_fair+0xe69/0x11d0
[ 0.798407] activate_task+0x58/0x90
[ 0.798412] ? kvm_sched_clock_read+0xd/0x20
[ 0.798416] ttwu_do_activate.isra.96+0x3a/0x50
[ 0.798420] sched_ttwu_pending+0x5e/0x90
[ 0.798424] scheduler_ipi+0x9f/0x120
[ 0.798430] reschedule_interrupt+0xf/0x20
[ 0.798432] </IRQ>
[ 0.798436] RIP: 0010:default_idle+0x20/0x140
[ 0.798438] Code: 90 90 90 90 90 90 90 90 90 90 41 55 41 54 55 65 8b 2d f4 40 d7 70 53 0f 1f 44 00 00 e9 07 00 00 00 0f 00 5
[ 0.798440] RSP: 0000:ffffb68e40083ec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
[ 0.798442] RAX: ffffffff8f29c250 RBX: 0000000000000004 RCX: ffff974bc7916000
[ 0.798444] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff974bc791ca80
[ 0.798445] RBP: 0000000000000004 R08: 000000009d74022b R09: 0000004d7ebb5820
[ 0.798447] R10: 0000000000000400 R11: 0000000000000400 R12: 0000000000000000
[ 0.798448] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.798451] ? __cpuidle_text_start+0x8/0x8
[ 0.798455] do_idle+0x19e/0x210
[ 0.798458] cpu_startup_entry+0x14/0x20
[ 0.798464] start_secondary+0x144/0x170
[ 0.798467] secondary_startup_64+0xa4/0xb0
[ 0.798469] Modules linked in:
[ 0.798478] ---[ end trace c2be7729c78a55ad ]---
[ 0.798482] RIP: 0010:enqueue_dl_entity+0x3f8/0x440
[ 0.798484] Code: ff 48 8b 85 60 0a 00 00 8b 48 28 85 c9 0f 85 99 fd ff ff c7 40 28 01 00 00 00 e9 8d fd ff ff 85 c0 75 20 f
[ 0.798485] RSP: 0000:ffffb68e40154f10 EFLAGS: 00010096
[ 0.798487] RAX: 0000000000000020 RBX: ffff974bc74d0c00 RCX: ffff974bc751b200
[ 0.798489] RDX: 0000000000000001 RSI: ffff974bc7929410 RDI: ffff974bc7929410
[ 0.798490] RBP: 0000000000000009 R08: 00000000a73eb274 R09: 00000000000f4887
[ 0.798491] R10: 0000000000000000 R11: 0000000000000000 R12: ffff974bc74d0c80
[ 0.798493] R13: 0000000000000000 R14: ffff974bc74d0d00 R15: 0000000000000000
[ 0.798495] FS: 0000000000000000(0000) GS:ffff974bc7900000(0000) knlGS:0000000000000000
[ 0.798500] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.798501] CR2: 00000000ffffffff CR3: 000000000480a000 CR4: 00000000000006e0
[ 0.798502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.798504] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.798505] Kernel panic - not syncing: Fatal exception in interrupt
[ 0.799522] Kernel Offset: 0xd800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 0.875144] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
------8<------
Hi Alessio,
On 03/09/19 15:27, Alessio Balsini wrote:
> Hi Peter,
>
> While testing your series (peterz/sched/wip-deadline 7a9e91d3fe951), I ended up
> in a panic at boot on a x86_64 kvm guest, would you please have a look? Here
> attached the backtrace.
> Happy to test any suggestion that fixes the issue.
Are you running with the latest fix by Peter?
https://lore.kernel.org/lkml/[email protected]/
It seems that his wip tree now has d3138279c7f3 on top (and the fix
above has been merged).
Not sure it also fixes what you are seeing, though.
Thanks,
Juri
On Wed, Sep 04, 2019 at 12:50:37PM +0200, Juri Lelli wrote:
> Hi Alessio,
>
> On 03/09/19 15:27, Alessio Balsini wrote:
> > Hi Peter,
> >
> > While testing your series (peterz/sched/wip-deadline 7a9e91d3fe951), I ended up
> > in a panic at boot on a x86_64 kvm guest, would you please have a look? Here
> > attached the backtrace.
> > Happy to test any suggestion that fixes the issue.
>
> Are you running with latest fix by Peter?
>
> https://lore.kernel.org/lkml/[email protected]/
>
> It seems that his wip tree now has d3138279c7f3 on top (and the fix
> above has been merged).
>
> Not sure it fixes also what you are seeing, though.
He likely is; but it is also very likely I messed it up somehow; I
didn't even boot that branch :/ I'll try and have a look, but I'm
running out of time before LPC.