Hello,
Andre Wild reported that fake NUMA doesn't work on s390 anymore. "Doesn't
work" means that it crashed for Andre, while for me it is stuck in an endless
loop within init_sched_groups_capacity() (sg != sd->groups is always true).
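To illustrate the kind of loop meant here: the groups of a domain are walked as a ring until the walk arrives back at the first group. The following is only a minimal sketch under assumptions, not the kernel code. `struct group`, `count_groups_bounded()` and the `max_steps` guard are hypothetical names made up for this example. It shows that such a walk can only terminate if the ring closes at the group it started from.

```c
#include <stddef.h>

/* Simplified stand-in for a scheduler group: only the ring link is kept. */
struct group {
	struct group *next;
};

/*
 * Walk the ring the way a "do { ... } while (sg != head)" loop would,
 * but give up after max_steps so that a malformed list cannot hang.
 * Returns the number of groups visited, or -1 if the walk never got
 * back to head (broken ring, or a cycle that does not contain head).
 */
int count_groups_bounded(const struct group *head, int max_steps)
{
	const struct group *sg = head;
	int n = 0;

	if (!head)
		return 0;

	do {
		if (n == max_steps)
			return -1;	/* sg != head stayed true for too long */
		n++;
		sg = sg->next;
	} while (sg && sg != head);

	return sg ? n : -1;
}
```

With two groups linked to each other the walk visits both and stops. If the list loops back to some group other than the first one, the condition sg != head never becomes false, and only the step limit ends the walk.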
I could reproduce this with a very simple setup with only two nodes, where
each node has only one CPU. This allowed me to bisect it down to commit
051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain").
With that commit reverted the system comes up again and the scheduling
domains look like this:
[ 0.148592] smp: Bringing up secondary CPUs ...
[ 0.148984] smp: Brought up 2 nodes, 2 CPUs
[ 0.149097] CPU0 attaching sched-domain(s):
[ 0.149099] domain-0: span=0-1 level=NUMA
[ 0.149101] groups: 0:{ span=0 }, 1:{ span=1 }
[ 0.149106] CPU1 attaching sched-domain(s):
[ 0.149107] domain-0: span=0-1 level=NUMA
[ 0.149108] groups: 1:{ span=1 }, 0:{ span=0 }
[ 0.149111] span: 0-1 (max cpu_capacity = 1024)
Any idea what's going wrong?
Config file is attached.
Thanks,
Heiko
On Sat, May 12, 2018 at 12:02:33PM +0200, Heiko Carstens wrote:
> Hello,
>
> Andre Wild reported that fake NUMA doesn't work on s390 anymore. Doesn't
> work means it crashed for Andre, or it is in an endless loop within
> init_sched_groups_capacity() for me (sg != sd->groups is always true).
>
> I could reproduce this with a very simple setup with only two nodes, where
> each node has only one CPU. This allowed me to bisect it down to commit
> 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain").
>
> With that commit reverted the system comes up again and the scheduling
> domains look like this:
>
> [ 0.148592] smp: Bringing up secondary CPUs ...
> [ 0.148984] smp: Brought up 2 nodes, 2 CPUs
> [ 0.149097] CPU0 attaching sched-domain(s):
> [ 0.149099] domain-0: span=0-1 level=NUMA
> [ 0.149101] groups: 0:{ span=0 }, 1:{ span=1 }
> [ 0.149106] CPU1 attaching sched-domain(s):
> [ 0.149107] domain-0: span=0-1 level=NUMA
> [ 0.149108] groups: 1:{ span=1 }, 0:{ span=0 }
> [ 0.149111] span: 0-1 (max cpu_capacity = 1024)
>
> Any idea what's going wrong?
Not yet; still trying to decipher your fake NUMA implementation.
But meanwhile; could you provide me with:
$ cat /sys/devices/system/node/node*/distance
$ cat /sys/devices/system/node/node*/cpulist
On Mon, May 14, 2018 at 11:39:09AM +0200, Peter Zijlstra wrote:
> On Sat, May 12, 2018 at 12:02:33PM +0200, Heiko Carstens wrote:
> > Hello,
> >
> > Andre Wild reported that fake NUMA doesn't work on s390 anymore. Doesn't
> > work means it crashed for Andre, or it is in an endless loop within
> > init_sched_groups_capacity() for me (sg != sd->groups is always true).
> >
> > I could reproduce this with a very simple setup with only two nodes, where
> > each node has only one CPU. This allowed me to bisect it down to commit
> > 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain").
> >
> > With that commit reverted the system comes up again and the scheduling
> > domains look like this:
> >
> > [ 0.148592] smp: Bringing up secondary CPUs ...
> > [ 0.148984] smp: Brought up 2 nodes, 2 CPUs
> > [ 0.149097] CPU0 attaching sched-domain(s):
> > [ 0.149099] domain-0: span=0-1 level=NUMA
> > [ 0.149101] groups: 0:{ span=0 }, 1:{ span=1 }
> > [ 0.149106] CPU1 attaching sched-domain(s):
> > [ 0.149107] domain-0: span=0-1 level=NUMA
> > [ 0.149108] groups: 1:{ span=1 }, 0:{ span=0 }
> > [ 0.149111] span: 0-1 (max cpu_capacity = 1024)
> >
> > Any idea what's going wrong?
>
> Not yet; still trying to decipher your fake NUMA implementation.
>
> But meanwhile; could you provide me with:
>
> $ cat /sys/devices/system/node/node*/distance
> $ cat /sys/devices/system/node/node*/cpulist
Yes, of course:
$ cat /sys/devices/system/node/node0/distance
0 10
$ cat /sys/devices/system/node/node1/distance
10 0
$ cat /sys/devices/system/node/node0/cpulist
0
$ cat /sys/devices/system/node/node1/cpulist
1
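For reference, the rows above show a distance of 0 from each node to itself, while the generic kernel headers define LOCAL_DISTANCE as 10. Whether that has anything to do with the hang is not established at this point. The sketch below only shows how such a row could be checked by hand. `parse_distance_row()` and `local_distance_matches()` are hypothetical helpers written for this example, not existing kernel or library functions.

```c
#include <stdio.h>

/*
 * Parse one distance row as printed by sysfs, e.g. "0 10".
 * Stores at most max values in out and returns how many were parsed.
 */
int parse_distance_row(const char *line, int *out, int max)
{
	int n = 0, used = 0, val;

	while (n < max && sscanf(line, "%d%n", &val, &used) == 1) {
		out[n++] = val;
		line += used;	/* continue after the number just read */
	}
	return n;
}

/*
 * Return 1 if the row's entry for the node itself equals expected_local
 * (assumed to be 10, the generic LOCAL_DISTANCE), otherwise 0.
 */
int local_distance_matches(const int *row, int nvals, int node,
			   int expected_local)
{
	return node >= 0 && node < nvals && row[node] == expected_local;
}
```

Feeding it the node0 row "0 10" with node 0 and an expected local distance of 10 reports a mismatch, since the diagonal entry is 0.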