From: Chen Yu
Date: Wed, 19 Feb 2020 18:00:05 +0800
Subject: Re: [RFC] Display the cpu of sched domain in procfs
To: Valentin Schneider
Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot, Mel Gorman, Tony Luck, Aubrey Li, Tim Chen, Dave Hansen, Borislav Petkov
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Valentin,

Thanks very much for looking at my question.

On Wed, Feb 19, 2020 at 4:13 PM Valentin Schneider wrote:
>
> Hi,
>
> On 19/02/2020 07:15, Chen Yu wrote:
> > Problem:
> > The sched domain topology is not always consistent with the CPU topology
> > exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
> > monitoring tools to distinguish the CPUs in different sched domains.
> >
> > For example, on x86, if there are NUMA nodes within a package, say
> > SNC (Sub-NUMA Cluster), then no DIE sched domain is created, only NUMA
> > sched domains. As a result, you cannot tell what the sched domain
> > hierarchy is by looking only at /sys/devices/system/cpu/cpuX/topology.
> >
> > Although appending sched_debug to the command line shows the sched
> > domain CPU topology, it is only printed once during boot, which makes
> > it hard to track at run time.
> >
>
> It should be (and in my experience, is) printed any time there is a sched
> domain update - hotplug, cpusets, IOW not just at bootup.
>

Right, whenever a domain has changed it will be printed.

> e.g. if I hotplug out a CPU:
>
> root@valsch-juno:~# echo 0 > /sys/devices/system/cpu/cpu3/online
> [40150.882586] CPU3: shutdown
> [40150.885383] psci: CPU3 killed (polled 0 ms)
> [40150.891362] CPU0 attaching NULL sched-domain.
> [40150.895954] CPU1 attaching NULL sched-domain.
> [40150.900433] CPU2 attaching NULL sched-domain.
> [40150.906583] CPU3 attaching NULL sched-domain.
> [40150.910998] CPU4 attaching NULL sched-domain.
> [40150.915444] CPU5 attaching NULL sched-domain.
> [40150.920108] CPU0 attaching sched-domain(s):
> [40150.924396] domain-0: span=0,4-5 level=MC
> [40150.928592] groups: 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }
> [40150.936684] domain-1: span=0-2,4-5 level=DIE
> [40150.941207] groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2041 }
> [40150.948107] CPU1 attaching sched-domain(s):
> [40150.952342] domain-0: span=1-2 level=MC
> [40150.956311] groups: 1:{ span=1 cap=1020 }, 2:{ span=2 cap=1021 }
> [40150.962592] domain-1: span=0-2,4-5 level=DIE
> [40150.967082] groups: 1:{ span=1-2 cap=2041 }, 0:{ span=0,4-5 cap=1335 }
> [40150.973984] CPU2 attaching sched-domain(s):
> [40150.978208] domain-0: span=1-2 level=MC
> [40150.982176] groups: 2:{ span=2 cap=1021 }, 1:{ span=1 cap=1021 }
> [40150.988431] domain-1: span=0-2,4-5 level=DIE
> [40150.992922] groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,4-5 cap=1335 }
> [40150.999819] CPU4 attaching sched-domain(s):
> [40151.004045] domain-0: span=0,4-5 level=MC
> [40151.008186] groups: 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }
> [40151.016220] domain-1: span=0-2,4-5 level=DIE
> [40151.020722] groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2044 }
> [40151.027619] CPU5 attaching sched-domain(s):
> [40151.031843] domain-0: span=0,4-5 level=MC
> [40151.035985] groups: 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }
> [40151.044021] domain-1: span=0-2,4-5 level=DIE
> [40151.048512] groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2043 }
> [40151.055440] root domain span: 0-2,4-5 (max cpu_capacity = 1024)
>
> Same for setting up cpusets:
>
> root@valsch-juno:~# cgset -r cpuset.mems=0 asym
> root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 asym
> root@valsch-juno:~#
> root@valsch-juno:~# cgcreate -g cpuset:smp
> root@valsch-juno:~# cgset -r cpuset.cpus=4-5 smp
> root@valsch-juno:~# cgset -r cpuset.mems=0 smp
> root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 smp
> root@valsch-juno:~#
> root@valsch-juno:~# cgset -r cpuset.sched_load_balance=0 .
> [40224.135466] CPU0 attaching NULL sched-domain.
> [40224.140038] CPU1 attaching NULL sched-domain.
> [40224.144531] CPU2 attaching NULL sched-domain.
> [40224.148951] CPU3 attaching NULL sched-domain.
> [40224.153366] CPU4 attaching NULL sched-domain.
> [40224.157811] CPU5 attaching NULL sched-domain.
> [40224.162394] CPU0 attaching sched-domain(s):
> [40224.166623] domain-0: span=0,3 level=MC
> [40224.170624] groups: 0:{ span=0 cap=445 }, 3:{ span=3 cap=446 }
> [40224.176709] domain-1: span=0-3 level=DIE
> [40224.180884] groups: 0:{ span=0,3 cap=891 }, 1:{ span=1-2 cap=2044 }
> [40224.187497] CPU1 attaching sched-domain(s):
> [40224.191753] domain-0: span=1-2 level=MC
> [40224.195724] groups: 1:{ span=1 cap=1021 }, 2:{ span=2 cap=1023 }
> [40224.202010] domain-1: span=0-3 level=DIE
> [40224.206154] groups: 1:{ span=1-2 cap=2044 }, 0:{ span=0,3 cap=890 }
> [40224.212792] CPU2 attaching sched-domain(s):
> [40224.217020] domain-0: span=1-2 level=MC
> [40224.220989] groups: 2:{ span=2 cap=1023 }, 1:{ span=1 cap=1020 }
> [40224.227244] domain-1: span=0-3 level=DIE
> [40224.231386] groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,3 cap=889 }
> [40224.238025] CPU3 attaching sched-domain(s):
> [40224.242252] domain-0: span=0,3 level=MC
> [40224.246221] groups: 3:{ span=3 cap=446 }, 0:{ span=0 cap=443 }
> [40224.252329] domain-1: span=0-3 level=DIE
> [40224.256474] groups: 0:{ span=0,3 cap=889 }, 1:{ span=1-2 cap=2042 }
> [40224.263142] root domain span: 0-3 (max cpu_capacity = 1024)
> [40224.268945] CPU4 attaching sched-domain(s):
> [40224.273200] domain-0: span=4-5 level=MC
> [40224.277173] groups: 4:{ span=4 cap=446 }, 5:{ span=5 cap=444 }
> [40224.283291] CPU5 attaching sched-domain(s):
> [40224.287517] domain-0: span=4-5 level=MC
> [40224.291487] groups: 5:{ span=5 cap=444 }, 4:{ span=4 cap=446 }
> [40224.297584] root domain span: 4-5 (max cpu_capacity = 446)
> [40224.303185] rd 4-5: CPUs do not have asymmetric capacities
>
> So in short, if you have sched_debug enabled, whatever is reported as the
> last sched domain hierarchy in dmesg will be the one in use.
>
> Now, if you have a userspace that tries to be clever and wants to use this
> information then yes, this isn't ideal, but then that's a different matter.
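As a side note, the span= fields in the quoted log use the kernel's cpulist format ("0-2,4-5"). A minimal sketch of a helper that expands such a string into individual CPU ids (the helper name is mine, not a kernel interface):

```python
def cpulist_to_cpus(cpulist: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-2,4-5' into CPU ids."""
    cpus = []
    for part in cpulist.split(","):
        if "-" in part:
            # A range like '0-2' is inclusive on both ends.
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

print(cpulist_to_cpus("0-2,4-5"))  # [0, 1, 2, 4, 5]
```

The same format appears in sysfs files such as /sys/devices/system/node/node0/cpulist, so the helper works for those too.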
The dmesg output might be lost if someone has used dmesg -c to clear the log, and /var/log/dmesg is not always there. Also, in some environments it is not common to trigger a sched domain update after boot. But anyway, the information printed by sched_debug is very useful for understanding the topology.

> I think exposing the NUMA boundaries is fair game - and they already are
> via /sys/devices/system/node/node*/.

It seems that the NUMA sysfs entries cannot reflect the SNC topology; they only carry the *leaf* NUMA node information. Say, node0 and node1 might together form one sched domain.

> I'm not sure we'd want to expose more
> (e.g. MC span), ideally that is something you shouldn't really have to care
> about - that's the scheduler's job.

I agree; it just aims to facilitate user space, or is just for debugging purposes. But as Dietmar replied, /proc/schedstat has already exposed it.

Thanks again,
Chenyu
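For completeness, a sketch of how a monitoring tool could recover the per-CPU domain spans from /proc/schedstat, as mentioned above. This assumes the schedstat layout where each domainN line carries a comma-separated hex cpumask as its second field; the function names are mine for illustration:

```python
def mask_to_cpus(mask: str) -> list[int]:
    """Decode a comma-separated hex cpumask (e.g. '00000031') into CPU ids."""
    value = int(mask.replace(",", ""), 16)
    cpus, bit = [], 0
    while value:
        if value & 1:
            cpus.append(bit)
        value >>= 1
        bit += 1
    return cpus

def parse_schedstat(text: str) -> dict[str, list[tuple[str, list[int]]]]:
    """Map each cpuN in schedstat text to its (domainN, span) pairs.

    Lines that are neither 'cpuN ...' nor 'domainN <mask> ...'
    (e.g. the version/timestamp header) are skipped.
    """
    domains: dict[str, list[tuple[str, list[int]]]] = {}
    cpu = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("cpu"):
            cpu = fields[0]
            domains[cpu] = []
        elif fields[0].startswith("domain") and cpu is not None:
            domains[cpu].append((fields[0], mask_to_cpus(fields[1])))
    return domains

# Sample input mirroring the MC/DIE spans from the quoted log
# (0x31 -> CPUs 0,4,5; 0x37 -> CPUs 0-2,4,5); counters truncated.
sample = """\
cpu0 0 0 0 0 0 0 0 0 0
domain0 00000031 0 0 0 0 0 0 0 0 0
domain1 00000037 0 0 0 0 0 0 0 0 0
"""
print(parse_schedstat(sample))
# {'cpu0': [('domain0', [0, 4, 5]), ('domain1', [0, 1, 2, 4, 5])]}
```

On a live system one would read open("/proc/schedstat").read() instead of the sample string; the exact field counts after the mask vary with the schedstat version, but the parser above only relies on the first two fields.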