Date: Tue, 11 Apr 2017 00:39:14 +0800
From: Wei Yang
To: Borislav Petkov
Cc: Wei Yang, "Kirill A. Shutemov", Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Tejun Heo, Linux Kernel Mailing List
Subject: Re: [Patch V2 2/2] x86/mm/numa: remove the numa_nodemask_from_meminfo()
Message-ID: <20170410163914.GA4404@WeideMacBook-Pro.local>
In-Reply-To: <20170410124320.fq5sw4lt2imztiyl@pd.tnic>

On Mon, Apr 10, 2017 at 02:43:20PM +0200, Borislav Petkov wrote:
>On Sun, Apr 09, 2017 at 11:12:14AM +0800, Wei Yang wrote:
>> Oops, sorry to bring in the regression with my cleanup.
>> I haven't noticed there is a kernel command line "numa=fake", which
>> is the cause of the crash I think.
>
>Of course it is, didn't you see my debugging upthread?
>
>> So from my understanding, I am going to do these tests:
>>
>> 1. all fake numa scenarios with Kirill's qemu command line
>
>It is enough if you boot the kernel with "numa=fake..."
>
>> 2. Real numa scenarios with following qemu command option
>
>Not qemu command option but a kernel cmdline option.
>
>> 3. Baremetal
>>
>> One more question, on the baremetal machine, I can't change the
>> numa configuration, so there would be only one case. Do you have
>> some specific requirement?
>
>numa=fake on baremetal too.
>
>> Well, if I missed something, just let me know :-)
>>
>> > Qemu can emulate real numa too, for example you can boot with:
>> >
>> > -smp 64 \
>> > -numa node,nodeid=0,cpus=1-8 \
>> > -numa node,nodeid=1,cpus=9-16 \
>> > -numa node,nodeid=2,cpus=17-24 \
>> > -numa node,nodeid=3,cpus=25-32 \
>> > -numa node,nodeid=4,cpus=0 \
>> > -numa node,nodeid=4,cpus=33-39 \
>> > -numa node,nodeid=5,cpus=40-47 \
>> > -numa node,nodeid=6,cpus=48-55 \
>> > -numa node,nodeid=7,cpus=56-63
>
>Also, do this in kvm. kvm can emulate a lot of numa configurations, do
>experiment with those too.
>
>Basically, try to break your "cleanup". Stuff one should do for every
>patch one sends anyway.

Hi, Borislav

I have tried several fake numa test combinations, and all of them passed.

A result marked P (Passed) means the system booted up and a simple kernel
build test succeeded.
# test matrix and result

## Qemu

With qemu, I have tried [phys_node, emu_node] = [(1, 4), (0, 2, 4, 8)]

+----------------+--------+--------+
|      phys_node |   1    |   4    |
| emu_node       |        |        |
+----------------+--------+--------+
| 0              |   P    |   P    |
+----------------+--------+--------+
| 2              |   P    |   P    |
+----------------+--------+--------+
| 4              |   P    |   P    |
+----------------+--------+--------+
| 8              |   P    |   P    |
+----------------+--------+--------+

phys_node = 4 is emulated with the qemu command line:

    "-numa node,nodeid=0,cpus=1-2 -numa node,nodeid=1,cpus=3-4
     -numa node,nodeid=2,cpus=0 -numa node,nodeid=2,cpus=5
     -numa node,nodeid=3,cpus=6-7"

emu_node is emulated with the kernel command line "numa=fake=N".

## Baremetal

My machine has only one numa node, so I could only verify phys_node = 1.

+----------------+--------+
|      phys_node |   1    |
| emu_node       |        |
+----------------+--------+
| 0              |   P    |
+----------------+--------+
| 2              |   P    |
+----------------+--------+
| 4              |   P    |
+----------------+--------+
| 8              |   P    |
+----------------+--------+

emu_node is emulated with the kernel command line "numa=fake=N".

# Other things I observed

Generally, everything looks good in the qemu guest, but I noticed two things
on the baremetal machine. First, I want to emphasize that I saw the same
behavior with and without my "cleanup".
## only 3 nodes when fake=4

[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000022f5fffff]
[    0.000000] Faking node 0 at [mem 0x0000000000000000-0x000000007fffffff] (2048MB)
[    0.000000] Faking node 1 at [mem 0x0000000080000000-0x0000000133ffffff] (2880MB)
[    0.000000] Faking node 2 at [mem 0x0000000134000000-0x000000022f5fffff] (4022MB)
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009cfff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000007fffffff]
[    0.000000]   node   1: [mem 0x0000000080000000-0x00000000ba5b1fff]
[    0.000000]   node   1: [mem 0x00000000ba5b9000-0x00000000bad8dfff]
[    0.000000]   node   1: [mem 0x00000000bafb6000-0x00000000ca8a1fff]
[    0.000000]   node   1: [mem 0x00000000ca93a000-0x00000000ca977fff]
[    0.000000]   node   1: [mem 0x00000000cafff000-0x00000000caffffff]
[    0.000000]   node   1: [mem 0x0000000100000000-0x0000000133ffffff]
[    0.000000]   node   2: [mem 0x0000000134000000-0x000000022f5fffff]

## some warnings

I don't see these two warnings without "numa=fake=N".

[    0.004000] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.004000] ------------[ cut here ]------------
[    0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:424 topology_sane.isra.5+0x6c/0x70

[    8.594469] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp2_label'
[    8.594478] ------------[ cut here ]------------
[    8.594482] WARNING: CPU: 4 PID: 34 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70

# Some thoughts on the code

After going through numa_emulation(), I suggest rebuilding numa_nodes_parsed
from the emulated nodes, instead of setting numa_nodes_parsed directly in
emu_setup_memblk().

I have two cases in mind that the current approach does not handle well:

  1. split_nodes_size_interleave()/split_nodes_interleave() may fail, or a
     later step of the emulation may fail.
  2. There may be fewer fake nodes than physical nodes.

Both of them may lead to an inaccurate numa_nodes_parsed. So I have a patch
that rebuilds it from the emulated node info. I will send it soon.

>
>-- 
>Regards/Gruss,
>    Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.

-- 
Wei Yang
Help you, Help me