Subject: Re: [PATCH] x86/smpboot: Make logical package management more robust
To: Thomas Gleixner, LKML
Cc: x86@kernel.org, Peter Zijlstra, Borislav Petkov, "Charles (Chas) Williams", "M. Vefa Bicakci", Alok Kataria, xen-devel, Juergen Groß
From: Boris Ostrovsky
Date: Fri, 9 Dec 2016 22:27:37 -0500
Message-ID: <730d61ff-ff1e-df80-3446-7fceb25a6d63@oracle.com>
In-Reply-To: <8aa33de4-db18-759b-d2cb-0e25d5ab9d88@oracle.com>
References: <8aa33de4-db18-759b-d2cb-0e25d5ab9d88@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
> On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
>> On Thu, 8 Dec 2016, Thomas Gleixner wrote:
>>
>> Boris, can you please verify if that makes the
>> topology_update_package_map() call which you placed into the Xen cpu
>> starting code obsolete ?
>
> Will do. I did test your patch but without removing
> topology_update_package_map() call. It complained about package IDs
> being wrong, but that's expected until I fix Xen part.

Ignore my statement about earlier testing --- it was all on single-node
machines.

Something is broken with multi-node on Intel, but failure modes are
different.
Prior to this patch, build_sched_domain() reports an error and pretty
soon we crash in the scheduler (I don't remember the details off the top
of my head). With the patch applied I crash much later, when one of the
drivers does kmalloc_node(.., cpu_to_node(cpu)) and cpu_to_node() returns
1, which should never happen ("x86: Booted up 1 node, 32 CPUs" is
reported, for example).

The 2-node AMD box doesn't have these problems.

I haven't upgraded the Intel machine for about a month, but this all must
have happened in the 4.9 timeframe.

So I can't answer your question, since we clearly have other problems on
Xen. I will be looking into this.

-boris