Date: Tue, 22 Aug 2017 09:54:37 -0700
From: Tejun Heo <tj@kernel.org>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Laurent Vivier <lvivier@redhat.com>, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Lai Jiangshan <jiangshanlai@gmail.com>, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 1/2] powerpc/workqueue: update list of possible CPUs
Message-ID: <20170822165437.GG491396@devbig577.frc2.facebook.com>
References: <20170821134951.18848-1-lvivier@redhat.com>
	<20170821144832.GE491396@devbig577.frc2.facebook.com>
	<87r2w4bcq2.fsf@concordia.ellerman.id.au>
In-Reply-To: <87r2w4bcq2.fsf@concordia.ellerman.id.au>

Hello, Michael.

On Tue, Aug 22, 2017 at 11:41:41AM +1000, Michael Ellerman wrote:
> > This is something powerpc needs to fix.
>
> There is no way for us to fix it.

I don't think that's true.  The CPU id used in the kernel doesn't have
to match the physical one, and arch code should be able to pre-map CPU
IDs to nodes and use the matching one when hotplugging CPUs.  I'm not
saying that's the best way to solve the problem tho.

It could be that the best way forward is making the cpu <-> node
mapping dynamic and properly synchronized.  However, please note that
that does mean we mess up node affinity for things like per-cpu
memory, which is allocated before the cpu comes up, so there are some
inherent benefits to keeping the mapping static, even if that involves
indirection.

> > Workqueue isn't the only one making this assumption.  mm as a whole
> > assumes that the CPU <-> node mapping is stable regardless of
> > hotplug events.
>
> At least in this case I don't think the mapping changes, it's just we
> don't know the mapping at boot.
>
> Currently we have to report possible but not present CPUs as
> belonging to node 0, because otherwise we trip this helpful piece of
> code:
>
> 	for_each_possible_cpu(cpu) {
> 		node = cpu_to_node(cpu);
> 		if (WARN_ON(node == NUMA_NO_NODE)) {
> 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> 			/* happens iff arch is bonkers, let's just proceed */
> 			return;
> 		}
> 	}
>
> But if we remove that, we could then accurately report NUMA_NO_NODE at
> boot, and then update the mapping when the CPU is hotplugged.

If you think that making this dynamic is the right way to go, I have
no objection, but we should be doing this properly instead of patching
up whatever seems to be crashing right now.  What synchronization and
notification mechanisms do we need to make the cpu <-> node mapping
dynamic?  Do we need any synchronization in the memory allocation
paths?  If not, why would it be safe?

Thanks.

-- 
tejun