Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp40322pxf; Wed, 17 Mar 2021 14:47:07 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwzm1AeMxlU25knikYTHZKcfNgonVclY6RFicUkDWCBrtf65j4TnQgF8UoHrPAv7R2AOC1p X-Received: by 2002:a17:906:aada:: with SMTP id kt26mr36780376ejb.137.1616017626955; Wed, 17 Mar 2021 14:47:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616017626; cv=none; d=google.com; s=arc-20160816; b=TCBdqyGz37zBAkDbxPRpcO791Sgz/ZllEj1tR4AK3qwxUoPkVVy0wqaL8YbxUzegZQ AHIf0SDjqTjJW01IGqVFbpI2zq8d9tfvnpZzWKZsSH2MyZDHdHIrteJzmNlIHzDjkR1s jgYDJiv0NufmrlPy5XwS2r/NBARi4fzHeVCCKQJwmLHRzgeOMnUcg4xZKfF5GjdPgin+ CI9BvKmi0jnMtKbdE7ucESQsWcVTE4hhTUdaCvTUMr5CKrrzb1ujEbhwwF2/Km12FlLO 5hAxAHSh9Ts0za5Yn49G9GGq1K2l0l8NFJn3UPh0+a9qkKv4tpIM+XMhPaIcktdDFrq4 YPJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=+D9EsyMgyfU2RuS3bKitmKyeqbJKyZVLGgAFNeuFSJk=; b=xpV87xvavCdKnVIBEb/p7K2s8BYOLMxsgiDjsCH+1FgJcAYxUnMs1LqCtDcax6XCqr 9TwHiUQSmwSgUSg6LpYGjmUwSjB/ARrSotYc3MRlmWPj7PRCCHxWyaLcQ9lhqaA+koZ7 R0LRN1nyw/a9aZIV2LpPgLj+eGtsAz7fTHihdLyQr8NT9jN77Vv2PAbnKCw61FI/kV2V i7el36cQx35+bbIW9YmjRXL91fquM9dtfhIHpr1W04hfq4XATiM9QdDdQCRq0gAYC5WO pmSO0D7zVrekoFZEqKMRruLZvKu7n1YVFqiceb2vrZYuAAagvRNJPVttgn4na8EZLLG5 mBQA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gentoo.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s13si430edc.150.2021.03.17.14.46.43; Wed, 17 Mar 2021 14:47:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gentoo.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230341AbhCQVOq (ORCPT + 99 others); Wed, 17 Mar 2021 17:14:46 -0400 Received: from smtp.gentoo.org ([140.211.166.183]:48644 "EHLO smtp.gentoo.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231375AbhCQVOd (ORCPT ); Wed, 17 Mar 2021 17:14:33 -0400 Date: Wed, 17 Mar 2021 21:14:24 +0000 From: Sergei Trofimovich To: Valentin Schneider Cc: John Paul Adrian Glaubitz , "Peter Zijlstra \(Intel\)" , Linux Kernel Mailing List , "linux-ia64\@vger.kernel.org" , debian-ia64 Subject: Re: [PATCH 0/1] sched/topology: NUMA distance deduplication Message-ID: <20210317211424.33090a37@sf> In-Reply-To: <87zgz1pmx4.mognet@arm.com> References: <255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de> <8735wtr2ro.mognet@arm.com> <87zgz1pmx4.mognet@arm.com> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 17 Mar 2021 20:04:07 +0000 Valentin Schneider wrote: > On 17/03/21 20:47, John Paul Adrian Glaubitz wrote: > > Helo Valentin! > > > > On 3/17/21 8:36 PM, Valentin Schneider wrote: > >> I see ACPI in your boot logs, so I'm guessing you have a bogus SLIT table > >> (the ACPI table with node distances). You should be able to double check > >> this with something like: > >> > >> $ acpidump > acpi.dump > >> $ acpixtract -a acpi.dump > >> $ iasl -d *.dat > >> $ cat slit.dsl > > > > There does not seem to be a SLIT table in my firmware: > > > > root@glendronach:~# acpidump > acpi.dump > > root@glendronach:~# acpixtract -a acpi.dump > > > > Intel ACPI Component Architecture > > ACPI Binary Table Extraction Utility version 20200925 > > Copyright (c) 2000 - 2020 Intel Corporation > > > > acpixtract(31194): unaligned access to 0x60000fffff9b3925, ip=0x4000000000003e91 > > SSDT - 3768 bytes written (0x00000EB8) - ssdt1.dat > > acpixtract(31194): unaligned access to 0x60000fffff9b3925, ip=0x4000000000003e00 > > acpixtract(31194): unaligned access to 0x60000fffff9b3925, ip=0x4000000000003e91 > > SPCR - 80 bytes written (0x00000050) - spcr.dat > > acpixtract(31194): unaligned access to 0x60000fffff9b3925, ip=0x4000000000003e00 > > acpixtract(31194): unaligned access to 0x60000fffff9b3925, ip=0x4000000000003e91 > > APIC - 200 bytes written (0x000000C8) - apic.dat > > SSDT - 1110 bytes written (0x00000456) - ssdt2.dat > > SSDT - 316 bytes written (0x0000013C) - ssdt3.dat > > SPMI - 80 bytes written (0x00000050) - spmi.dat > > DSDT - 58726 bytes written (0x0000E566) - dsdt.dat > > SSDT - 312 bytes written (0x00000138) - ssdt4.dat > > SSDT - 2150 bytes written (0x00000866) - ssdt5.dat > > SSDT - 316 bytes written (0x0000013C) - ssdt6.dat > > SSDT - 3768 bytes written (0x00000EB8) - ssdt7.dat > > FACP - 244 bytes written (0x000000F4) - facp.dat > > SSDT - 1203 bytes written (0x000004B3) - ssdt8.dat > > CPEP - 52 bytes written (0x00000034) - cpep.dat > > SSDT - 316 bytes written (0x0000013C) - ssdt9.dat > > DBGP - 52 bytes written (0x00000034) - dbgp.dat > > SSDT - 3768 bytes written (0x00000EB8) - ssdt10.dat > > FACS - 64 bytes written (0x00000040) - facs.dat > > root@glendronach:~# > > > > root@glendronach:~# ls *.dsl *.dat > > apic.dat cpep.dsl dsdt.dat facp.dsl spcr.dat spmi.dsl ssdt1.dat ssdt2.dsl ssdt4.dat ssdt5.dsl ssdt7.dat ssdt8.dsl > > apic.dsl dbgp.dat dsdt.dsl facs.dat spcr.dsl ssdt10.dat ssdt1.dsl ssdt3.dat ssdt4.dsl ssdt6.dat ssdt7.dsl ssdt9.dat > > cpep.dat dbgp.dsl facp.dat facs.dsl spmi.dat ssdt10.dsl ssdt2.dat ssdt3.dsl ssdt5.dat ssdt6.dsl ssdt8.dat ssdt9.dsl > > root@glendronach:~# > > > > Huh, then this might be some initialization fail that leaves nr_node_ids to > MAX_NUMNODES, which must be 256 in your case (NODES_SHIFT==8). Devicetree > can provide node distances, but something tells me you're not using that :-) > > >> a) Complain to your hardware vendor to have them fix the table and ship a > >> firmware fix > > > > The hardware is probably too old for the vendor to care about fixing it. > > > > Indeed, I only realized that after googling your machine > > >> b) Fix the ACPI table yourself - I've been told it's doable for *some* of > >> them, but I've never done that myself > >> c) Compile your kernel with CONFIG_NUMA=n, as AFAICT you only actually have > >> a single node > >> d) Ignore the warning > >> > >> > >> c) is clearly not ideal if you want to use a somewhat generic kernel image > >> on a wide host of machines; d) is also a bit yucky... > > > > Shouldn't the kernel be able to cope with quirky hardware? From what I remember in the past, > > ACPI tables used to be broken quite a lot and the kernel contained workarounds for such cases, > > didn't it? > > > > Technically it *is* coping with it, it's just dumping the entire NUMA > distance matrix in the process... Let me see if I can't figure out why your > system doesn't end up with nr_node_ids=1. I also poked at it a few days ago assuming it was an issue causing boot failures (it was not, it's a harmless warning). Looking at 'arch/ia64/**' NUMA presence is detected by SRAT ACPI tables (and generic ACPI also wants SLIT, those probably exist on large ia64 boxes?) None of SRAT/SLIT are present on these small machines, thus I would expect generic code to erive single fake node. Boot log suggests we even inferred 1 node system: > [ 0.040000] smp: Brought up 1 node, 4 CPUs Or is it just an early bootstrap message assuming more are to come? Could it be that we initialize too little of generic acpi boilerplace when there is no SRAT? Somewhere around: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/ia64/kernel/acpi.c#n446 arm64 and riscv calls `arch_numa_init()` and initializes numa node data in numa_init(). While ia64 only calls acpi_numa_init() but not arch_acpi_numa_init() (which would do numa_init() for us). x86 seems to vaguely resemble generic code by duplicating numa_init(). (UNTESTED) Could it be that we need something like: --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -571,7 +571,7 @@ setup_arch (char **cmdline_p) acpi_table_init(); early_acpi_boot_init(); #ifdef CONFIG_ACPI_NUMA - acpi_numa_init(); + arch_numa_init(); acpi_numa_fixup(); #ifdef CONFIG_ACPI_HOTPLUG_CPU prefill_possible_map(); -- Sergei