Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp974822pxp; Wed, 16 Mar 2022 23:09:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzeuUKGIAoyFImORYDu9pEOOK4bQV0hEx2bvOH5CFiUnv8niKkyMophuMmuMLuFffHMQt+M X-Received: by 2002:a17:902:8bcc:b0:14f:2294:232e with SMTP id r12-20020a1709028bcc00b0014f2294232emr3103998plo.105.1647497393537; Wed, 16 Mar 2022 23:09:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647497393; cv=none; d=google.com; s=arc-20160816; b=Cr+co6J98JP5pcYTPo5dtMMoQ6S5zHPHxaqH8iLXhSwF/hMB053xYYwTYon6Hjs2Eu QqR7IAJRpJgve/P1Kr7YcLWC7h7BNzg8qEidSRFZb/hMF6X9EQEy8Rla1wcMkDulNtbt D45nGrw6+EDnPzIXRI9w4zOfbrvgmv2CkckPFCiKYDR7+R0dGbpcgpPPxu4NmlqB90Z+ KytBmMb1TX+WjbGl+t0ylFkZ/vmP6nPAxDH/lyoG30UB0Xz0vFvNSQpGKocgJJCrWtdN U+p3vz2dqgu3APRX7D+H2shSyv2Zd4mrShlhpmMj8RbLDLMAUUf4g8H7zv3TeM7KoWlO /snw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+2ufOnIvbUGAFd00lEMV3SW3ZhH4lv4M/Vj/vxcKBHk=; b=jOreV4Krqr0ZjE8ydQWPhN9z33oY950+/jXpvEK+JprhIuBiNnAJQUrDr21oqq6fWz IvenV44RsZ2lb3FMIrS1LtcYPacDUuABmIUtANq2b5ssISNwWtyiVRrnyKddYgl46LJ9 foBt97+y6tavIkbKQLv0BdvIUzfn2vRt6t0xedjPMVYQrd5vUVtInnGa1VF4/sZ3vw2m jCT1jjamQ26F9oyYZj1RX6N6vbl7ZkZhoo9aBfy1JcRG+y2GeC3ocXH4sqnEl+GEmVyg 0rvWbKSM/M8iV4BuHXhlWcGtQ87RNnMsHlgjJQop9jfHNBIieS8qpzowg07efoBJifFI ri5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@canonical.com header.s=20210705 header.b=cST9creb; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id n13-20020a63720d000000b003816043f15dsi1170337pgc.850.2022.03.16.23.09.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Mar 2022 23:09:53 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@canonical.com header.s=20210705 header.b=cST9creb; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B7B662BC1D1; Wed, 16 Mar 2022 22:23:26 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357564AbiCPQtx (ORCPT + 99 others); Wed, 16 Mar 2022 12:49:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353981AbiCPQtt (ORCPT ); Wed, 16 Mar 2022 12:49:49 -0400 Received: from smtp-relay-internal-1.canonical.com (smtp-relay-internal-1.canonical.com [185.125.188.123]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB5B036E0F for ; Wed, 16 Mar 2022 09:48:34 -0700 (PDT) Received: from mail-il1-f199.google.com (mail-il1-f199.google.com [209.85.166.199]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 29F713F5F4 for ; Wed, 16 Mar 2022 16:48:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1647449312; bh=+2ufOnIvbUGAFd00lEMV3SW3ZhH4lv4M/Vj/vxcKBHk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version:Content-Type; b=cST9crebXwS4jnJvxxHptYxPTz9bveWLQxzPJe61P4959Qf+S7wh1FL0HuEs9A1Ic mBIvv2dSsq20nmYR36Du2oU9ZOx+6ScajfBUFAsfv8lNztYAMKCMWhm2HKADTY8Akp S9SWEckqOL+L+qjbsidvR25IstSA1GXXV9IZA9FWuCWVn5VCzhQybf2lQY4SNR+6XX o4HWqSWFf4EvLRKfbSg//FzDVdknyPgvwSGlzsahitXW+6JFu7k9Z3cyOvMpfMHfI+ iWLBd8h/khMh5cC9RAk8548PQ7b73+GX8fcAQmUyv93nvAf7Rk0QgwLE5L+xA8AAru Fi2tlKLh0Sz0w== Received: by mail-il1-f199.google.com with SMTP id x4-20020a056e021bc400b002c7c7c72447so1567923ilv.15 for ; Wed, 16 Mar 2022 09:48:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+2ufOnIvbUGAFd00lEMV3SW3ZhH4lv4M/Vj/vxcKBHk=; b=nS74XMrAdsBjiPL4QQ1NZb/1qS0x14em+O+n5qbdR6W9/lLVq69nW/8bcePKmQv1NP sRLNERwfIxp0riPM+q5+Ov/KsH7q4+Fzz4hC6zDAbJWNBB1z7xITJFbVBmYloF2uKST+ a4xbq14N7KBt/m8gRjLZnbkfGzf561elaZ5K1w4g8MQuO3gL49z26so5qTFEIwzfBiau tU82fhhclJZCDT0CmykvREW1RMUbKKN93qJf+V35Q5fPWMRzpjySvKcCqAGkN6G48O1C EU4SeoOEnoazEwiHvO/YcbQ3afoKaZWd4ehtIrcMzPAEBQz4+ZqPwnQwUxeC1HLqxu/I n7IA== X-Gm-Message-State: AOAM532DvCJTMzhPB7UZC33TTWs6BJ/v1CZA07nZ7HTEsj1DGph11rFw tuvk4yVIkxF9NRk5MVyHbjcuIjcDSl5cnBA3fMl9Ne6DBuKaPuRCVeneeWlUoF3YCZO/2GwsG4I 3LiKWRPCpN+t03dWMc1uEt67SAvH46dtPlqma9LEjpw== X-Received: by 2002:a5e:a80a:0:b0:645:b477:bc23 with SMTP id c10-20020a5ea80a000000b00645b477bc23mr406918ioa.191.1647449310615; Wed, 16 Mar 2022 09:48:30 -0700 (PDT) X-Received: by 2002:a5e:a80a:0:b0:645:b477:bc23 with SMTP id c10-20020a5ea80a000000b00645b477bc23mr406908ioa.191.1647449310297; Wed, 16 Mar 2022 09:48:30 -0700 (PDT) Received: from localhost (c-71-196-238-11.hsd1.co.comcast.net. [71.196.238.11]) by smtp.gmail.com with ESMTPSA id 66-20020a6b1545000000b00645c8667726sm1197468iov.19.2022.03.16.09.48.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 16 Mar 2022 09:48:29 -0700 (PDT) From: dann frazier To: stable@vger.kernel.org Cc: Valentin Schneider , Dietmar Eggemann , Peter Zijlstra , Ingo Molnar , Vincent Guittot , John Paul Adrian Glaubitz , Sergei Trofimovich , Anatoly Pugachev , Andrew Morton , Linus Torvalds , linux-kernel@vger.kernel.org Subject: [PATCH v2 4.19 1/3] sched/topology: Make sched_init_numa() use a set for the deduplicating sort Date: Wed, 16 Mar 2022 10:48:06 -0600 Message-Id: <20220316164808.569272-2-dann.frazier@canonical.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220316164808.569272-1-dann.frazier@canonical.com> References: <20220316164808.569272-1-dann.frazier@canonical.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Valentin Schneider commit 620a6dc40754dc218f5b6389b5d335e9a107fd29 upstream. The deduplicating sort in sched_init_numa() assumes that the first line in the distance table contains all unique values in the entire table. I've been trying to pen what this exactly means for the topology, but it's not straightforward. For instance, topology.c uses this example: node 0 1 2 3 0: 10 20 20 30 1: 20 10 20 20 2: 20 20 10 20 3: 30 20 20 10 0 ----- 1 | / | | / | | / | 2 ----- 3 Which works out just fine. However, if we swap nodes 0 and 1: 1 ----- 0 | / | | / | | / | 2 ----- 3 we get this distance table: node 0 1 2 3 0: 10 20 20 20 1: 20 10 20 30 2: 20 20 10 20 3: 20 30 20 10 Which breaks the deduplicating sort (non-representative first line). In this case this would just be a renumbering exercise, but it so happens that we can have a deduplicating sort that goes through the whole table in O(n²) at the extra cost of a temporary memory allocation (i.e. any form of set). The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following this, implement the set as a 256-bits bitmap. Should this not be satisfactory (i.e. we want to support 32-bit values), then we'll have to go for some other sparse set implementation. This has the added benefit of letting us allocate just the right amount of memory for sched_domains_numa_distance[], rather than an arbitrary (nr_node_ids + 1). Note: DT binding equivalent (distance-map) decodes distances as 32-bit values. Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com Signed-off-by: dann frazier --- include/linux/topology.h | 1 + kernel/sched/topology.c | 99 +++++++++++++++++++--------------------- 2 files changed, 49 insertions(+), 51 deletions(-) diff --git a/include/linux/topology.h b/include/linux/topology.h index cb0775e1ee4b..707364c90aa6 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -47,6 +47,7 @@ int arch_update_cpu_topology(void); /* Conform to ACPI 2.0 SLIT distance definitions */ #define LOCAL_DISTANCE 10 #define REMOTE_DISTANCE 20 +#define DISTANCE_BITS 8 #ifndef node_distance #define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE) #endif diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index f58efa5cc647..0826f3f4920a 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1322,66 +1322,58 @@ static void init_numa_topology_type(void) } } + +#define NR_DISTANCE_VALUES (1 << DISTANCE_BITS) + void sched_init_numa(void) { - int next_distance, curr_distance = node_distance(0, 0); struct sched_domain_topology_level *tl; - int level = 0; - int i, j, k; - - sched_domains_numa_distance = kzalloc(sizeof(int) * (nr_node_ids + 1), GFP_KERNEL); - if (!sched_domains_numa_distance) - return; - - /* Includes NUMA identity node at level 0. */ - sched_domains_numa_distance[level++] = curr_distance; - sched_domains_numa_levels = level; + unsigned long *distance_map; + int nr_levels = 0; + int i, j; /* * O(nr_nodes^2) deduplicating selection sort -- in order to find the * unique distances in the node_distance() table. - * - * Assumes node_distance(0,j) includes all distances in - * node_distance(i,j) in order to avoid cubic time. */ - next_distance = curr_distance; + distance_map = bitmap_alloc(NR_DISTANCE_VALUES, GFP_KERNEL); + if (!distance_map) + return; + + bitmap_zero(distance_map, NR_DISTANCE_VALUES); for (i = 0; i < nr_node_ids; i++) { for (j = 0; j < nr_node_ids; j++) { - for (k = 0; k < nr_node_ids; k++) { - int distance = node_distance(i, k); - - if (distance > curr_distance && - (distance < next_distance || - next_distance == curr_distance)) - next_distance = distance; - - /* - * While not a strong assumption it would be nice to know - * about cases where if node A is connected to B, B is not - * equally connected to A. - */ - if (sched_debug() && node_distance(k, i) != distance) - sched_numa_warn("Node-distance not symmetric"); + int distance = node_distance(i, j); - if (sched_debug() && i && !find_numa_distance(distance)) - sched_numa_warn("Node-0 not representative"); + if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) { + sched_numa_warn("Invalid distance value range"); + return; } - if (next_distance != curr_distance) { - sched_domains_numa_distance[level++] = next_distance; - sched_domains_numa_levels = level; - curr_distance = next_distance; - } else break; + + bitmap_set(distance_map, distance, 1); } + } + /* + * We can now figure out how many unique distance values there are and + * allocate memory accordingly. + */ + nr_levels = bitmap_weight(distance_map, NR_DISTANCE_VALUES); - /* - * In case of sched_debug() we verify the above assumption. - */ - if (!sched_debug()) - break; + sched_domains_numa_distance = kcalloc(nr_levels, sizeof(int), GFP_KERNEL); + if (!sched_domains_numa_distance) { + bitmap_free(distance_map); + return; } + for (i = 0, j = 0; i < nr_levels; i++, j++) { + j = find_next_bit(distance_map, NR_DISTANCE_VALUES, j); + sched_domains_numa_distance[i] = j; + } + + bitmap_free(distance_map); + /* - * 'level' contains the number of unique distances + * 'nr_levels' contains the number of unique distances * * The sched_domains_numa_distance[] array includes the actual distance * numbers. @@ -1390,15 +1382,15 @@ void sched_init_numa(void) /* * Here, we should temporarily reset sched_domains_numa_levels to 0. * If it fails to allocate memory for array sched_domains_numa_masks[][], - * the array will contain less then 'level' members. This could be + * the array will contain less then 'nr_levels' members. This could be * dangerous when we use it to iterate array sched_domains_numa_masks[][] * in other functions. * - * We reset it to 'level' at the end of this function. + * We reset it to 'nr_levels' at the end of this function. */ sched_domains_numa_levels = 0; - sched_domains_numa_masks = kzalloc(sizeof(void *) * level, GFP_KERNEL); + sched_domains_numa_masks = kzalloc(sizeof(void *) * nr_levels, GFP_KERNEL); if (!sched_domains_numa_masks) return; @@ -1406,7 +1398,7 @@ void sched_init_numa(void) * Now for each level, construct a mask per node which contains all * CPUs of nodes that are that many hops away from us. */ - for (i = 0; i < level; i++) { + for (i = 0; i < nr_levels; i++) { sched_domains_numa_masks[i] = kzalloc(nr_node_ids * sizeof(void *), GFP_KERNEL); if (!sched_domains_numa_masks[i]) @@ -1414,12 +1406,17 @@ void sched_init_numa(void) for (j = 0; j < nr_node_ids; j++) { struct cpumask *mask = kzalloc(cpumask_size(), GFP_KERNEL); + int k; + if (!mask) return; sched_domains_numa_masks[i][j] = mask; for_each_node(k) { + if (sched_debug() && (node_distance(j, k) != node_distance(k, j))) + sched_numa_warn("Node-distance not symmetric"); + if (node_distance(j, k) > sched_domains_numa_distance[i]) continue; @@ -1431,7 +1428,7 @@ void sched_init_numa(void) /* Compute default topology size */ for (i = 0; sched_domain_topology[i].mask; i++); - tl = kzalloc((i + level + 1) * + tl = kzalloc((i + nr_levels) * sizeof(struct sched_domain_topology_level), GFP_KERNEL); if (!tl) return; @@ -1454,7 +1451,7 @@ void sched_init_numa(void) /* * .. and append 'j' levels of NUMA goodness. */ - for (j = 1; j < level; i++, j++) { + for (j = 1; j < nr_levels; i++, j++) { tl[i] = (struct sched_domain_topology_level){ .mask = sd_numa_mask, .sd_flags = cpu_numa_flags, @@ -1466,8 +1463,8 @@ void sched_init_numa(void) sched_domain_topology = tl; - sched_domains_numa_levels = level; - sched_max_numa_distance = sched_domains_numa_distance[level - 1]; + sched_domains_numa_levels = nr_levels; + sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1]; init_numa_topology_type(); } -- 2.35.1