To: Christoph Lameter, Bharata B Rao
Cc: Vincent Guittot, linux-kernel, linux-mm@kvack.org, David Rientjes,
    Joonsoo Kim, Andrew Morton, guro@fb.com, shakeelb@google.com,
    Johannes Weiner, aneesh.kumar@linux.ibm.com, Jann Horn, Michal Hocko
References: <20201118082759.1413056-1-bharata@linux.ibm.com>
    <20210121053003.GB2587010@in.ibm.com>
From: Vlastimil Babka
Subject: Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
Date: Thu, 21 Jan 2021 19:19:21 +0100

On 1/21/21 11:01 AM, Christoph Lameter wrote:
> On Thu, 21 Jan 2021, Bharata B Rao wrote:
>
>> > The problem is that calculate_order() is called a number of times
>> > before secondary CPUs are booted, and it returns 1 instead of 224.
>> > This makes the use of num_online_cpus() irrelevant for those cases.
>> >
>> > After adding "slub_min_objects=36" to my command line, which equals
>> > 4 * (fls(num_online_cpus()) + 1) with the correct num_online_cpus == 224,
>> > the regression disappears:
>> >
>> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)

I'm surprised that hackbench is that sensitive to slab performance, anyway.
It's supposed to be a scheduler benchmark? What exactly is going on?
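For reference, the arithmetic of the heuristic quoted above can be checked
with a few lines of standalone C. This is only a sketch of the quoted formula
4 * (fls(num_online_cpus()) + 1); fls_sketch() is a stand-in for the kernel's
fls(), and none of this is the actual mm/slub.c code:

#include <stdio.h>

/* Stand-in for the kernel's fls(): position of the last set bit, 1-based. */
static int fls_sketch(unsigned int x)
{
        return x ? 32 - __builtin_clz(x) : 0;
}

/* The quoted default: min_objects = 4 * (fls(online_cpus) + 1). */
static unsigned int min_objects_for(unsigned int online_cpus)
{
        return 4 * (fls_sketch(online_cpus) + 1);
}

int main(void)
{
        /* Early boot: only the boot CPU is online yet. */
        printf("  1 online CPU   -> min_objects = %u\n", min_objects_for(1));
        /* After all secondary CPUs on the reported machine are up. */
        printf("224 online CPUs -> min_objects = %u\n", min_objects_for(224));
        return 0;
}

With one online CPU this works out to min_objects = 8, while with all 224 CPUs
online it is 4 * (8 + 1) = 36, matching the slub_min_objects=36 value above.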
>> Should we have switched to num_present_cpus() rather than
>> num_online_cpus()? If so, the below patch should address the
>> above problem.
>
> There is certainly an initcall after secondaries are booted where we could
> redo the calculate_order?

We could do it even in a hotplug handler. But in practice that means making
sure it's safe, i.e. all users of oo_order/oo_objects must handle the value
changing. Consider e.g. init_cache_random_seq(), which uses oo_objects(s->oo)
to allocate s->random_seq when cache s is created. Then shuffle_freelist()
will use the current value of oo_objects(s->oo) to index s->random_seq for a
newly allocated slab - what if the page order has increased meanwhile due to
secondary booting or hotplug? Array overflow. That's why I just made the
former sysfs handler for changing the order read-only. (A simplified sketch
of this hazard is at the end of this mail.)

Things would be easier if we could trust, *on all arches*, either

- num_present_cpus() to count what the hardware really physically has during
  boot, even if not yet onlined, at the time we init slab. This would still
  not handle later hotplug (probably mostly a VM scenario; it's not as if
  somebody would bring a bunch of actual new CPU boards to a running bare
  metal system?).

- num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on
  systems where it's not really possible to plug in more CPUs. In a VM
  scenario we could still have the opposite problem, where theoretically
  "anything is possible" but the virtual CPUs are never added later.

We could also start questioning the very assumption that the number of CPUs
should affect the slab page size in the first place. Should it? After all,
each CPU will have one or more slab pages privately cached, as we discuss in
the other thread... so why make the slab pages larger as well?

> Or the num_online_cpus needs to be up to date earlier. Why does this issue
> not occur on x86? Does x86 have an up to date num_online_cpus earlier?
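For illustration, here is a simplified, self-contained sketch of the
random_seq hazard described above. The names and numbers are made up for the
example (16 objects at cache creation, 64 after a hypothetical order bump);
it is not the actual mm/slub.c code:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        unsigned int objects_at_create = 16; /* object count when the cache was created */
        unsigned int objects_now = 64;       /* object count after a later order recompute */

        /* Like init_cache_random_seq(): random_seq is sized once, at creation time. */
        unsigned int *random_seq = calloc(objects_at_create, sizeof(*random_seq));
        if (!random_seq)
                return 1;
        for (unsigned int i = 0; i < objects_at_create; i++)
                random_seq[i] = i;

        /* Like shuffle_freelist(): indexing is driven by the *current* object count. */
        for (unsigned int pos = 0; pos < objects_now; pos++) {
                if (pos >= objects_at_create) {
                        printf("pos %u would index past the %u-entry random_seq"
                               " -> array overflow\n", pos, objects_at_create);
                        break;
                }
        }

        free(random_seq);
        return 0;
}

The point is only that the array length is fixed when the cache is created,
while the index range tracks whatever the current object count is, which is
why the page order must not grow once init_cache_random_seq() has run.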