From: Peter Oskolkov
Date: Tue, 2 Aug 2022 10:06:45 -0700
Subject: Re: [PATCH v3 00/23] RSEQ node id and virtual cpu id extensions
To: Mathieu Desnoyers
Cc: Peter Oskolkov, Peter Zijlstra, Linux Kernel Mailing List, Thomas Gleixner, "Paul E . McKenney", Boqun Feng, "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner, Florian Weimer, David Laight, carlos, Chris Kennelly
References: <20220729190225.12726-1-mathieu.desnoyers@efficios.com> <500891137.95782.1659452479846.JavaMail.zimbra@efficios.com>
In-Reply-To: <500891137.95782.1659452479846.JavaMail.zimbra@efficios.com>

On Tue, Aug 2, 2022 at 8:01 AM Mathieu Desnoyers wrote:
>
> [...]
> >
> > We have experimented with several approaches here. The one that we are
> > currently using is the "flat" model: we allocate vcpu IDs ignoring numa nodes.
> >
> > We did try per-numa-node vcpus, but it did not show any material improvement
> > over the "flat" model, perhaps because on our most "wide" servers the CPU
> > topology is multi-level. Chris Kennelly may provide more details here.
>
> I would really like to know more about Google's per-numa-node vcpus implementation.
> I suspect you guys may have taken a different turn somewhere in the design which
> led to these results. But having not seen that implementation, I can only guess.
>
> I notice the following Google-specific prototype extension in tcmalloc:
>
>   // This is a prototype extension to the rseq() syscall. Since a process may
>   // run on only a few cores at a time, we can use a dense set of "v(irtual)
>   // cpus." This can reduce cache requirements, as we only need N caches for
>   // the cores we actually run on simultaneously, rather than a cache for every
>   // physical core.
>   union {
>     struct {
>       short numa_node_id;
>       short vcpu_id;
>     };
>     int vcpu_flat;
>   };
>
> Can you tell me more about the way the numa_node_id and vcpu_id are allocated
> internally, and how they are expected to be used by userspace ?

Based on a "VCPU policy" flag passed by the userspace during rseq
registration request, our kernel would:

- do nothing re: vcpus, i.e. behave like it currently does upstream;
- allocate VCPUs in a "flat" manner, ignoring NUMA;
- populate numa_node_id with the value from the function with the same
  name in https://elixir.bootlin.com/linux/latest/source/include/linux/topology.h
  and allocate vcpu_id within the numa node in a tight manner.

Basically, if there are M threads running on node 0 and N threads
running on node 1 at time T, there will be [0,M-1] vcpu IDs associated
with node 0 and [0,N-1] vcpu IDs associated with node 1 at this moment
in time.
If a thread migrates across nodes, the balance would change accordingly.

I'm not sure how exactly tcmalloc tried to use VCPUs under this policy, and
what were the benefits expected. The simplest way would be to keep a freelist
per node_id/vcpu_id pair (basically, per vcpu_flat in the union), but this
would tend to increase the number of freelists due to thread migrations, so
benefits should be related to memory locality, and so somewhat difficult to
measure precisely.

Chris Kennelly may offer more details here.

Thanks,
Peter
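
For readers following the thread, the per-NUMA-node policy described in this message can be sketched roughly as below. This is only an illustration under stated assumptions, not the Google kernel implementation, which is not shown anywhere in this thread. The union layout is copied from the tcmalloc snippet quoted above. Everything else is hypothetical: the names `vcpu_ident`, `node_vcpus`, `vcpu_get`, `vcpu_put`, `thread_run` and `thread_migrate`, the 64-ID cap, and the "lowest free ID" strategy standing in for "allocate vcpu_id within the numa node in a tight manner".

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VCPUS 64 /* hypothetical cap on concurrently running threads per node */

/* Same layout as the tcmalloc prototype union quoted in the message. */
union vcpu_ident {
    struct {
        short numa_node_id;
        short vcpu_id;
    };
    int vcpu_flat;
};

/* Hypothetical per-node bookkeeping: bit i set means vcpu ID i is in use. */
struct node_vcpus {
    uint64_t in_use;
};

/* Hand out the lowest free ID on this node; returns -1 if all are taken. */
static int vcpu_get(struct node_vcpus *n)
{
    for (int id = 0; id < MAX_VCPUS; id++) {
        if (!(n->in_use & (UINT64_C(1) << id))) {
            n->in_use |= UINT64_C(1) << id;
            return id;
        }
    }
    return -1;
}

/* Return an ID to the node's pool. */
static void vcpu_put(struct node_vcpus *n, int id)
{
    assert(id >= 0 && id < MAX_VCPUS);
    n->in_use &= ~(UINT64_C(1) << id);
}

/* A thread starts running on `node`: it gets a (node, vcpu) pair. */
static union vcpu_ident thread_run(struct node_vcpus nodes[], short node)
{
    union vcpu_ident v;

    v.vcpu_flat = 0;
    v.numa_node_id = node;
    v.vcpu_id = (short)vcpu_get(&nodes[node]); /* -1 if the node is full */
    return v;
}

/* A thread moves to another node: release the old ID, take one on the new node. */
static union vcpu_ident thread_migrate(struct node_vcpus nodes[],
                                       union vcpu_ident old, short new_node)
{
    vcpu_put(&nodes[old.numa_node_id], old.vcpu_id);
    return thread_run(nodes, new_node);
}
```

In this sketch, userspace would index its freelists by `vcpu_flat`, so every distinct (node, vcpu) pair a thread has ever run as implies its own freelist, which is the growth under migration mentioned in the message. Note that a lowest-free-ID scheme keeps IDs small but can leave a temporary hole after a thread leaves a node; whether the actual kernel compacts IDs to exactly [0,M-1] is not stated in the thread.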