Message-ID: <9ef24c18-775b-000a-5a03-4e4fe0f1c83c@igalia.com>
Date: Wed, 27 Jul 2022 14:19:22 -0300
Subject: Re: [RFC] futex2: add NUMA awareness
From: André Almeida
To: Andrey Semashev
Cc: linux-api@vger.kernel.org, fweimer@redhat.com,
 linux-kernel@vger.kernel.org, Darren Hart, Peter Zijlstra, Ingo Molnar,
 Thomas Gleixner, libc-alpha@sourceware.org, Davidlohr Bueso,
 Steven Rostedt, Sebastian Andrzej Siewior
References: <36a8f60a-69b2-4586-434e-29820a64cd88@igalia.com>
 <74ba5239-27b0-299e-717c-595680cd52f9@gmail.com>
 <8bfd13a7-ed02-00dd-63a1-7144f2e55ef0@igalia.com>
 <3995754e-064b-6091-ccb0-224c3e698af2@gmail.com>
In-Reply-To: <3995754e-064b-6091-ccb0-224c3e698af2@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/22/22 13:42, Andrey Semashev wrote:
> On 7/14/22 18:00, André Almeida wrote:
>> Hi Andrey,
>>
>> Thanks for the feedback.
>>
>> On 7/14/22 08:01, Andrey Semashev wrote:
>>> On 7/14/22 06:18, André Almeida wrote:
>> [...]
>>>>
>>>> Feedback? Who else should I CC?
>>>
>>> Just a few questions:
>>>
>>> Do I understand correctly that notifiers won't be able to wake up
>>> waiters unless they know on which node they are waiting?
>>>
>>
>> If userspace is using NUMA_FLAG, yes.
>> Otherwise, all futexes would be
>> located in the default node, and userspace doesn't need to know which
>> one is the default.
>>
>>> Is it possible to wait on a futex on different nodes?
>>
>> Yes, given that you specify `.hint = id` with the proper node id.
>
> So any given futex_wake(FUTEX_NUMA) operates only within its node, right?
>
>>> Is it possible to wake waiters on a futex on all nodes? When a single
>>> (or N, where N is not "all") waiter is woken, which node is selected? Is
>>> there a rotation of nodes, so that nodes are not skewed in terms of
>>> notified waiters?
>>
>> Regardless of which node the waiter process is running on, what matters
>> is which node the futex hash table is in. So for instance, if we have:
>>
>> struct futex32_numa f = {.value = 0, .hint = 2};
>>
>> And now we add some waiters for this futex:
>>
>> Thread 1, running on node 3:
>>
>> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> Thread 2, running on node 0:
>>
>> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> Thread 3, running on node 2:
>>
>> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> And then, Thread 4, running on node 3:
>>
>> futex_wake(&f, 2, FUTEX_NUMA | FUTEX_32);
>>
>> Now, two waiters would wake up (e.g. T1 and T3, running on nodes 3 and
>> 2), and they are from different nodes. futex_wake() doesn't provide
>> guarantees about which waiters will be selected, so I can't say which
>> nodes would be selected.
>
> In this example, T1, T2 and T3 are all blocking on node 2 (since all of
> them presumably specify hint == 2), right? In this sense, it doesn't
> matter which node they are running on; what matters is which node they
> block on.

Yes.

>
> What I'm asking is: can I wake all threads blocked on the same futex
> across all nodes? That is, is the following possible?
>
> // I'm using hint == -1 to indicate the current node
> // of the calling thread for waiters, and all nodes for notifiers
> struct futex32_numa f = {.value = 0, .hint = -1};
>
> Thread 1, running on node 3, blocks on node 3:
>
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>
> Thread 2, running on node 0, blocks on node 0:
>
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>
> Thread 3, running on node 2, blocks on node 2:
>
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>
> And then, Thread 4, running on whatever node:
>
> futex_wake(&f, -1, FUTEX_NUMA | FUTEX_32);

This futex_wake() will wake all waiters waiting on the node that called
futex_wake(), so it wakes only one waiter in this example. They are
__not__ the same futex: if they have different nodes, they have different
information inside the kernel. If you want to wake them all with the same
futex_wake(), they need to be waiting on the same node.

>
> Here, futex_wake would wake T1, T2 and T3. Or:
>
> futex_wake(&f, 1, FUTEX_NUMA | FUTEX_32);

This would behave exactly like the futex_wake() above.

>
> Here, futex_wake would wake any one of T1, T2 or T3.
>
>> There's no policy for fairness/starvation in futex_wake(). Do
>> you think this would be important for the NUMA case?
>
> I'm not sure yet. If there isn't cross-node behavior like in my example
> above then, I suppose, it falls to userspace to ensure fair rotation of
> wakeups across different nodes. If there is functionality like this, I
> imagine some sort of fairness would be desired.