From: Peter Oskolkov
Date: Thu, 23 Jul 2020 17:25:05 -0700
Subject: Re: [PATCH for 5.9 1/3] futex: introduce FUTEX_SWAP operation
To: Peter Zijlstra
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar, Ingo Molnar,
    Darren Hart, Vincent Guittot, Peter Oskolkov, Andrei Vagin, Paul Turner,
    Ben Segall, Aaron Lu, Waiman Long

On Thu, Jul 23, 2020 at 4:28 AM Peter Zijlstra wrote:

Thanks a lot for your comments, Peter! My answers below.

>
> On Wed, Jul 22, 2020 at 04:45:36PM -0700, Peter Oskolkov wrote:
> > This patchset is the first step to open-source this work. As explained
> > in the linked pdf and video, SwitchTo API has three core operations: wait,
> > resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
> > that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
> > on top of which user-space threading libraries can be built.
>
> The PDF and video can go pound sand; you get to fully explain things
> here.

Will do.
Should I expand the cover letter or the commit message? (I'll probably
split the first patch into two in the latter case.)

>
> What worries me is how FUTEX_SWAP would interact with the future
> FUTEX_LOCK / FUTEX_UNLOCK. When we implement pthread_mutex with those,
> there's very few WAIT/WAKE left.

[+cc Waiman Long]

I've looked through the latest FUTEX_LOCK patchset I could find
( https://lore.kernel.org/patchwork/cover/772643/ and related), and it seems
that the FUTEX_SWAP and FUTEX_LOCK/FUTEX_UNLOCK patchsets address the same
issue (slow wakeups) but for different use cases:

FUTEX_LOCK/FUTEX_UNLOCK uses spinning and lock stealing to improve futex
wake/wait performance in high contention situations;

FUTEX_SWAP is designed to be used for fast context switching with _no_
contention by design: the waker that is going to sleep, and the wakee are
using different futexes; the userspace will have a futex per thread/task,
and when needed the thread/task will either simply sleep on its futex,
or context switch (=FUTEX_SWAP) into a different thread/task.

I can also imagine that instead of combining WAIT/WAKE for fast context
switching, a variant of FUTEX_SWAP can use LOCK/UNLOCK operations in the
future, when these are available; but again, I fully expect that
a single "FUTEX_LOCK the current task on futex A, FUTEX_UNLOCK futex B,
context switch into the wakee" futex op will be much faster than doing
the same thing in two syscalls, as FUTEX_LOCK/FUTEX_UNLOCK does not seem
to be concerned with fast waking of a sleeping task, but more with
minimizing sleeping in the first place.

What will be faster: FUTEX_SWAP that does
  FUTEX_WAKE (futex A) + FUTEX_WAIT (current, futex B),
or FUTEX_SWAP that does
  FUTEX_UNLOCK (futex A) + FUTEX_LOCK (current, futex B)?
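
To make the wake+wait variant concrete, here is a toy user-space model in
Python. It is not kernel code and not what the patchset implements; it only
illustrates the "one futex per task" scheme. Each task's private futex is
modelled by a threading.Semaphore (an assumption made for simplicity: a real
futex is a 32-bit word plus a kernel wait queue), and swap() stands for the
"wake the target, then sleep on my own futex" pair that FUTEX_SWAP is meant to
fold into a single syscall. The names Task, futex_wake, futex_wait, swap and
run_ping_pong are made up for this sketch.

```python
import threading


class Task:
    """A user-space task that owns one private 'futex'."""

    def __init__(self, name):
        self.name = name
        # Stand-in for the per-task futex. The semaphore keeps a pending
        # wakeup, so a wake that arrives before the wait is not lost.
        self.futex = threading.Semaphore(0)


def futex_wake(task):
    """Model of FUTEX_WAKE on the task's own futex."""
    task.futex.release()


def futex_wait(task):
    """Model of FUTEX_WAIT on the task's own futex."""
    task.futex.acquire()


def swap(current, target):
    """Two steps today; the proposed FUTEX_SWAP would do both in one call."""
    futex_wake(target)   # make the wakee runnable
    futex_wait(current)  # put the waker to sleep on a *different* futex


def run_ping_pong(rounds):
    """Let two tasks hand control back and forth; return who ran, in order."""
    a, b = Task("A"), Task("B")
    trace = []

    def body(me, peer, starts_running):
        if not starts_running:
            futex_wait(me)         # park until somebody switches into us
        for _ in range(rounds):
            trace.append(me.name)  # "run" for a while
            swap(me, peer)         # context switch into the peer
        futex_wake(peer)           # release the peer from its last swap()

    threads = [
        threading.Thread(target=body, args=(a, b, True)),
        threading.Thread(target=body, args=(b, a, False)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace
```

run_ping_pong(3) returns ['A', 'B', 'A', 'B', 'A', 'B']: the two tasks never
contend on the same futex, and exactly one of them is runnable at any time.
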
As wake+wait will always put the waker to sleep, it means that there will be
a true context switch on the same CPU on the fast path; on the other hand,
unlock+lock will potentially evade sleeping, so the wakee will often run on
a different CPU (with the waker spinning instead of sleeping?), thus not
benefitting from the cache locality that fast context switching on the same
CPU is meant to use...

I'll add some of the considerations above to the expanded cover letter
(or a commit message).

>
> Also, why would we commit to an ABI without ever having seen the rest?

I'm not completely sure what you mean here. We do not envision any
expansion/changes to the ABI proposed here, only further performance
improvements. On these, we currently think that marking the wakee
as the preferred next task to run on the current CPU (by storing
"struct task_struct *preferred_next_task" either in a per-CPU pointer,
or in the current task_struct) and then having schedule() determine
whether to follow the hint or ignore it would be the simplest way to
speed up the context switch.

>
> On another note: wake_up_process_prefer_current_cpu() is a horrific
> function name :/ That's half to a third of the line limit.

I fully agree. I considered wake_up_on_current_cpu() first, but this name
does not convey that the wakeup itself is a "strong wish", while "current
cpu" is only a weak one... Do you have any suggestions?

Maybe wake_up_on_cpu(struct task_struct *next, int cpu_hint)? But this
seems too broad in scope, as we are interested here in only migrating
the task to the current CPU...

Thanks again for your comments!
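
P.S. To make the "preferred next task" hint mentioned above a bit more
concrete, here is a toy Python model of a pick-next function that either
follows the hint or ignores it. This is only a sketch and not scheduler code:
ToyTask, ToyCpu and pick_next are hypothetical names, the run queue is a plain
FIFO, and the rule for ignoring the hint (the hinted task is not runnable or
not queued on this CPU) is an assumption; the real criteria would be up to
schedule().

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare tasks by identity, like task_struct pointers
class ToyTask:
    name: str
    runnable: bool = True


@dataclass(eq=False)
class ToyCpu:
    runqueue: List[ToyTask] = field(default_factory=list)  # default pick order
    preferred_next: Optional[ToyTask] = None               # per-CPU hint slot


def pick_next(cpu):
    """Return the next task to run on this CPU, or None if nothing is runnable."""
    # The hint is one-shot: consume it whether or not it is followed.
    hint, cpu.preferred_next = cpu.preferred_next, None

    # Follow the hint only if the task can actually run here right now.
    if hint is not None and hint.runnable and hint in cpu.runqueue:
        cpu.runqueue.remove(hint)
        return hint

    # Otherwise ignore the hint and fall back to the default policy.
    for task in cpu.runqueue:
        if task.runnable:
            cpu.runqueue.remove(task)
            return task
    return None
```

With a run queue of [a, b, c] and the hint set to c, the first pick_next()
returns c; the next call, with no hint left, falls back to a.
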