Date: Wed, 17 Jun 2020 17:48:58 -0700
From: Andrei Vagin
To: Peter Oskolkov
Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart, Vincent Guittot, Peter Oskolkov, avagin@google.com, "pjt@google.com", Ben Segall
Subject: Re: [RFC PATCH 1/3 v2] futex: introduce FUTEX_SWAP operation
Message-ID: <20200618004858.GA326453@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 16, 2020 at 10:22:26AM -0700, Peter Oskolkov wrote:
> From 6fbe0261204692a7f488261ab3c4ac696b91db5c Mon Sep 17 00:00:00 2001
> From: Peter Oskolkov
> Date: Tue, 9 Jun 2020 16:03:14 -0700
> Subject: [RFC PATCH 1/3 v2] futex: introduce FUTEX_SWAP operation
>
> This is an RFC!
>
> As Paul Turner presented at LPC in 2013 ...
> - pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
> - video: https://www.youtube.com/watch?v=KXuZi9aeGTw
>
> ... Google has developed an M:N userspace threading subsystem backed
> by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
> above). This subsystem provides latency-sensitive services at Google with
> fine-grained user-space control/scheduling over what is running when,
> and this subsystem is used widely internally (called schedulers or fibers).
>
> This RFC patchset is the first step to open-source this work. As explained
> in the linked pdf and video, SwitchTo API has three core operations: wait,
> resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
> that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
> on top of which user-space threading libraries can be built.
>
> Another common use case for FUTEX_SWAP is message passing a-la RPC
> between tasks: task/thread T1 prepares a message,
> wakes T2 to work on it, and waits for the results; when T2 is done, it
> wakes T1 and waits for more work to arrive. Currently the simplest
> way to implement this is
>
> a. T1: futex-wake T2, futex-wait
> b. T2: wakes, does what it has been woken to do
> c. T2: futex-wake T1, futex-wait
>
> With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
> that runs 5-10 times faster.
>

Hi Peter,

We have a good use-case in gVisor for this new futex command. gVisor
accesses a file system through a file proxy, called the Gofer. The Gofer
runs as a separate process that is isolated from the sandbox (sentry).
Gofer instances communicate with their respective sentry using a 9P-like
protocol.
We used sockets as communication channels, but recently we switched to
the flipcall (1) library, which improves performance by using shared
memory for data (reducing memory copies) and futexes for control
signaling (which is much cheaper than sendto/recvfrom/sendmsg/recvmsg).

I modified the flipcall library to use FUTEX_SWAP and I see a significant
performance improvement.

A low-level benchmark (2) shows that req/resp is more than five times
faster with FUTEX_SWAP than with FUTEX_WAKE & FUTEX_WAIT. This is more or
less the same test that you did.

* FUTEX_WAKE & FUTEX_WAIT
BenchmarkSendRecv-8        88396        13625 ns/op

* FUTEX_SWAP
BenchmarkSendRecv-8       479604         2524 ns/op

And a higher-level test (3), which benchmarks the open syscall in gVisor,
shows about a 40% improvement.

* FUTEX_WAKE & FUTEX_WAIT
BM_Open/1/real_time_mean        93996 ns

* FUTEX_SWAP
BM_Open/1/real_time_mean        53136 ns

I believe there are many use-cases for FUTEX_SWAP in other projects.

1. https://github.com/google/gvisor/tree/master/pkg/flipcall
2. https://github.com/google/gvisor/blob/master/pkg/flipcall/flipcall_test.go#L361
3. https://github.com/google/gvisor/blob/master/test/perf/linux/open_benchmark.cc

Thanks,
Andrei
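
As a quick cross-check of the figures quoted in this message, the two
ratios ("more than five times faster" and "about 40%") can be recomputed
from the raw timings. This is only a sketch: the timings are copied from
the benchmark output in this message, and the helper names speedup() and
reduction_pct() are made up for illustration; they are not part of
flipcall or gVisor.

```python
# Timings copied from the benchmark output quoted in this message.
SENDRECV_NS = {"wake_wait": 13625, "swap": 2524}  # ns/op, BenchmarkSendRecv-8
OPEN_NS = {"wake_wait": 93996, "swap": 53136}     # ns, BM_Open/1/real_time_mean


def speedup(before_ns: float, after_ns: float) -> float:
    """Hypothetical helper: how many times faster the new timing is."""
    return before_ns / after_ns


def reduction_pct(before_ns: float, after_ns: float) -> float:
    """Hypothetical helper: share of time saved relative to the old timing."""
    return 100.0 * (before_ns - after_ns) / before_ns


sendrecv_speedup = speedup(SENDRECV_NS["wake_wait"], SENDRECV_NS["swap"])
open_reduction = reduction_pct(OPEN_NS["wake_wait"], OPEN_NS["swap"])

print(f"SendRecv round trip: {sendrecv_speedup:.2f}x faster with FUTEX_SWAP")
print(f"open benchmark:      {open_reduction:.1f}% less time per call")
```

The round-trip ratio comes out a little above 5x, and the open benchmark
shows a bit more than 40% less time per call, consistent with the
rounded numbers stated in the text.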