Date: Fri, 4 May 2018 10:32:53 -0400 (EDT)
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: Andy Lutomirski, Peter Zijlstra, "Paul E. McKenney", Boqun Feng,
 Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton,
 Russell King, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
 Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt,
 Josh Triplett, Catalin Marinas, Will Deacon, Michael Kerrisk
Message-ID: <1883133260.11283.1525444373208.JavaMail.zimbra@efficios.com>
In-Reply-To: <1248652824.11527.1523912317964.JavaMail.zimbra@efficios.com>
References: <20180412192800.15708-1-mathieu.desnoyers@efficios.com>
 <542721578.11358.1523903708510.JavaMail.zimbra@efficios.com>
 <435471300.11403.1523906479091.JavaMail.zimbra@efficios.com>
 <1248652824.11527.1523912317964.JavaMail.zimbra@efficios.com>
Subject: Re: [RFC PATCH for 4.18 12/23] cpu_opv: Provide cpu_opv system call (v7)
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Apr 16, 2018, at 4:58 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Apr 16, 2018, at 3:26 PM, Linus Torvalds torvalds@linux-foundation.org
> wrote:
>
>> On Mon, Apr 16, 2018 at 12:21 PM, Mathieu Desnoyers
>> wrote:
>>>
>>> And I try very hard to avoid being told I'm the one breaking
>>> user-space. ;-)
>>
>> You *can't* be breaking user space. User space doesn't use this yet.
>>
>> That's actually why I'd like to start with the minimal set - to make
>> sure we don't introduce features that will come back to bite us later.
>>
>> The one compelling use case I saw was a memory allocator that used
>> this for getting per-CPU (vs per-thread) memory scaling.
>>
>> That code didn't need the cpu_opv system call at all.
>>
>> And if somebody does a ldload of a malloc library, and then wants to
>> analyze the behavior of a program, maybe they should ldload their own
>> malloc routines first? That's pretty much par for the course for those
>> kinds of projects.
>>
>> So I'd much rather we first merge the non-contentious parts that
>> actually have some numbers for "this improves performance and makes a
>> nice fancy malloc possible".
>>
>> As it is, the cpu_opv seems to be all about theory, not about actual need.
>
> I fully get your point about getting the minimal feature in. So let's focus
> on rseq only.
>
> I will rework the patchset so the rseq selftests don't depend on cpu_opv,
> and remove the cpu_opv stuff. I think it would be a good start for the
> Facebook guys (jemalloc), given that just rseq seems to be enough for them
> for now. It should be enough for the arm64 performance counters as well.
>
> Then we'll figure out what is needed to make other projects use it based on
> their needs (e.g. lttng-ust, liburcu, glibc malloc), and whether jemalloc
> end up requiring cpu_opv for memory migration between per-cpu pools after all.

So, having done this, I find myself in need of advice regarding smoothly
transitioning existing user-space programs/libraries to rseq.

Let's consider a situation where only rseq (without cpu_opv) eventually
gets merged into 4.18. The proposed rseq implementation presents the
following constraints:

- Only a single rseq TLS can be registered per thread, therefore rseq needs
  to be "owned" by a single library (let's say it's librseq.so),
- User-space rseq critical sections need to be inlined into applications
  and libraries for performance reasons (extra branches and calls
  significantly degrade performance of those fast-paths).

I have a ring buffer "space reservation" use-case in my user-space tracer
which requires both rseq and cpu_opv.
My original plan to transition this fast-path to rseq was to test the
@cpu_id field value from the rseq TLS and use a fallback based on atomic
instructions if it is negative. rseq is already designed to ensure we can
compare @cpu_id against @cpu_id_start and detect both migration (cpu id
differs) and rseq ENOSYS with a single branch in the fast path.

Once rseq gets merged and deployed into kernels, this means librseq.so
will actually populate the rseq TLS, and this @cpu_id field will be >= 0.
If kernels are released with rseq but without cpu_opv, then I cannot use
this @cpu_id field to detect whether *both* rseq and cpu_opv are available.

I see a few possible ways to handle this, none of which are particularly
great:

1) Duplicate the entire implementation of the user-space functions where
   the rseq critical sections are inlined, and dynamically detect whether
   cpu_opv is available, and select the right function at runtime. If those
   functions are relatively small this could be acceptable,

2) Code patching based on asm goto. There is no user-space library for this
   at the moment AFAIK, and patching user-space code triggers COW, which is
   bad for TLB and cache locality,

3) Add an extra branch in the rseq fast-path. I would like to avoid this
   especially on arm32, where the cost of an extra branch is significant
   enough to outweigh the benefit of rseq compared to ll/sc.

So far, only option (1) seems relatively acceptable from my perspective,
but that's only because my functions using rseq are relatively small.
If this code bloat is not seen as acceptable, then we should revisit
merging both rseq and cpu_opv at the same time, and make sure CONFIG_RSEQ
selects CONFIG_CPU_OPV.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
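
To make the single-branch check and option (1) above a bit more concrete,
here is a rough C sketch. It is only an illustration under stated
assumptions: struct rseq_view, struct ring, reserve_rseq(),
reserve_atomic() and select_reserve() are hypothetical names, not part of
the proposed ABI or of any existing library. The "rseq" variant has a
placeholder body and does not contain a real rseq critical section, and
how cpu_opv availability would be probed is deliberately left to the
caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rseq TLS fields discussed in this mail. */
struct rseq_view {
	uint32_t cpu_id_start;	/* assumed: always a valid CPU number */
	int32_t cpu_id;		/* assumed: negative when rseq is unavailable */
};

/*
 * One comparison covers both cases: a migrated thread sees a different
 * CPU number, and an unregistered thread sees a negative cpu_id, which
 * can never match a value read from cpu_id_start.
 */
static inline bool rseq_cpu_matches(const struct rseq_view *r)
{
	return (int64_t)r->cpu_id_start == (int64_t)r->cpu_id;
}

/* Hypothetical ring buffer: just a reservation offset and a capacity. */
struct ring {
	_Atomic size_t offset;
	size_t capacity;
};

typedef long (*reserve_fn)(struct ring *rb, size_t len);

/*
 * Variant meant for kernels with both rseq and cpu_opv. Placeholder
 * body: a real version would inline an rseq critical section here.
 */
static long reserve_rseq(struct ring *rb, size_t len)
{
	size_t old = atomic_load_explicit(&rb->offset, memory_order_relaxed);

	if (len > rb->capacity - old)
		return -1;
	atomic_store_explicit(&rb->offset, old + len, memory_order_relaxed);
	return (long)old;
}

/* Fallback variant based on an atomic compare-and-swap loop. */
static long reserve_atomic(struct ring *rb, size_t len)
{
	size_t old = atomic_load(&rb->offset);

	do {
		if (len > rb->capacity - old)
			return -1;
	} while (!atomic_compare_exchange_weak(&rb->offset, &old, old + len));
	return (long)old;
}

/* Option (1): pick one of the duplicated implementations once, at runtime. */
static reserve_fn select_reserve(bool have_rseq, bool have_cpu_opv)
{
	return (have_rseq && have_cpu_opv) ? reserve_rseq : reserve_atomic;
}
```

A caller would invoke select_reserve() once at startup and keep the
returned pointer. Note that the indirect call shown here is itself a cost
on the fast path, so in practice the selection would more likely be made
one level up, around a whole duplicated caller, which is where the code
bloat concern comes from.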