Date: Fri, 13 Apr 2018 08:16:49 -0400 (EDT)
From: Mathieu Desnoyers
To: Linus Torvalds
Cc: Peter Zijlstra, "Paul E. McKenney", Boqun Feng, Andy Lutomirski, Dave Watson, linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Andrew Hunter, Andi Kleen, Chris Lameter, Ben Maurer, rostedt, Josh Triplett, Catalin Marinas, Will Deacon, Michael Kerrisk
Message-ID: <625160026.9658.1523621809662.JavaMail.zimbra@efficios.com>
References: <20180412192800.15708-1-mathieu.desnoyers@efficios.com> <20180412192800.15708-13-mathieu.desnoyers@efficios.com> <1580648199.9463.1523563167045.JavaMail.zimbra@efficios.com>
Subject: Re: [RFC PATCH for 4.18 12/23] cpu_opv: Provide cpu_opv system call (v7)

----- On Apr 12, 2018, at 4:07 PM, Linus Torvalds torvalds@linux-foundation.org wrote:

> On Thu, Apr 12, 2018 at 12:59 PM, Mathieu Desnoyers
> wrote:
>>
>> What are your concerns about page pinning ?
>
> Pretty much everything.
>
> It's the most complex part by far, and the vmalloc space is a limited
> resource on 32-bit architectures.

The vmalloc space needed by cpu_opv is bounded by the number of pages a
single cpu_opv call can touch. On architectures with a virtually aliased
dcache, we also need a few extra pages worth of address space to account
for SHMLBA alignment.

So on ARM32, with SHMLBA = 4 pages, this means at most 1 MB of virtual
address space is temporarily needed for a cpu_opv system call in the very
worst-case scenario: 16 ops * 2 uaddr * 8 pages per uaddr (if we're
unlucky and a range straddles two SHMLBA-aligned areas) * 4096 bytes per
page.
If this amount of vmalloc space happens to be our limiting factor, we can
reduce the maximum cpu_opv ops array size supported, e.g. bringing it from
16 down to 4. The largest number of operations I currently need in the
cpu-opv library is 4. With 4 ops, the worst-case vmalloc space used by a
cpu_opv system call becomes 256 kB.

>
>> Do you have an alternative approach in mind ?
>
> Do everything in user space.

I wish we could disable preemption and CPU hotplug from user-space.
Unfortunately, that does not seem to be a viable solution, for many
technical reasons starting with page fault handling.

>
> And even if you absolutely want cpu_opv at all, why not do it in the
> user space *mapping* without the aliasing into kernel space?

That's because cpu_opv needs to execute the entire array of operations
with preemption disabled, and we cannot take a page fault with preemption
off. Page pinning and aliasing user-space pages in the kernel linear
mapping ensure that we don't end up in trouble in page fault scenarios,
such as having the pages we need to touch swapped out under our feet.

>
> The cpu_opv approach isn't even fast. It's *really* slow if it has to
> do VM crap.
>
> The whole rseq thing was billed as "faster than atomics". I
> *guarantee* that the cpu_opv's aren't faster than atomics.

Yes, and here is the good news: cpu_opv speed does not even matter.

rseq assembler instruction sequences are very fast, but cannot deal with
infrequent corner cases. cpu_opv is slow, but is guaranteed to handle the
occasional corner-case situations.

This is similar to the pthread mutex/futex fast/slow paths: the common
case is fast (rseq), and the speed of the infrequent case (cpu_opv) does
not matter as long as it is hit infrequently enough, which is the case
here.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com