Date: Wed, 15 Sep 2021 17:42:18 +0200
From: Peter Zijlstra
To: Andy Lutomirski
Cc: Jann Horn, Peter Oskolkov, Ingo Molnar, Thomas Gleixner,
    Linux Kernel Mailing List, Linux API, Paul Turner, Ben Segall,
    Andrei Vagin, Thierry Delisle
Subject: Re: [PATCH 2/4 v0.5] sched/umcg: RFC: add userspace atomic helpers
References: <20210908184905.163787-1-posk@google.com>
    <20210908184905.163787-3-posk@google.com>

On Tue, Sep 14, 2021 at 11:40:01AM -0700, Andy Lutomirski wrote:
>
>
> On Tue, Sep 14, 2021, at 11:11 AM, Peter Zijlstra wrote:
> > On Tue, Sep 14, 2021 at 09:52:08AM -0700, Andy Lutomirski wrote:
> > > With a custom mapping, you don’t need to pin pages at all, I think.
> > > As long as you can reconstruct the contents of the shared page and
> > > you’re willing to do some slightly careful synchronization, you can
> > > detect that the page is missing when you try to update it and skip the
> > > update. The vm_ops->fault handler can repopulate the page the next
> > > time it’s accessed.
> >
> > The point is that the moment we know we need to do this user-poke, is
> > schedule(), which could be called while holding mmap_sem (it being a
> > preemptable lock). Which means we cannot go and do faults.
>
> That’s fine. The page would be in one of two states: present and
> writable by kernel or completely gone. If it’s present, the scheduler
> writes it. If it’s gone, the scheduler skips the write and the next
> fault fills it in.

That's non-deterministic, and as such not suitable.

> > > All that being said, I feel like I’m missing something. The point of
> > > this is to send what the old M:N folks called “scheduler activations”,
> > > right? Wouldn’t it be more efficient to explicitly wake something
> > > blockable/pollable and write the message into a more efficient data
> > > structure? Polling one page per task from userspace seems like it
> > > will have inherently high latency due to the polling interval and will
> > > also have very poor locality. Or am I missing something?
> >
> > The idea was to link the user structures together in a (single) linked
> > list. The server structure gets a list of all the blocked tasks. This
> > avoids having to do a full N iteration (like Java, they're talking stupid
> > number of N).
> >
> > Polling should not happen, once we run out of runnable tasks, the server
> > task gets run again and it can instantly pick up all the blocked
> > notifications.
> >
>
> How does the server task know when to read the linked list? And
> what’s wrong with a ring buffer or a syscall?

Same problem, ring-buffer has the case where it's full and events get
dropped, at which point you've completely lost state. If it is at all
possible to recover from that, doing so is non-deterministic.

I really want this stuff to work for realtime workloads too.
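
To make the linked-list argument concrete, here is a minimal userspace
sketch in C11 atomics. It is not the code from the patch series, and all
names in it (struct task_node, struct server, server_init,
server_push_blocked, server_take_all) are hypothetical. The assumption it
illustrates is that the list link lives inside each task's own structure,
so recording a blocked task never needs free space elsewhere and, unlike a
fixed-size ring buffer, has no "full" case in which a notification is
dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-task structure; the list link is embedded in it. */
struct task_node {
	int id;
	struct task_node *next;
};

/* Hypothetical per-server structure: head of the blocked-task list. */
struct server {
	_Atomic(struct task_node *) blocked;
};

static void server_init(struct server *s)
{
	atomic_init(&s->blocked, (struct task_node *)NULL);
}

/*
 * Record that task t has blocked. This cannot fail for lack of space,
 * because the only storage needed is the link inside t itself.
 */
static void server_push_blocked(struct server *s, struct task_node *t)
{
	struct task_node *old;

	assert(t != NULL);
	old = atomic_load_explicit(&s->blocked, memory_order_relaxed);
	do {
		/* On CAS failure, old is reloaded with the current head. */
		t->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->blocked, &old, t,
							 memory_order_release,
							 memory_order_relaxed));
}

/*
 * Detach the whole list with a single exchange. The server then walks
 * only the tasks that actually blocked, not all N tasks.
 */
static struct task_node *server_take_all(struct server *s)
{
	return atomic_exchange_explicit(&s->blocked,
					(struct task_node *)NULL,
					memory_order_acquire);
}
```

In this sketch the server receives the tasks in reverse order of blocking
(most recent first), and a second server_take_all() with no pushes in
between returns NULL. How the real helpers lay out and update the
structures is defined by the patch series, not by this example.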