ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefofgggkfgjfhffhffvufgtgfesthhqredtreerjeenucfhrhhomhepfdet nhguhicunfhuthhomhhirhhskhhifdcuoehluhhtoheskhgvrhhnvghlrdhorhhgqeenuc ggtffrrghtthgvrhhnpedvleehjeejvefhuddtgeegffdtjedtffegveethedvgfejieev ieeufeevuedvteenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfh hrohhmpegrnhguhidomhgvshhmthhprghuthhhphgvrhhsohhnrghlihhthidqudduiedu keehieefvddqvdeifeduieeitdekqdhluhhtoheppehkvghrnhgvlhdrohhrgheslhhinh hugidrlhhuthhordhush X-ME-Proxy: Received: by mailuser.nyi.internal (Postfix, from userid 501) id BA6F621E0063; Thu, 30 Sep 2021 14:08:56 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.5.0-alpha0-1322-g921842b88a-fm-20210929.001-g921842b8 Mime-Version: 1.0 Message-Id: In-Reply-To: References: <20210913200132.3396598-1-sohil.mehta@intel.com> <20210913200132.3396598-12-sohil.mehta@intel.com> Date: Thu, 30 Sep 2021 11:08:35 -0700 From: "Andy Lutomirski" To: "Sohil Mehta" , "the arch/x86 maintainers" Cc: "Tony Luck" , "Dave Hansen" , "Thomas Gleixner" , "Ingo Molnar" , "Borislav Petkov" , "H. 
Peter Anvin" , "Jens Axboe" , "Christian Brauner" , "Peter Zijlstra (Intel)" , "Shuah Khan" , "Arnd Bergmann" , "Jonathan Corbet" , "Raj Ashok" , "Jacob Pan" , "Gayatri Kammela" , "Zeng Guang" , "Williams, Dan J" , "Randy E Witt" , "Shankar, Ravi V" , "Ramesh Thomas" , "Linux API" , linux-arch@vger.kernel.org, "Linux Kernel Mailing List" , linux-kselftest@vger.kernel.org Subject: Re: [RFC PATCH 11/13] x86/uintr: Introduce uintr_wait() syscall Content-Type: text/plain;charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Sep 28, 2021, at 9:56 PM, Sohil Mehta wrote: > On 9/28/2021 8:30 PM, Andy Lutomirski wrote: >> On Mon, Sep 13, 2021, at 1:01 PM, Sohil Mehta wrote: >>> Add a new system call to allow applications to block in the kernel a= nd >>> wait for user interrupts. >>> >> ... >> >>> When the application makes this syscall the notification vector is >>> switched to a new kernel vector. Any new SENDUIPI will invoke the ke= rnel >>> interrupt which is then used to wake up the process. >> Any new SENDUIPI that happens to hit the target CPU's ucode at a time= when the kernel vector is enabled will deliver the interrupt. Any new = SENDUIPI that happens to hit the target CPU's ucode at a time when a dif= ferent UIPI-using task is running will *not* deliver the interrupt, unle= ss I'm missing some magic. Which means that wakeups will be missed, whi= ch I think makes this whole idea a nonstarter. >> >> Am I missing something? > > > The current kernel implementation reserves 2 notification vectors (NV)=20 > for the 2 states of a thread (running vs blocked). > > NV-1 =E2=80=93 used only for tasks that are running. (results in a use= r=20 > interrupt or a spurious kernel interrupt) > > NV-2 =E2=80=93 used only for a tasks that are blocked in the kernel. 
(= always=20 > results in a kernel interrupt) > > The UPID.UINV bits are switched between NV-1 and NV-2 based on the sta= te=20 > of the task. Aha, cute. So NV-1 is only sent if the target is directly paying attent= ion and, assuming all the atomics are done right, NV-2 will be sent for = tasks that are asleep. Logically, I think these are the possible states for a receiving task: 1. Running. SENDUIPI will actually deliver the event directly (or not i= f uintr is masked). If the task just stopped running and the atomics ar= e right, then the schedule-out code can, I think, notice. 2. Not running, but either runnable or not currently waiting for uintr (= e.g. blocked in an unrelated syscall). This is straightforward -- no IP= I or other action is needed other than setting the uintr-pending bit. 3. Blocked and waiting for uintr. For this to work right, anyone trying= to send with SENDUIPI (or maybe a vdso or similar clever wrapper around= it) needs to result in either a fault or an IPI so the kernel can proce= ss the wakeup. (Note that, depending on how fancy we get with file descriptors and poll= ing, we need to watch out for the running-and-also-waiting-for-kernel-no= tification state. That one will never work right.) 3 is the nasty case, and your patch makes it work with this NV-2 trick. = The trick is a bit gross for a couple reasons. First, it conveys no us= eful information to the kernel except that an unknown task did SENDUIPI = and maybe that the target was most recently on a given CPU. So a big li= st search is needed. Also, it hits an essentially arbitrary and possibl= y completely innocent victim CPU and task, and people doing any sort of = task isolation workload will strongly dislike this. For some of those u= sers, "strongly" may mean "treat system as completely failed, fail over = to something else and call expensive tech support." So we can't do that. I think we have three choices: Use a fancy wrapper around SENDUIPI. This is probably a bad idea. 
Treat the NV-2 as a real interrupt and honor affinity settings. This will be annoying and slow, I think, if it's even workable at all.
Handle this case with faults instead of interrupts. We could set a reserved bit in UPID so that SENDUIPI results in #GP, decode it, and process it. This puts the onus on the actual task causing trouble, which is nice, and it lets us find the UPID and target directly instead of walking all of them. I don't know how well it would play with hypothetical future hardware-initiated uintrs, though.
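
To make the fault-based option a bit more concrete, here is a toy userspace model in C of the three receiver states and of what a send could resolve to in each. It is only a sketch under assumptions, not kernel code: `model_upid`, `fault_on_send`, `model_wait()` and `model_senduipi()` are made-up names, the `fault_on_send` flag merely stands in for the reserved UPID bit, and neither the real UPID layout nor the #GP decoding is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy userspace model of the fault-based option. Every name here is
 * hypothetical; this is not the real UPID layout or kernel code.
 */
enum recv_state {
	RECV_RUNNING,       /* case 1: on a CPU, hardware can deliver directly */
	RECV_NOT_WAITING,   /* case 2: runnable, or blocked on something else */
	RECV_WAITING_UINTR, /* case 3: blocked and waiting for a user interrupt */
};

enum send_result {
	SEND_DELIVERED,     /* delivered directly to the running receiver */
	SEND_PENDING_ONLY,  /* pending bit set, nothing else needed */
	SEND_FAULTED,       /* sender trapped; kernel posted and woke the target */
};

struct model_upid {
	uint64_t pending;        /* one pending bit per user vector (0..63) */
	bool fault_on_send;      /* stands in for the reserved UPID bit */
	enum recv_state state;
};

/* Receiver side: block until a user interrupt arrives (case 3). */
static void model_wait(struct model_upid *upid)
{
	upid->state = RECV_WAITING_UINTR;
	upid->fault_on_send = true;   /* arm the trap for the next sender */
}

/* Sender side: what a SENDUIPI aimed at this UPID would lead to. */
static enum send_result model_senduipi(struct model_upid *upid, unsigned int vec)
{
	assert(vec < 64);

	if (upid->fault_on_send) {
		/*
		 * Assumed #GP path: the kernel decodes the faulting send, so
		 * it has the UPID in hand and does not walk a list of waiters.
		 */
		upid->pending |= UINT64_C(1) << vec;
		upid->fault_on_send = false;
		upid->state = RECV_NOT_WAITING;   /* woken, runnable again */
		return SEND_FAULTED;
	}

	if (upid->state == RECV_RUNNING)
		return SEND_DELIVERED;            /* modelled as consumed at once */

	upid->pending |= UINT64_C(1) << vec;      /* case 2: just record it */
	return SEND_PENDING_ONLY;
}
```

In this model only the sender that hits a waiting receiver pays for the wakeup, and it pays exactly once: the flag is cleared on the fault path, so later sends fall back to case 2 until the receiver calls `model_wait()` again. Whether a real implementation would clear the bit there, and how the atomics against schedule-out would look, is left open here.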