From: Peter Oskolkov
Date: Fri, 10 Sep 2021 11:00:56 -0700
Subject: Re: [RESEND RFC PATCH 0/3] Provide fast access to thread specific data
To: Mathieu Desnoyers
Cc: Florian Weimer, Prakash Sangappa, linux-kernel, linux-api, Ingo Molnar,
    Paul Turner, Jann Horn, Peter Oskolkov, Vincenzo Frascino, Peter Zijlstra
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 10, 2021 at 10:55 AM Mathieu Desnoyers wrote:
>
> ----- On Sep 10, 2021, at 1:48 PM, Peter Oskolkov posk@google.com wrote:
>
> > On Fri, Sep 10, 2021 at 10:33 AM Mathieu Desnoyers
> > wrote:
> >>
> >> ----- On Sep 10, 2021, at 12:37 PM, Florian Weimer fweimer@redhat.com wrote:
> >>
> >> > * Peter Oskolkov:
> >> >
> >> >> In short, due to the need to read/write to the userspace from
> >> >> non-sleepable contexts in the kernel it seems that we need to have some
> >> >> form of per task/thread kernel/userspace shared memory that is pinned,
> >> >> similar to what your sys_task_getshared does.
> >> >
> >> > In glibc, we'd also like to have this for PID and TID. Eventually,
> >> > rt_sigprocmask without kernel roundtrip in most cases would be very nice
> >> > as well. For performance and simplicity in userspace, it would be best
> >> > if the memory region could be at the same offset from the TCB for all
> >> > threads.
> >> >
> >> > For KTLS, the idea was that the auxiliary vector would contain size and
> >> > alignment of the KTLS. Userspace would reserve that memory, register it
> >> > with the kernel like rseq (or the robust list pointers), and pass its
> >> > address to the vDSO functions that need them. The last part ensures
> >> > that the vDSO functions do not need non-global data to determine the
> >> > offset from the TCB. Registration is still needed for the caches.
> >> >
> >> > I think previous discussions (in the KTLS and rseq context) did not have
> >> > the pinning constraint.
> >>
> >> If this data is per-thread, and read from user-space, why is it relevant
> >> to update this data from non-sleepable kernel context rather than update it as
> >> needed on return-to-userspace? When returning to userspace, sleeping due to a
> >> page fault is entirely acceptable. This is what we currently do for rseq.
> >>
> >> In short, the data could be accessible from the task struct. Flags in the
> >> task struct can let return-to-userspace know that it has outdated ktls
> >> data. So before returning to userspace, the kernel can copy the relevant data
> >> from the task struct to the shared memory area, without requiring any pinning.
> >>
> >> What am I missing?
> >
> > I can't speak about other use cases, but in the context of userspace
> > scheduling, the information that a task has blocked in the kernel and
> > is going to be removed from its runqueue cannot wait to be delivered
> > to the userspace until the task wakes up, as the userspace scheduler
> > needs to know of the event when it happened so that it can schedule
> > another task in place of the blocked one. See the discussion here:
> >
> > https://lore.kernel.org/lkml/CAG48ez0mgCXpXnqAUsa0TcFBPjrid-74Gj=xG8HZqj2n+OPoKw@mail.gmail.com/
>
> OK, just to confirm my understanding, so the use-case here is per-thread
> state which can be read by other threads (in this case the userspace scheduler)?

Yes, exactly! And sometimes these other threads have to read/write the state
while they are themselves in preempt_disabled regions in the kernel. There
could be a way to do that asynchronously (e.g. via workpools), but this would
add latency and complexity.

>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
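
To make the pattern being discussed concrete, here is a minimal, userspace-only
sketch of the kind of pinned per-thread shared state a userspace scheduler
might watch. The struct layout, the field names, and the worker thread that
stands in for the kernel are hypothetical illustrations only; they are not the
sys_task_getshared ABI or any KTLS registration interface.

/*
 * Illustrative sketch: in the real design one "struct task_shared" per
 * thread would live in pinned memory shared with the kernel, and the
 * kernel would set "state" from a non-sleepable context when the thread
 * blocks.  Here a worker thread plays the kernel's role so the polling
 * pattern on the scheduler side can be shown with plain C11 atomics.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

enum task_state { TASK_RUNNING = 0, TASK_BLOCKED = 1 };

struct task_shared {
	_Atomic uint32_t state;   /* TASK_RUNNING or TASK_BLOCKED */
	_Atomic uint32_t seqno;   /* bumped on every state change */
};

static struct task_shared worker_shared;

/* Stand-in for the kernel marking the task blocked from a non-sleepable path. */
static void *worker(void *arg)
{
	(void)arg;
	sleep(1);
	atomic_store_explicit(&worker_shared.state, TASK_BLOCKED,
			      memory_order_release);
	atomic_fetch_add_explicit(&worker_shared.seqno, 1,
				  memory_order_release);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	/* The "userspace scheduler": react as soon as the state flips. */
	while (atomic_load_explicit(&worker_shared.state,
				    memory_order_acquire) != TASK_BLOCKED)
		;	/* a real scheduler would run other tasks or futex-wait here */

	printf("worker reported blocked (seqno=%u); schedule another task\n",
	       atomic_load_explicit(&worker_shared.seqno, memory_order_acquire));

	pthread_join(t, NULL);
	return 0;
}

The acquire/release pairing is the point raised above: another thread (here,
the polling loop) can observe the state change without the blocked thread
ever having to return to userspace first.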