From: Florian Weimer
To: Mathieu Desnoyers
Cc: "H. Peter Anvin", Chris Lameter, Jann Horn, Peter Zijlstra,
 Thomas Gleixner, linux-kernel, Joel Fernandes, Ingo Molnar,
 Catalin Marinas, Dave Watson, Will Deacon, shuah, Andi Kleen,
 linux-kselftest, Russell King, Michael Kerrisk, Paul, Paul Turner,
 Boqun Feng, Josh Triplett, rostedt, Ben Maurer, linux-api,
 Andy Lutomirski
Subject: Re: [RFC PATCH v1] pin_on_cpu: Introduce thread CPU pinning system call
References: <20200121160312.26545-1-mathieu.desnoyers@efficios.com>
 <430172781.596271.1579636021412.JavaMail.zimbra@efficios.com>
 <2049164886.596497.1579641536619.JavaMail.zimbra@efficios.com>
 <1648013936.596672.1579655468604.JavaMail.zimbra@efficios.com>
 <87a76efuux.fsf@oldenburg2.str.redhat.com>
 <134428560.600911.1580153955842.JavaMail.zimbra@efficios.com>
Date: Thu, 30 Jan 2020 12:10:00 +0100
In-Reply-To: <134428560.600911.1580153955842.JavaMail.zimbra@efficios.com>
 (Mathieu Desnoyers's message of "Mon, 27 Jan 2020 14:39:15 -0500 (EST)")
Message-ID: <87blql5hfb.fsf@oldenburg2.str.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Mathieu Desnoyers:

> It brings an interesting idea to the table though. Let's assume for now that
> the only intended use of pin_on_cpu(2) would be to allow rseq(2) critical
> sections to update per-cpu data on specific cpu number targets.
> In fact, considering that userspace can be preempted at any point, we still
> need a mechanism to guarantee atomicity with respect to other threads running
> on the same runqueue, which rseq(2) provides. Therefore, that assumption does
> not appear too far-fetched.
>
> There are 2 scenarios we need to consider here:
>
> A) pin_on_cpu(2) targets a CPU which is not part of the affinity mask.
>
> This case is easy: pin_on_cpu can return an error, and the caller needs to act
> accordingly (e.g. figure out that this is a design error and report it, or
> decide that it really did not want to touch that per-cpu data that badly and
> make the entire process fall-back to a mechanism which does not use per-cpu
> data at all from that point onwards)

Affinity masks currently are not like process memory: there is an
expectation that they can be altered from outside the process.

Given that the caller may not have any way to recover from the
suggested pin_on_cpu behavior, that seems problematic.

What I would expect is that if pin_on_cpu cannot achieve implied
exclusion by running on the associated CPU, it acquires a lock that
prevents other pin_on_cpu calls from entering the critical section,
and prevents tasks in the same task group from running on that CPU
(if the CPU becomes available to the task group). The second part
should maintain exclusion of rseq sequences even if their fast path
is not changed.

(On the other hand, I'm worried that per-CPU data structures are a
dead end for user space unless we get containerized affinity masks,
so that containers only see resources that are actually available to
them.)

Thanks,
Florian
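
Editorial illustration of scenario A above: a minimal sketch, assuming a
Linux system, of how a caller might test whether a target CPU is in its
affinity mask before choosing the per-CPU path. The function names
cpu_in_affinity_mask and choose_path are hypothetical and do not come from
the patch; pin_on_cpu itself is not called, since it is only a proposed
system call. The result is a snapshot only: as noted above, the mask can be
altered from outside the process right after the check.

```python
import os


def cpu_in_affinity_mask(cpu: int, pid: int = 0) -> bool:
    """Return True if `cpu` is currently in the affinity mask of `pid`.

    pid 0 means the calling thread. The answer can go stale immediately,
    because another process may change the mask after this returns.
    """
    return cpu in os.sched_getaffinity(pid)


def choose_path(cpu: int) -> str:
    """Hypothetical caller-side decision for scenario A."""
    if cpu_in_affinity_mask(cpu):
        return "per-cpu"   # target CPU is allowed right now
    return "fallback"      # e.g. a mechanism that avoids per-CPU data


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs:", allowed)
    print("CPU", allowed[0], "->", choose_path(allowed[0]))
```

Because the check and the later use are not atomic, a caller relying on it
alone would still need the kind of recovery path that the reply above
argues may not exist.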