Date: Tue, 1 Feb 2022 16:00:52 -0500 (EST)
From: Mathieu Desnoyers
To: Peter Oskolkov
Cc: Peter Zijlstra, linux-kernel, Thomas Gleixner, paulmck, Boqun Feng,
    "H. Peter Anvin", Paul Turner, linux-api, Christian Brauner,
    Florian Weimer, David Laight, carlos, Chris Kennelly
Message-ID: <2083444900.25808.1643749252639.JavaMail.zimbra@efficios.com>
References: <20220201192540.10439-1-mathieu.desnoyers@efficios.com>
Subject: Re: [RFC PATCH 1/3] Introduce per thread group current virtual cpu id
X-Mailing-List: linux-kernel@vger.kernel.org

----- On Feb 1, 2022, at 2:49 PM, Peter Oskolkov posk@posk.io wrote:

> On Tue, Feb 1, 2022 at 11:26 AM Mathieu Desnoyers
> wrote:
>>
>> This feature allows the scheduler to expose a current virtual cpu id
>> to user-space. This virtual cpu id is within the possible cpus range,
>> and is temporarily (and uniquely) assigned while threads are actively
>> running within a thread group.
>> If a thread group has fewer threads than
>> cores, or is limited to run on few cores concurrently through sched
>> affinity or cgroup cpusets, the virtual cpu ids will be values close
>> to 0, thus allowing efficient use of user-space memory for per-cpu
>> data structures.
>
> Why per thread group and not per mm? The main use case is for
> per-(v)cpu memory allocation logic, so it seems having this feature
> per mm is more appropriate?

Good point, yes, per-mm would be more appropriate.

So I guess that from a userspace perspective, the rseq field could become
"__u32 vm_vcpu; /* Current vcpu within memory space. */"

[...]

>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index b6ecb9fc4cd2..c87e7ad5a1ea 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -244,6 +244,12 @@ struct signal_struct {
>>                                          * and may have inconsistent
>>                                          * permissions.
>>                                          */
>> +#ifdef CONFIG_SCHED_THREAD_GROUP_VCPU
>> +       /*
>> +        * Mask of allocated vcpu ids within the thread group.
>> +        */
>> +       cpumask_t                       vcpu_mask;
>
> We use a pointer for the mask (in struct mm). Adds complexity around
> alloc/free, though. Just FYI.

It does make sense if this is opt-in.

[...]

>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2e4ae00e52d1..2690e80977b1 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4795,6 +4795,8 @@ prepare_task_switch(struct rq *rq, struct task_struct
>> *prev,
>>         sched_info_switch(rq, prev, next);
>>         perf_event_task_sched_out(prev, next);
>>         rseq_preempt(prev);
>> +       tg_vcpu_put(prev);
>> +       tg_vcpu_get(next);
>
> Doing this for all tasks on all context switches will most likely be
> too expensive. We do it only for tasks that explicitly asked for this
> feature during their rseq registration, and still the tight loop in
> our equivalent of tg_vcpu_get() is occasionally noticeable (lots of
> short wakeups can lead to the loop thrashing around).
>
> Again, our approach is more complicated as a result.

I suspect that the overhead of tg_vcpu_get() is quite small for processes
which run on only a few cores, but becomes noticeable when processes have
many threads and are massively parallel (not affined to only a few cores).

When the feature is disabled, we can always fall back on the value returned
by raw_smp_processor_id() and use that as a "vm-vcpu-id" value. Whether the
vm-vcpu-id or the processor id is used needs to be agreed upon by all
threads from all processes using a mm at a given time.

There appears to be a tradeoff here, and I wonder how this should be
presented to users. A few possible options:

- the vm-vcpu feature is opt-in (default off) or opt-out (default on),

- whether vm-vcpu is enabled for a process could be selected at runtime by
  the process, either at process initialization (single thread, single mm
  user) and/or while the process is multi-threaded (requires more
  synchronization),

- if we find a way to automatically switch between vm-vcpu-id and processor
  id as the information source for all threads tied to a mm once the number
  of parallel threads reaches a threshold, then I suspect we could have the
  best of both worlds. But it's not clear to me how to achieve this.

Thoughts?

Thanks,

Mathieu

>
>>         fire_sched_out_preempt_notifiers(prev, next);
>>         kmap_local_sched_out();
>>         prepare_task(next);
>> --
>> 2.17.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
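
Editorial illustration (not part of the posted patch): the thread discusses a
get/put pair that assigns the lowest free virtual cpu id from a mask on
context switch, and notes that the search loop can become noticeable under
contention. The following is a minimal user-space sketch of that idea only.
The names vcpu_mask, vcpu_get() and vcpu_put() are hypothetical, the mask is
assumed to be a single 64-bit word rather than a kernel cpumask_t, and
__builtin_ctzll() is a GCC/Clang builtin. It is not the kernel
implementation of tg_vcpu_get()/tg_vcpu_put().

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the per-mm (or per-thread-group) mask of
 * allocated vcpu ids. Assumption: at most 64 ids, one bit per id.
 */
static _Atomic unsigned long long vcpu_mask;

/* Claim the lowest free vcpu id. Returns -1 if all 64 ids are in use. */
static int vcpu_get(void)
{
	unsigned long long old = atomic_load(&vcpu_mask);

	for (;;) {
		int id;

		if (old == ~0ULL)
			return -1;
		/* Index of the lowest clear bit in the mask. */
		id = __builtin_ctzll(~old);
		/*
		 * Try to set that bit. On failure, 'old' is refreshed with
		 * the current mask and the search restarts; this retry is
		 * the loop that can thrash when many threads race.
		 */
		if (atomic_compare_exchange_weak(&vcpu_mask, &old,
						 old | (1ULL << id)))
			return id;
	}
}

/* Release a vcpu id previously returned by vcpu_get(). */
static void vcpu_put(int id)
{
	atomic_fetch_and(&vcpu_mask, ~(1ULL << id));
}
```

In this sketch, two consecutive vcpu_get() calls on an empty mask hand out
ids 0 and 1, and after vcpu_put(0) the next vcpu_get() hands out 0 again,
which is what keeps ids close to 0 when few threads run concurrently.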