Date: Wed, 11 Oct 2023 13:46:42 +0200
From: Peter Zijlstra
To: Ankit Jain
Cc: yury.norov@gmail.com, andriy.shevchenko@linux.intel.com,
	linux@rasmusvillemoes.dk, qyousef@layalina.io, pjt@google.com,
	joshdon@google.com, bristot@redhat.com, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, namit@vmware.com, amakhalov@vmware.com,
	srinidhir@vmware.com, vsirnapalli@vmware.com, vbrahmajosyula@vmware.com,
	akaher@vmware.com, srivatsa@csail.mit.edu
Subject: Re: [PATCH RFC] cpumask: Randomly distribute the tasks within affinity mask
Message-ID: <20231011114642.GA36521@noisy.programming.kicks-ass.net>
References: <20231011071925.761590-1-ankitja@vmware.com>
 <20231011105329.GA17066@noisy.programming.kicks-ass.net>
In-Reply-To: <20231011105329.GA17066@noisy.programming.kicks-ass.net>

On Wed, Oct 11, 2023 at 12:53:29PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 11, 2023 at 12:49:25PM +0530, Ankit Jain wrote:
> > commit 46a87b3851f0 ("sched/core: Distribute tasks within affinity masks")
> > and commit 14e292f8d453 ("sched,rt: Use cpumask_any*_distribute()")
> > introduced the logic to distribute tasks at initial wakeup on cpus
> > where load balancing works poorly or is disabled entirely (isolated cpus).
> >
> > There are cases in which the distribution of tasks spawned on
> > isolcpus does not happen properly. In production deployments, the
> > initial wakeup of tasks spawned from housekeeping cpus onto isolcpus
> > [nohz_full cpus] happens on the first cpu within the isolcpus range
> > instead of being distributed across the isolcpus.
> >
> > Usage of distribute_cpu_mask_prev by one group of processes clobbers
> > the value previously set by other groups, and vice versa.
> >
> > When housekeeping cpus spawn multiple child tasks to wake up on
> > isolcpus [nohz_full cpus], using cpusets.cpus/sched_setaffinity(),
> > distribution is currently performed based on the per-cpu
> > distribute_cpu_mask_prev counter.
> > At the same time, percpu-bound timer interrupts, rcu threads and
> > other system/user tasks run on the housekeeping cpus with their
> > affinity set to the housekeeping cpus. In a real-life environment,
> > the housekeeping cpus are far fewer and heavily loaded.
> > So the distribute_cpu_mask_prev value written by these tasks skews
> > the offset used for the tasks being woken up on isolcpus, and thus
> > most of the tasks end up waking up on the first cpu within the
> > isolcpus set.
> >
> > Steps to reproduce:
> > Kernel cmdline parameters:
> > isolcpus=2-5 skew_tick=1 nohz=on nohz_full=2-5
> > rcu_nocbs=2-5 rcu_nocb_poll idle=poll irqaffinity=0-1
> >
> > * pid=$(echo $$)
> > * taskset -pc 0 $pid
> > * cat loop-normal.c
> > int main(void)
> > {
> > 	while (1)
> > 		;
> > 	return 0;
> > }
> > * gcc -o loop-normal loop-normal.c
> > * for i in {1..50}; do ./loop-normal & done
> > * pids=$(ps -a | grep loop-normal | cut -d' ' -f5)
> > * for i in $pids; do taskset -pc 2-5 $i ; done
> >
> > Expected output:
> > * All 50 “loop-normal” tasks should wake up on cpu2-5,
> >   equally distributed.
> > * ps -eLo cpuid,pid,tid,ppid,cls,psr,cls,cmd | grep "^ [2345]"
> >
> > Actual output:
> > * All 50 “loop-normal” tasks got woken up on cpu2 only
>
> Your expectation is wrong. Things work as advertised.

That is, isolcpus results in single-CPU balance domains and as such we
must not distribute -- there is no load balancing.

Ideally we'd reject setting cpumasks with multiple bits set on domains
like that, but alas, that would break historical behaviour :/

Now, looking at the code, I don't think the current code actually
behaves correctly in this case :-(; somewhere along the line we should
truncate cpu_valid_mask to a single bit. Let me see where the sane
place is to do that.
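
To make the clobbering effect concrete, here is a rough user-space
sketch of the mechanism (not the kernel implementation; the single
shared cursor and pick_distributed() are made-up stand-ins for the
per-cpu distribute_cpu_mask_prev logic). Interleaving callers that use
the housekeeping mask (cpus 0-1) keep dragging the cursor back below
the isolated range, so every pick against the 2-5 mask lands on cpu2:

/*
 * User-space sketch only -- NOT kernel code.  pick_distributed() and
 * the shared cursor are simplified stand-ins for illustration.
 */
#include <stdio.h>

#define NR_CPUS 6

static int distribute_prev;	/* stand-in for the per-cpu cursor */

/* Pick the next cpu set in @mask after the shared cursor, wrapping. */
static int pick_distributed(unsigned int mask)
{
	int i;

	for (i = 1; i <= NR_CPUS; i++) {
		int cpu = (distribute_prev + i) % NR_CPUS;

		if (mask & (1u << cpu)) {
			distribute_prev = cpu;
			return cpu;
		}
	}
	return -1;
}

int main(void)
{
	unsigned int housekeeping = 0x03;	/* cpus 0-1 */
	unsigned int isolated     = 0x3c;	/* cpus 2-5 */
	int i;

	for (i = 0; i < 8; i++) {
		/* housekeeping work (timers, rcu threads, ...) in between */
		pick_distributed(housekeeping);
		/* next task being affined to the isolated cpus */
		printf("isolated pick %d -> cpu%d\n", i,
		       pick_distributed(isolated));
	}
	return 0;
}

Compiled and run, every "isolated pick" prints cpu2, which matches the
behaviour seen in the reproduction above.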