Date: Wed, 11 Oct 2023 13:39:12 +0300
From: Andy Shevchenko
To: Ankit Jain
Cc: peterz@infradead.org, yury.norov@gmail.com, linux@rasmusvillemoes.dk,
    qyousef@layalina.io, pjt@google.com, joshdon@google.com, bristot@redhat.com,
    vschneid@redhat.com, linux-kernel@vger.kernel.org, namit@vmware.com,
    amakhalov@vmware.com, srinidhir@vmware.com, vsirnapalli@vmware.com,
    vbrahmajosyula@vmware.com, akaher@vmware.com, srivatsa@csail.mit.edu
Subject: Re: [PATCH RFC] cpumask: Randomly distribute the tasks within affinity mask
References: <20231011071925.761590-1-ankitja@vmware.com>
In-Reply-To: <20231011071925.761590-1-ankitja@vmware.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo

On Wed, Oct 11, 2023 at 12:49:25PM +0530, Ankit Jain wrote:
> commit 46a87b3851f0 ("sched/core: Distribute tasks within affinity masks")
> and commit 14e292f8d453 ("sched,rt: Use cpumask_any*_distribute()")
> introduced the logic to distribute tasks at their initial wakeup onto cpus
> where load balancing works poorly or is disabled altogether (isolated cpus).
>
> There are cases in which the distribution of tasks spawned on isolcpus
> does not happen properly. In production deployments, the initial wakeup
> of tasks spawned from housekeeping cpus onto isolcpus (nohz_full cpus)
> happens on the first cpu within the isolcpus range instead of being
> distributed across the isolcpus.
>
> Use of distribute_cpu_mask_prev by one group of processes clobbers the
> previous value stored by other groups, and vice versa.
>
> When housekeeping cpus spawn multiple child tasks to wake up on
> isolcpus (nohz_full cpus), using cpusets.cpus/sched_setaffinity(),
> distribution is currently performed based on the per-cpu
> distribute_cpu_mask_prev counter.
> At the same time, the housekeeping cpus run percpu-bound timer
> interrupts, rcu threads and other system/user tasks whose affinity is
> the housekeeping cpus. In a real-life environment the housekeeping cpus
> are far fewer and heavily loaded.
> So the distribute_cpu_mask_prev value left behind by those tasks skews
> the offset used for the tasks that are spawned to wake up on isolcpus,
> and most of them end up waking up on the first cpu within the
> isolcpus set.
>
> Steps to reproduce:
> Kernel cmdline parameters:
> isolcpus=2-5 skew_tick=1 nohz=on nohz_full=2-5
> rcu_nocbs=2-5 rcu_nocb_poll idle=poll irqaffinity=0-1
>
> * pid=$(echo $$)
> * taskset -pc 0 $pid
> * cat loop-normal.c
>   int main(void)
>   {
>           while (1)
>                   ;
>           return 0;
>   }
> * gcc -o loop-normal loop-normal.c
> * for i in {1..50}; do ./loop-normal & done
> * pids=$(ps -a | grep loop-normal | cut -d' ' -f5)
> * for i in $pids; do taskset -pc 2-5 $i ; done
>
> Expected output:
> * All 50 “loop-normal” tasks should wake up on cpus 2-5,
>   equally distributed.
> * ps -eLo cpuid,pid,tid,ppid,cls,psr,cls,cmd | grep "^ [2345]"
>
> Actual output:
> * All 50 “loop-normal” tasks are woken up on cpu2 only.
>
> Analysis:
> Percpu-bound timer interrupt and rcu thread activity runs every few
> microseconds on the housekeeping cpus, exercising
> find_lowest_rq() -> cpumask_any_and_distribute()/cpumask_any_distribute().
> So the per-cpu variable distribute_cpu_mask_prev on the housekeeping
> cpus keeps getting set to housekeeping cpus. The bash/docker processes
> share the same per-cpu variable because they run on housekeeping cpus.
> Thus the intersection of the clobbered distribute_cpu_mask_prev and the
> new mask (isolcpus) always returns the first cpu within the new mask
> (isolcpus), following the logic introduced in the commits above.
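For illustration, here is a minimal user-space model of the interaction
described in the analysis above. The mask values, the helper names and the
single prev variable standing in for the per-cpu distribute_cpu_mask_prev
counter are assumptions of the sketch, not the kernel implementation:

/* model.c - hypothetical stand-alone model of the clobbering effect */
#include <stdio.h>

#define NR_CPUS 8

static int prev;        /* stands in for per-cpu distribute_cpu_mask_prev */

/* first CPU set in @mask strictly after @start, wrapping around */
static int next_cpu_wrap(unsigned int mask, int start)
{
        for (int i = 1; i <= NR_CPUS; i++) {
                int cpu = (start + i) % NR_CPUS;

                if (mask & (1u << cpu))
                        return cpu;
        }
        return -1;
}

/* rough analogue of cpumask_any_and_distribute() keeping a "prev" cursor */
static int any_distribute(unsigned int mask)
{
        int cpu = next_cpu_wrap(mask, prev);

        if (cpu >= 0)
                prev = cpu;
        return cpu;
}

int main(void)
{
        const unsigned int housekeeping = 0x03;  /* cpus 0-1 */
        const unsigned int isolcpus = 0x3c;      /* cpus 2-5 */

        for (int i = 0; i < 6; i++) {
                /* timer/rcu activity on housekeeping cpus resets the cursor to 0 or 1 */
                any_distribute(housekeeping);
                /* a task affined to isolcpus then always lands on cpu 2 */
                printf("woken on cpu %d\n", any_distribute(isolcpus));
        }
        return 0;
}

Every pick from the isolated mask lands on cpu 2 because the cursor has just
been reset to a housekeeping cpu; without the interleaved housekeeping picks
the same loop would cycle through cpus 2, 3, 4 and 5.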
> Fix the issue by using random cores out of the applicable CPU set
> instead of relying on distribute_cpu_mask_prev.

> Fixes: 46a87b3851f0 ("sched/core: Distribute tasks within affinity masks")
> Fixes: 14e292f8d453 ("sched,rt: Use cpumask_any*_distribute()")
>

Blank lines are not allowed in the tag block.

> Signed-off-by: Ankit Jain

...

> +/**
> + * Returns an arbitrary cpu within srcp.
> + *
> + * Iterated calls using the same srcp will be randomly distributed
> + */

This is invalid. Always run

        scripts/kernel-doc -v -none -Wall ...

against the file of interest and fix all warnings and errors reported.

--
With Best Regards,
Andy Shevchenko
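For reference, the kernel-doc layout that scripts/kernel-doc checks for looks
roughly like the sketch below; the helper name and signature are illustrative
assumptions, not taken from the patch:

/**
 * cpumask_any_distribute_rand - pick an arbitrary CPU from a mask
 * @srcp: the input cpumask
 *
 * Iterated calls with the same @srcp are expected to be randomly
 * distributed over the CPUs set in the mask.
 *
 * Return: a CPU number from @srcp, or >= nr_cpu_ids if @srcp is empty.
 */
unsigned int cpumask_any_distribute_rand(const struct cpumask *srcp);

For the comment quoted above, the tool would typically flag the missing
"name - short description" first line, the undocumented srcp parameter and
the absent Return: section.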