Message-ID: <01ab74767686f5a33a579b18a83392e92c312b93.camel@linux.intel.com>
Subject: Re: [PATCH 0/2] Introduce SIS_CACHE to choose previous CPU during task wakeup
From: Tim Chen
To: Ingo Molnar, Chen Yu
Cc: Peter Zijlstra, Mathieu Desnoyers, Ingo Molnar, Vincent Guittot,
 Juri Lelli, Tim Chen, Aaron Lu, Dietmar Eggemann, Steven Rostedt,
 Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
 K Prateek Nayak, "Gautham R. Shenoy", linux-kernel@vger.kernel.org,
 Chen Yu
Date: Wed, 27 Sep 2023 14:34:59 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2023-09-27 at 10:00 +0200, Ingo Molnar wrote:
> * Chen Yu wrote:
>
> > When task p is woken up, the scheduler leverages select_idle_sibling()
> > to find an idle CPU for it. p's previous CPU is usually a preference
> > because it can improve cache locality. However in many cases, the
> > previous CPU has already been taken by other wakees, thus p has to
> > find another idle CPU.
> >
> > Inhibit the task migration while keeping the work conservation of
> > scheduler could benefit many workloads. Inspired by Mathieu's
> > proposal to limit the task migration ratio[1], this patch considers
> > the task average sleep duration. If the task is a short sleeping one,
> > then tag its previous CPU as cache hot for a short while. During this
> > reservation period, other wakees are not allowed to pick this idle CPU
> > until a timeout. Later if the task is woken up again, it can find its
> > previous CPU still idle, and choose it in select_idle_sibling().
>
> Yeah, so I'm not convinced about this at this stage.
>
> By allowing a task to basically hog a CPU after it has gone idle already,
> however briefly, we reduce resource utilization efficiency for the sake
> of singular benchmark workloads.
>
> In a mixed environment the cost of leaving CPUs idle longer than necessary
> will show up - and none of these benchmarks show that kind of side effect
> and indirect overhead.
>
> This feature would be a lot more convincing if it tried to measure overhead
> in the pathological case, not the case it's been written for.
>

Ingo,

Mathieu's patches on detecting overly high task migration rates and then
rate limiting migration are a way to detect that tasks are going crazy
playing CPU musical chairs and are in a pathological state.

Would the migration rate be a reasonable indicator that we need to do
something to reduce pathological migrations, like the SIS_CACHE proposal,
so the tasks don't get jerked all over? Or do you have some other, better
indicators in mind?

We did some experiments with an OLTP workload on a 112-core, 2-socket SPR
machine. The OLTP workload has a mixture of threads handling database
updates on disks and threads handling transaction queries over the network.

For Mathieu's original task migration rate limit patches, we saw a 1.2%
improvement, and for Chen Yu's SIS_CACHE proposal, we saw a 0.7%
improvement. The system is running at ~94% busy, so it is under high
utilization. The variation of this workload is less than 0.2%. There are
improvements for such a mixed workload, though they are not as large as
for the microbenchmarks. These data are preliminary and we are still doing
more experiments.

For the OLTP experiments, each socket with 64 cores is divided into
sub-NUMA clusters of 4 nodes of 16 cores each, so the scheduling overhead
in the idle CPU search is much less than it would be with SNC off.

Thanks.

Tim
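
For readers following the thread, the reservation idea in the quoted patch description can be sketched as a toy model. This is only an illustration under stated assumptions, not the code from the patch set: the names `Cpu`, `Task`, `on_sleep`, `select_cpu`, and the two thresholds are hypothetical, and the sketch assumes that a waking task may always take its own previous CPU if it is idle while the fallback scan skips CPUs that are still reserved.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    idle: bool = True
    # Time (ns) until which this idle CPU is kept for its last short sleeper.
    cache_hot_until_ns: int = 0


@dataclass
class Task:
    prev_cpu: int
    avg_sleep_ns: int


# Hypothetical tunables chosen for the example, not values from the patches.
SHORT_SLEEP_NS = 1_000_000
RESERVE_NS = 1_000_000


def on_sleep(cpus, task, now_ns):
    """The task goes to sleep: its CPU becomes idle, and is reserved for a
    short while if the task is a short sleeper."""
    cpu = cpus[task.prev_cpu]
    cpu.idle = True
    if task.avg_sleep_ns < SHORT_SLEEP_NS:
        cpu.cache_hot_until_ns = now_ns + RESERVE_NS


def select_cpu(cpus, task, now_ns):
    """Pick a CPU for a waking task; return its index, or None if no CPU
    is available to this task right now."""
    prev = cpus[task.prev_cpu]
    if prev.idle:
        # Assumption: a task may always return to its own previous CPU.
        prev.idle = False
        return task.prev_cpu
    for i, cpu in enumerate(cpus):
        # Other wakees skip idle CPUs whose reservation has not expired.
        if cpu.idle and now_ns >= cpu.cache_hot_until_ns:
            cpu.idle = False
            return i
    return None
```

In this model, a wakee that gets `None` while a reserved CPU sits idle is exactly the lost work conservation Ingo is worried about; a real scheduler would still have to place that task somewhere, which the sketch leaves out.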