From: Julien Desfossez
To: Peter Zijlstra, mingo@kernel.org, tglx@linutronix.de, pjt@google.com, tim.c.chen@linux.intel.com, torvalds@linux-foundation.org
Cc: Julien Desfossez, linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Vineeth Pillai, Nishanth Aravamudan, Aaron Lu
Subject: Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.
Date: Tue, 9 Apr 2019 14:38:55 -0400
Message-Id: <1554835135-11814-1-git-send-email-jdesfossez@digitalocean.com>
In-Reply-To: <20190218173514.667598558@infradead.org>
References: <20190218173514.667598558@infradead.org>

We found the source of the major performance regression we discussed
previously. It turns out there was a pattern where a task (a kworker in this
case) could be woken up, but the core could still end up idle before that
task had a chance to run.

Example sequence: cpu0 and cpu1 are siblings on the same core, and task1 and
task2 are in the same cgroup with the tag enabled (the lines below are in
chronological order):
- task1 running on cpu0, task2 running on cpu1
- sched_waking(kworker/0, target_cpu=cpu0)
- task1 scheduled out of cpu0
- kworker/0 cannot run on cpu0 because task2 is still running on cpu1;
  cpu0 is idle
- task2 scheduled out of cpu1
- cpu1 doesn’t select kworker/0 for cpu0, because the optimization path ends
  the task selection if core_cookie is NULL for the currently selected
  process and for cpu1’s runqueue
- cpu1 is idle
--> both siblings are idle but kworker/0 is still in the run queue of cpu0.
    cpu0 may stay idle even longer if it enters a deep idle state.

With the fix below, we make sure to send an IPI to the sibling if it is idle
and has tasks waiting in its runqueue. This fixes the performance issue we
were seeing.

Now here is what we can measure with a disk write-intensive benchmark:
- no performance impact when core scheduling is enabled without any tagged
  task,
- 5% overhead if one tagged task is competing with an untagged task,
- 10% overhead if 2 tasks tagged with different tags are competing against
  each other.

We are starting more scaling tests, but this is very encouraging!

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1fa10561279..02c862a5e973 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3779,7 +3779,22 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 				trace_printk("unconstrained pick: %s/%d %lx\n",
 						next->comm, next->pid, next->core_cookie);
+				rq->core_pick = NULL;
+				/*
+				 * If the sibling is idling, we might want to wake it
+				 * so that it can check for any runnable but blocked tasks
+				 * due to previous task matching.
+				 */
+				for_each_cpu(j, smt_mask) {
+					struct rq *rq_j = cpu_rq(j);
+					rq_j->core_pick = NULL;
+					if (j != cpu && is_idle_task(rq_j->curr) && rq_j->nr_running) {
+						resched_curr(rq_j);
+						trace_printk("IPI(%d->%d[%d]) idle preempt\n",
+								cpu, j, rq_j->nr_running);
+					}
+				}
 				goto done;
 			}
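
To make the added check easier to reason about outside the kernel, here is a
minimal user-space sketch of the condition the loop above tests. It is only
an illustration under stated assumptions: struct toy_rq and
kick_idle_siblings() are hypothetical names invented for this example. They
mimic is_idle_task(rq_j->curr), rq_j->nr_running and the effect of
resched_curr(), and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a per-CPU runqueue; NOT the kernel's struct rq. */
struct toy_rq {
	bool curr_is_idle;       /* models is_idle_task(rq->curr) */
	unsigned int nr_running; /* models rq->nr_running (queued tasks) */
	bool need_resched;       /* set where the patch calls resched_curr() */
};

/*
 * Hypothetical helper modelling the loop added by the patch. It is called
 * on behalf of `cpu` when that CPU takes the unconstrained-pick path.
 * Every other sibling that is idle but still has queued tasks is flagged
 * for rescheduling. Returns the number of siblings flagged.
 */
static int kick_idle_siblings(struct toy_rq *rqs, size_t nr_siblings,
			      size_t cpu)
{
	int kicked = 0;

	assert(cpu < nr_siblings);

	for (size_t j = 0; j < nr_siblings; j++) {
		if (j != cpu && rqs[j].curr_is_idle && rqs[j].nr_running > 0) {
			rqs[j].need_resched = true;
			kicked++;
		}
	}
	return kicked;
}
```

In the sequence described above, cpu0 is idle with kworker/0 still queued
(nr_running == 1) when cpu1 schedules task2 out. Calling
kick_idle_siblings(rqs, 2, 1) in that state flags cpu0, which corresponds to
the IPI that lets cpu0 pick up kworker/0 instead of staying idle.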