From: Vincent Guittot
Date: Mon, 26 Oct 2020 09:39:32 +0100
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"
To: Chris Mason
Cc: Peter Zijlstra, Johannes Weiner, Rik van Riel, linux-kernel

Hi Chris

On Sat, 24 Oct 2020 at 01:49, Chris Mason wrote:
>
> Hi everyone,
>
> We’re validating a new kernel in the fleet, and compared with v5.2,

Which version are you using? Several improvements have been added since
v5.5 and the rework of load_balance.

> performance is ~2-3% lower for some of our workloads.  After some
> digging, Johannes found that our involuntary context switch rate was ~2x
> higher, and we were leaving a CPU idle a higher percentage of the time,
> even though the workload was trying to saturate the system.
>
> We were able to reproduce the problem with schbench, and Johannes
> bisected down to:
>
> commit 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912
> Author: Vincent Guittot
> Date:   Fri Oct 18 15:26:31 2019 +0200
>
>     sched/fair: Rework load_balance()
>
> Our working theory is the load balancing changes are leaving processes
> behind busy CPUs instead of moving them onto idle ones.
> I made a few
> schbench modifications to make this easier to demonstrate:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git/
>
> My VM has 40 cpus (20 cores, 2 threads per core), and my schbench
> command line is:

What is the topology? Are they all part of the same LLC?

>
> schbench -t 20 -r 0 -c 1000000 -s 1000 -i 30 -z 120
>
> This has two message threads, and 20 workers per message thread.  Once
> woken up, the workers think for a full second, which means you’ll have
> some long latencies if you’re stuck behind one of these workers in the
> runqueue.  The message thread does a little bit of work and then sleeps,
> so we end up with 40 threads hammering full blast on the CPU and 2
> threads popping in and out of idle.
>
> schbench times the delay from when a message thread wakes a worker to
> when the worker runs.  On a good kernel, the output looks like this:
>
> Latency percentiles (usec) runtime 1290 (s) (3280 total samples)
>         50.0th: 155 (1653 samples)
>         75.0th: 189 (808 samples)
>         90.0th: 216 (501 samples)
>         95.0th: 227 (163 samples)
>         *99.0th: 256 (123 samples)
>         99.5th: 1510 (16 samples)
>         99.9th: 3132 (13 samples)
>         min=21, max=3286
>
> With 0b0695f2b34a, we get this:
>
> Latency percentiles (usec) runtime 1440 (s) (4480 total samples)
>         50.0th: 147 (2261 samples)
>         75.0th: 182 (1116 samples)
>         90.0th: 205 (671 samples)
>         95.0th: 224 (215 samples)
>         *99.0th: 12240 (173 samples)    <-- much higher p99 and up
>         99.5th: 12752 (22 samples)
>         99.9th: 13104 (18 samples)
>         min=21, max=13172
>
> Since the idea is to fully load the machine with schbench, use schbench
> -t , and make sure the box doesn’t have other stuff
> running in the background.  I used a VM because it ended up giving more
> consistent results on our kernel test machines, which have some periodic
> noise running in the background.
>
> We’ve tried a few different approaches, but don’t quite have a solid
> fix yet.  I thought I’d kick off the discussion with my most useful
> hunks so far:
>
> diff a/kernel/sched/fair.c b/kernel/sched/fair.c
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
>
> -chris
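
The two latency reports quoted in this mail can be compared mechanically
rather than by eye. What follows is only a minimal Python sketch: it assumes
the report lines keep the "NN.Nth: <usec> (<count> samples)" shape shown in
the quoted output, and the helper names (parse_percentiles, compare) are
made up for illustration; they are not part of schbench.

```python
import re

# Matches lines such as "50.0th: 155 (1653 samples)" or "*99.0th: 256 (123 samples)".
# Mail quoting ("> ") and trailing annotations are tolerated because we use search().
PCT_LINE = re.compile(r"\*?(\d+(?:\.\d+)?)th:\s+(\d+)\s+\((\d+) samples\)")

# Numbers copied from the two reports quoted in the mail.
GOOD = """\
Latency percentiles (usec) runtime 1290 (s) (3280 total samples)
        50.0th: 155 (1653 samples)
        75.0th: 189 (808 samples)
        90.0th: 216 (501 samples)
        95.0th: 227 (163 samples)
        *99.0th: 256 (123 samples)
        99.5th: 1510 (16 samples)
        99.9th: 3132 (13 samples)
        min=21, max=3286
"""

BAD = """\
Latency percentiles (usec) runtime 1440 (s) (4480 total samples)
        50.0th: 147 (2261 samples)
        75.0th: 182 (1116 samples)
        90.0th: 205 (671 samples)
        95.0th: 224 (215 samples)
        *99.0th: 12240 (173 samples)
        99.5th: 12752 (22 samples)
        99.9th: 13104 (18 samples)
        min=21, max=13172
"""


def parse_percentiles(text):
    """Return {percentile: latency_usec} for every percentile line in text."""
    result = {}
    for line in text.splitlines():
        m = PCT_LINE.search(line)
        if m:
            result[float(m.group(1))] = int(m.group(2))
    return result


def compare(good, bad):
    """Return {percentile: bad/good latency ratio} for shared percentiles."""
    return {p: bad[p] / good[p] for p in sorted(good) if p in bad}


if __name__ == "__main__":
    ratios = compare(parse_percentiles(GOOD), parse_percentiles(BAD))
    for pct, ratio in ratios.items():
        print(f"p{pct}: {ratio:.2f}x")
```

Run on the quoted numbers, the ratios stay close to 1 up to p95 and jump
from p99 upward, which matches the regression described above.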