From: Vincent Guittot
Date: Tue, 3 Dec 2019 10:45:38 +0100
Subject: Re: single aio thread is migrated crazily by scheduler
To: Phil Auld
Cc: Dave Chinner, Ming Lei, Hillf Danton, linux-block, linux-fs, linux-xfs,
 linux-kernel, Christoph Hellwig, Jens Axboe, Peter Zijlstra, Rong Chen,
 Tejun Heo
In-Reply-To: <20191202212210.GA32767@lorien.usersys.redhat.com>
References: <20191114113153.GB4213@ming.t460p>
 <20191114235415.GL4614@dread.disaster.area>
 <20191115010824.GC4847@ming.t460p>
 <20191115045634.GN4614@dread.disaster.area>
 <20191115070843.GA24246@ming.t460p>
 <20191128094003.752-1-hdanton@sina.com>
 <20191202024625.GD24512@ming.t460p>
 <20191202040256.GE2695@dread.disaster.area>
 <20191202212210.GA32767@lorien.usersys.redhat.com>

On Mon, 2 Dec 2019 at 22:22, Phil Auld wrote:
>
> Hi Vincent,
>
> On Mon, Dec 02, 2019 at 02:45:42PM +0100 Vincent Guittot wrote:
> > On Mon, 2 Dec 2019 at 05:02, Dave Chinner wrote:
> > ...
> >
> > > So, we can fiddle with workqueues, but it doesn't address the
> > > underlying issue that the scheduler appears to be migrating
> > > non-bound tasks off a busy CPU too easily....
> >
> > The root cause of the problem is that sched_wakeup_granularity_ns
> > is in the same range as, or higher than, the load balance period. As
> > Peter explained, this makes the kworker wait for the CPU for several
> > load balance periods, and a transient unbalanced state becomes a
> > stable one that the scheduler then tries to fix. With the default
> > value, the scheduler doesn't try to migrate any task.
>
> There are actually two issues here. With the high wakeup granularity
> we get the user task actively migrated. This causes the significant
> performance hit Ming was showing. With the fast wakeup_granularity
> (or smaller IOs - 512 instead of 4k) we get, instead, the user task
> migrated at wakeup to a new CPU for every IO completion.

Ok, I hadn't noticed that this one was a problem too. Do we have a
perf regression?
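As an illustration of the tunable in question (a minimal user-space
sketch, not anything taken from the test setup; it assumes the sysctl
path that kernels of this era expose, newer kernels moved the knob
under /sys/kernel/debug/sched/):

/*
 * Print sched_wakeup_granularity_ns so it can be compared by eye with
 * the balance interval of the busy CPU's sched domains.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/proc/sys/kernel/sched_wakeup_granularity_ns";
	FILE *f = fopen(path, "r");
	long long gran_ns;

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%lld", &gran_ns) != 1) {
		fprintf(stderr, "unexpected contents in %s\n", path);
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("sched_wakeup_granularity_ns = %lld ns (%.1f ms)\n",
	       gran_ns, gran_ns / 1e6);
	return 0;
}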
>
> This is the 11k migrations per sec doing 11k iops. In this test it
> is not by itself causing the measured performance issue. It generally
> flips back and forth between 2 CPUs for long periods. I think it is
> crossing cache boundaries at times (but I have not looked closely
> at the traces compared to the topology, yet).

At task wakeup, the scheduler compares the local and previous CPUs to
decide where to place the task, and will then try to find an idle CPU
that shares a cache with them, so I don't expect it to cross a cache
boundary, since the local and previous CPUs share a cache in your
case.

>
> The active balances are what really hurts in this case but I agree
> that seems to be a tuning problem.
>
>
> Cheers,
> Phil
>
>
> >
> > Then, I agree that having an ack close to the request makes sense but
> > forcing it on the exact same CPU is too restrictive IMO. Being able to
> > use another CPU on the same core should not harm the performance and
> > may even improve it. And that may still be the case while CPUs share
> > their cache.
> >
> > >
> > > -Dave.
> > >
> > > [*] Pay attention to the WQ_POWER_EFFICIENT definition for a work
> > > queue: it's designed for interrupt routines that defer work via work
> > > queues to avoid doing work on otherwise idle CPUs. It does this by
> > > turning the per-cpu wq into an unbound wq so that work gets
> > > scheduled on non-idle CPUs in preference to the local idle CPU,
> > > which can then remain in low power states.
> > >
> > > That's the exact opposite of what using WQ_UNBOUND ends up doing in
> > > this IO completion context: it pushes the work out over idle CPUs
> > > rather than keeping it confined to the already busy CPUs where CPU
> > > affinity allows the work to be done quickly. So while WQ_UNBOUND
> > > avoids the user task being migrated frequently, it results in the
> > > work being spread around many more CPUs and we burn more power to do
> > > the same work.
> > >
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
>
> --
>
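For readers following Dave's footnote, a minimal sketch of the two
alloc_workqueue() variants being contrasted. This is generic
illustration code with made-up names (demo_*), not the XFS or blk-mq
completion path itself:

#include <linux/module.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

static struct workqueue_struct *percpu_wq;
static struct workqueue_struct *unbound_wq;

static void demo_complete(struct work_struct *work)
{
	/* Work items run preemptible, so use the raw_ variant here. */
	pr_info("completion ran on CPU %d\n", raw_smp_processor_id());
}
static DECLARE_WORK(demo_work, demo_complete);

static int __init demo_init(void)
{
	/*
	 * Per-cpu workqueue: queue_work() from the completion path keeps
	 * the handler on the local, already busy, cache-hot CPU.
	 * (WQ_POWER_EFFICIENT would turn this into an unbound queue when
	 * workqueue.power_efficient is enabled, as Dave's footnote notes.)
	 */
	percpu_wq = alloc_workqueue("demo_percpu", WQ_MEM_RECLAIM, 0);

	/*
	 * Unbound workqueue: the scheduler picks the CPU, which is what
	 * spreads the completions (and the power) around in the case
	 * described above.
	 */
	unbound_wq = alloc_workqueue("demo_unbound",
				     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

	if (!percpu_wq || !unbound_wq)
		goto fail;

	queue_work(percpu_wq, &demo_work);
	return 0;

fail:
	if (percpu_wq)
		destroy_workqueue(percpu_wq);
	if (unbound_wq)
		destroy_workqueue(unbound_wq);
	return -ENOMEM;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(unbound_wq);
	destroy_workqueue(percpu_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");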