Date: Mon, 2 Dec 2019 16:22:10 -0500
From: Phil Auld
To: Vincent Guittot
Cc: Dave Chinner, Ming Lei, Hillf Danton, linux-block, linux-fs, linux-xfs,
    linux-kernel, Christoph Hellwig, Jens Axboe, Peter Zijlstra, Rong Chen,
    Tejun Heo
Subject: Re: single aio thread is migrated crazily by scheduler
Message-ID: <20191202212210.GA32767@lorien.usersys.redhat.com>

Hi Vincent,

On Mon, Dec 02, 2019 at 02:45:42PM +0100 Vincent Guittot wrote:
> On Mon, 2 Dec 2019 at 05:02, Dave Chinner wrote:

...

> > So, we can fiddle with workqueues, but it doesn't address the
> > underlying issue that the scheduler appears to be migrating
> > non-bound tasks off a busy CPU too easily....
>
> The root cause of the problem is that sched_wakeup_granularity_ns
> is in the same range as, or higher than, the load-balance period. As
> Peter explained, this makes the kworker wait for the CPU for several
> load-balance periods, and a transient unbalanced state becomes a
> stable one that the scheduler tries to fix.
> With the default value, the scheduler doesn't try to migrate any task.

There are actually two issues here.

With the high wakeup granularity we get the user task actively migrated.
This causes the significant performance hit Ming was showing.

With the fast wakeup_granularity (or smaller IOs - 512 bytes instead of
4k) we instead get the user task migrated at wakeup to a new CPU for
every IO completion. This is the 11k migrations per second while doing
11k IOPS. In this test it is not by itself causing the measured
performance issue. The task generally flips back and forth between 2
CPUs for long periods. I think it is crossing cache boundaries at times
(but I have not yet looked closely at the traces compared to the
topology). The active balances are what really hurt in this case, but I
agree that seems to be a tuning problem.

Cheers,
Phil

> Then, I agree that having an ack close to the request makes sense, but
> forcing it onto the exact same CPU is too restrictive IMO. Being able
> to use another CPU on the same core should not harm performance and may
> even improve it. And that may still be the case while CPUs share their
> cache.
>
> > -Dave.
> >
> > [*] Pay attention to the WQ_POWER_EFFICIENT definition for a work
> > queue: it's designed for interrupt routines that defer work via work
> > queues to avoid doing work on otherwise idle CPUs. It does this by
> > turning the per-cpu wq into an unbound wq so that work gets
> > scheduled on a non-idle CPU in preference to the local idle CPU,
> > which can then remain in low power states.
> >
> > That's the exact opposite of what using WQ_UNBOUND ends up doing in
> > this IO completion context: it pushes the work out over idle CPUs
> > rather than keeping it confined on the already busy CPUs where CPU
> > affinity allows the work to be done quickly.
> > So while WQ_UNBOUND avoids the user task being migrated frequently,
> > it results in the work being spread around many more CPUs and we burn
> > more power to do the same work.
> >
> > --
> > Dave Chinner
> > david@fromorbit.com

--
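[Editor's note] Dave's footnote turns on the flags passed to
alloc_workqueue() when a driver creates its completion workqueue. A hedged
kernel-side sketch, not runnable standalone: the queue names and function
name are made up for illustration, while alloc_workqueue() and the
WQ_MEM_RECLAIM/WQ_UNBOUND/WQ_POWER_EFFICIENT flags are the real API from
<linux/workqueue.h>:

```c
#include <linux/workqueue.h>
#include <linux/module.h>

static struct workqueue_struct *percpu_wq, *unbound_wq, *pe_wq;

static int __init wq_flags_demo_init(void)
{
	/* Per-cpu queue: completion work runs on the CPU that queued it,
	 * so it stays close to the submitter's cache-hot data. */
	percpu_wq = alloc_workqueue("io_done", WQ_MEM_RECLAIM, 0);

	/* WQ_UNBOUND: the scheduler may place the worker on any allowed
	 * CPU -- this is what spreads completions over idle CPUs (and
	 * burns more power) in the case Dave describes. */
	unbound_wq = alloc_workqueue("io_done_unbound",
				     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

	/* WQ_POWER_EFFICIENT: per-cpu by default, but becomes unbound
	 * when workqueue.power_efficient is enabled, trading locality
	 * for letting idle CPUs stay in low-power states. */
	pe_wq = alloc_workqueue("io_done_pe", WQ_POWER_EFFICIENT, 0);

	if (!percpu_wq || !unbound_wq || !pe_wq)
		return -ENOMEM;
	return 0;
}
```

The tension in the thread is that for IO completions the per-cpu variant
keeps the work cache-hot but lets the busy CPU's load trigger migrations,
while the unbound variant avoids migrating the user task at the cost of
scattering the completion work.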