Date: Wed, 15 Jan 2020 10:30:49 -0500
From: Steven Rostedt
To: David Laight
Cc: Vincent Guittot, Peter Zijlstra, Viresh Kumar, Ingo Molnar, Juri Lelli, Dietmar Eggemann, Ben Segall, Mel Gorman, linux-kernel
Subject: Re: sched/fair: scheduler not running high priority process on idle cpu
Message-ID: <20200115103049.06600f6e@gandalf.local.home>
In-Reply-To: <9f98b2dd807941a3b85d217815a4d9aa@AcuMS.aculab.com>

On Wed, 15 Jan 2020 15:11:32 +0000
David Laight wrote:

> From: Steven Rostedt
> > Sent: 15 January 2020 13:19
> > On Wed, 15 Jan 2020 12:44:19 +0000
> > David Laight wrote:
> >
> > > > Yes, even with CONFIG_PREEMPT, Linux has no guarantees of latency for
> > > > any task regardless of priority. If you have latency requirements, then
> > > > you need to apply the PREEMPT_RT patch (which may soon make it to
> > > > mainline this year!), which spin locks and bh wont stop a task from
> > > > scheduling (unless they need the same lock)
> >
> > Every time you add something to allow higher priority processes to run
> > with less latency you add overhead. By just adding that spinlock check
> > or to migrate a process to a idle cpu will add a measurable overhead,
> > and as you state, distros won't like that.
> >
> > It's a constant game of give and take.
>
> I know exactly how much effect innocuous changes can have...
>
> Sorting out process migration on a 1024 cpu NUMA system must be a PITA.
>
> For this case an idle cpu doing a unlocked check for a processes that has
> been waiting 'ages' to preempt the running process may not be too
> expensive.

How do you measure a process waiting for ages on another CPU? And by the
time you have the information to pull it, there's always the race that the
process will already have gotten the chance to run. And if you think about
it, a process that has been waiting a long time is likely to start running
soon anyway, because "ages" means it's probably close to being released.

> I presume the locks are in place for the migrate itself.

Note that grabbing locks on another CPU incurs overhead on that other CPU.
I've seen huge latency caused by doing just this.
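A minimal userspace sketch of the "unlocked check, then lock and re-validate" pattern being discussed, assuming a hypothetical one-slot runqueue. `struct sim_rq`, `peek_long_waiter()` and `pull_long_waiter()` are illustrative names, not kernel APIs, and a C11 `atomic_flag` spinlock stands in for the remote runqueue lock. The unlocked peek is cheap but can be stale; the pull must take the remote lock (bouncing its cache line away from the owner CPU) and can still find that the task has already run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified per-CPU runqueue with one waiting slot. */
struct sim_rq {
    atomic_flag lock;              /* stands in for the remote rq spinlock */
    _Atomic int64_t waiting_since; /* enqueue time of the waiter, -1 if none */
    int waiting_pid;               /* 0 if none; protected by lock */
};

static void rq_lock(struct sim_rq *rq)
{
    /* Spin until we own the flag; on real hardware this drags the
     * lock's cache line over from the CPU that owns the runqueue. */
    while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
        ;
}

static void rq_unlock(struct sim_rq *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

static void rq_init(struct sim_rq *rq)
{
    atomic_flag_clear(&rq->lock);
    atomic_init(&rq->waiting_since, -1);
    rq->waiting_pid = 0;
}

static void rq_enqueue(struct sim_rq *rq, int pid, int64_t now)
{
    rq_lock(rq);
    assert(rq->waiting_pid == 0);  /* only one slot in this sketch */
    rq->waiting_pid = pid;
    atomic_store(&rq->waiting_since, now);
    rq_unlock(rq);
}

/* The owner CPU reached a preemption point and ran the waiter itself. */
static void rq_owner_runs_waiter(struct sim_rq *rq)
{
    rq_lock(rq);
    rq->waiting_pid = 0;
    atomic_store(&rq->waiting_since, -1);
    rq_unlock(rq);
}

/* Step 1 (idle CPU): unlocked peek. Relaxed load: the result is only a hint. */
static bool peek_long_waiter(struct sim_rq *rq, int64_t now, int64_t threshold)
{
    int64_t since = atomic_load_explicit(&rq->waiting_since, memory_order_relaxed);
    return since >= 0 && now - since >= threshold;
}

/* Step 2 (idle CPU): take the remote lock and re-check before pulling.
 * Returns the pulled pid, or 0 if the waiter is already gone. */
static int pull_long_waiter(struct sim_rq *rq, int64_t now, int64_t threshold)
{
    int pid = 0;

    rq_lock(rq);
    int64_t since = atomic_load(&rq->waiting_since);
    if (rq->waiting_pid != 0 && since >= 0 && now - since >= threshold) {
        pid = rq->waiting_pid;
        rq->waiting_pid = 0;
        atomic_store(&rq->waiting_since, -1);
    }
    rq_unlock(rq);
    return pid;
}
```

For example, enqueueing pid 42 at time 100 and peeking at time 120 with a threshold of 10 says "pull", but if `rq_owner_runs_waiter()` happens before the idle CPU takes the lock, `pull_long_waiter()` returns 0: the lock was taken (and paid for) for nothing.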
> The only downside is that the process's data is likely to be in the wrong cache,
> but unless the original cpu becomes available just after the migrate it is
> probably still a win.

If you are doing this only with tasks that are waiting for the CPU to become
preemptible, then it is most likely not a win at all.

Now, RT tasks do have aggressive push/pull logic that keeps track of which
CPUs are running lower priority tasks, and it works hard to keep all RT
tasks running (aggressively migrating them). But this logic still only
takes place at preemption points (cond_resched(), etc.).

-- Steve
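A toy sketch of what "keeps track of which CPUs are running lower priority tasks" can look like, loosely modeled on the idea behind the kernel's cpupri (kernel/sched/cpupri.c): one CPU bitmask per priority level, updated whenever a CPU's running priority changes, so a push can pick a lower-priority target by scanning a few masks instead of locking every runqueue. The names, the level numbering (0 = idle, 1 = normal, higher = higher RT priority) and the 4-CPU size are assumptions for illustration; the real code differs in detail, and the target CPU still only switches at its next preemption point.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NR_LEVELS 8   /* hypothetical: 0 = idle, 1 = normal, 2..7 = RT */
#define SIM_NR_CPUS   4   /* toy machine */

struct sim_cpupri {
    uint64_t mask[SIM_NR_LEVELS];  /* bit c set: CPU c runs at this level */
    int level_of[SIM_NR_CPUS];     /* reverse map, kept in sync with mask[] */
};

static void sim_cpupri_init(struct sim_cpupri *cp)
{
    for (int l = 0; l < SIM_NR_LEVELS; l++)
        cp->mask[l] = 0;
    for (int c = 0; c < SIM_NR_CPUS; c++) {
        cp->level_of[c] = 0;                 /* every CPU starts idle */
        cp->mask[0] |= UINT64_C(1) << c;
    }
}

/* Called whenever the task running on @cpu changes priority level. */
static void sim_cpupri_set(struct sim_cpupri *cp, int cpu, int level)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    assert(level >= 0 && level < SIM_NR_LEVELS);
    cp->mask[cp->level_of[cpu]] &= ~(UINT64_C(1) << cpu);
    cp->mask[level] |= UINT64_C(1) << cpu;
    cp->level_of[cpu] = level;
}

/* Push target for a task at @task_level: scan from the lowest level up and
 * return the lowest-numbered CPU running strictly lower priority work,
 * or -1 if every CPU is running work of equal or higher priority. */
static int sim_cpupri_find(const struct sim_cpupri *cp, int task_level)
{
    for (int l = 0; l < task_level && l < SIM_NR_LEVELS; l++) {
        uint64_t m = cp->mask[l];
        for (int c = 0; c < SIM_NR_CPUS; c++)
            if (m & (UINT64_C(1) << c))
                return c;
    }
    return -1;
}
```

The lookup itself touches no remote runqueue lock; the cost moves to keeping the masks current on every priority change, which is the same give and take between latency and overhead discussed above.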