Subject: Re: sched/fair: scheduler not running high priority process on idle cpu
To: David Laight, 'Steven Rostedt'
Cc: 'Vincent Guittot', Peter Zijlstra, Viresh Kumar, Ingo Molnar,
    Juri Lelli, Ben Segall, Mel Gorman, linux-kernel
From: Dietmar Eggemann
Date: Mon, 20 Jan 2020 10:39:27 +0100
Message-ID: <26b2f8f7-b11f-0df0-5260-a232e1d5bf1a@arm.com>
References: <212fabd759b0486aa8df588477acf6d0@AcuMS.aculab.com>
 <20200114115906.22f952ff@gandalf.local.home>
 <5ba2ae2d426c4058b314c20c25a9b1d0@AcuMS.aculab.com>
 <20200114124812.4d5355ae@gandalf.local.home>
 <878a35a6642d482aa0770a055506bd5e@AcuMS.aculab.com>
 <20200115081830.036ade4e@gandalf.local.home>
 <9f98b2dd807941a3b85d217815a4d9aa@AcuMS.aculab.com>
 <20200115103049.06600f6e@gandalf.local.home>

On 15/01/2020 18:07, David Laight wrote:
> From Steven Rostedt
>> Sent: 15 January 2020 15:31
> ...
>>> For this case an idle cpu doing an unlocked check for a process that has
>>> been waiting 'ages' to preempt the running process may not be too
>>> expensive.
>>
>> How do you measure a process waiting for ages on another CPU? And then
>> by the time you get the information to pull it, there's always the race
>> that the process will get the chance to run. And if you think about it,
>> by looking for a process waiting for a long time, it is likely it will
>> start to run because "ages" means it's probably close to being released.
>
> Without a CBU (Crystal Ball Unit) you can always be unlucky.
> But once you get over the 'normal' delays for a system call you probably
> get an exponential (or is it logarithmic) distribution, and the additional
> delay is likely to be at least some fraction of the time it has already waited.
>
> While not entirely the same, this is something I still need to look at further.
> This is a histogram of the time taken (in ns) to send on a raw IPv4 socket:
>
> 0k:    1874462617
> 96k:   260350
> 160k:  30771
> 224k:  14812
> 288k:  770
> 352k:  593
> 416k:  489
> 480k:  368
> 544k:  185
> 608k:  63
> 672k:  27
> 736k:  6
> 800k:  1
> 864k:  2
> 928k:  3
> 992k:  4
> 1056k: 1
> 1120k: 0
> 1184k: 1
> 1248k: 1
> 1312k: 2
> 1376k: 3
> 1440k: 1
> 1504k: 1
> 1568k: 1
> 1632k: 4
> 1696k: 0 (5 times)
> 2016k: 1
> 2080k: 0
> 2144k: 1
> total: 1874771078, average 32k
>
> I've improved it no end by using per-thread sockets and setting
> the socket write queue size large.
> But there are still some places where it takes > 600us.
> The top end is rather more linear than one might expect.
>
>>> I presume the locks are in place for the migrate itself.
>>
>> Note, grabbing locks on another CPU will incur overhead on that
>> other CPU. I've seen huge latency caused by doing just this.
>
> I'd have thought this would only be significant if the cache line
> ends up being used by both cpus?
>
>>> The only downside is that the process's data is likely to be in the wrong cache,
>>> but unless the original cpu becomes available just after the migrate it is
>>> probably still a win.
>>
>> If you are doing this with just tasks that are waiting for the CPU to
>> be preemptable, then it is most likely not a win at all.
>
> You'd need a good guess that the wait would be long.
>
>> Now, the RT tasks do have an aggressive push / pull logic that keeps
>> track of which CPUs are running lower priority tasks and will work hard
>> to keep all RT tasks running (and aggressively migrate them). But this
>> logic still only takes place at preemption points (cond_resched(), etc.).
>
> I guess this only 'gives away' extra RT processes,
> rather than 'stealing' them - which is what I need.

Isn't part of the problem that RT doesn't maintain cp->pri_to_cpu[CPUPRI_IDLE]
(CPUPRI_IDLE = 0)? So push/pull (find_lowest_rq()) never returns a mask of
idle CPUs.

There was
https://lore.kernel.org/r/1415260327-30465-2-git-send-email-pang.xunlei@linaro.org
in 2014, but it didn't go mainline.

There was a similar question in Nov last year:
https://lore.kernel.org/r/CH2PR19MB3896AFE1D13AD88A17160860FC700@CH2PR19MB3896.namprd19.prod.outlook.com