Date: Mon, 23 Feb 2015 16:43:40 -0800
From: Jörn Engel
To: Steven Rostedt
Cc: Peter Zijlstra, Ingo Molnar, Gregory Haskins,
    linux-kernel@vger.kernel.org
Subject: RFC: revert 43fa5460fe60
Message-ID: <20150224004340.GC31433@Sligo.logfs.org>

Hello Steven!

I came across a silly problem that tempted me to revert 43fa5460fe60.
We had a high-priority realtime thread woken, TIF_NEED_RESCHED was set
for the running thread and the realtime thread didn't run for >2s.
Problem was a system call that mapped a ton of device memory and never
hit a cond_resched() point.  Obvious solution is to fix the long-running
system call.

Applying that solution quickly turns into a game of whack-a-mole.  Not
the worst game in the world and all those moles surely deserve a solid
hit on the head.  But what is annoying in my case is that I had plenty
of idle cpus during the entire time and the high-priority thread was
allowed to run anywhere.  So if the thread had been moved to a different
runqueue immediately there would have been zero delay.  Sure, the cache
is semi-cold or the migration may even be cross-package.  That is a
tradeoff we are willing to make and we set the cpumask explicitly that
way.
We want this thread to run quickly, anywhere.

So we could check for currently idle cpus when waking realtime threads.
If there are any, immediately move the woken thread over.  Maybe have a
check whether the running thread on the current cpu is in a syscall and
retain current behaviour if not.

Now this is not quite the same as reverting 43fa5460fe60 and I would
like to verify the idea before I spend time on a patch you would never
consider merging anyway.

Jörn

--
As more agents are added, systems become more reliable in the
total-effort case, but less reliable in the weakest-link case.  What are
the implications?  Well, software companies should hire more software
testers and fewer (but more competent) programmers.
-- Ross Anderson
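
To make the wakeup policy described above concrete, here is a rough
user-space model of the decision.  It is not kernel code and not a
patch: struct cpu_state, its fields and pick_wake_cpu() are hypothetical
names invented for illustration, and the "in syscall" flag stands in for
whatever check the scheduler could actually perform.  The sketch only
shows the order of the checks: keep current behaviour unless the running
thread is in a syscall, otherwise prefer an idle cpu from the allowed
mask.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu snapshot; these fields do not exist in the kernel. */
struct cpu_state {
    bool idle;        /* nothing is running on this cpu */
    bool in_syscall;  /* the running thread is currently inside a syscall */
};

/*
 * Choose a cpu for a freshly woken realtime thread.
 *
 * cpus/ncpus: snapshot of all cpus
 * allowed:    affinity bitmask of the woken thread (bit i = cpu i)
 * wake_cpu:   cpu the thread would be queued on by default
 *
 * Returns wake_cpu (current behaviour) unless the thread running there
 * is in a syscall and an allowed idle cpu exists.
 */
int pick_wake_cpu(const struct cpu_state *cpus, size_t ncpus,
                  unsigned long allowed, int wake_cpu)
{
    const size_t mask_bits = sizeof(allowed) * CHAR_BIT;

    /* Running thread is not in a syscall: retain current behaviour. */
    if (!cpus[wake_cpu].in_syscall)
        return wake_cpu;

    /* Otherwise move to the first idle cpu the thread may run on. */
    for (size_t i = 0; i < ncpus && i < mask_bits; i++) {
        if ((allowed & (1UL << i)) && cpus[i].idle)
            return (int)i;
    }

    /* No idle cpu in the mask: fall back to the default cpu. */
    return wake_cpu;
}
```

Taking the first idle cpu ignores cache and package topology on purpose,
matching the "quickly, anywhere" tradeoff stated above; a real
implementation would likely want to prefer nearby cpus.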