Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752769AbbD0GuP (ORCPT ); Mon, 27 Apr 2015 02:50:15 -0400 Received: from m15-111.126.com ([220.181.15.111]:33205 "EHLO m15-111.126.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752562AbbD0GuL (ORCPT ); Mon, 27 Apr 2015 02:50:11 -0400 From: Xunlei Pang To: linux-kernel@vger.kernel.org Cc: Peter Zijlstra , Steven Rostedt , Juri Lelli , Ingo Molnar , Xunlei Pang Subject: [RFC PATCH RESEND 0/4] Maintain the FIFO order for same priority RT tasks Date: Mon, 27 Apr 2015 14:48:34 +0800 Message-Id: <1430117318-2080-1-git-send-email-xlpang@126.com> X-Mailer: git-send-email 1.9.1 X-CM-TRANSID: C8mowEC5g0PH2z1VRqg4AA--.10035S2 X-Coremail-Antispam: 1Uf129KBjvJXoW3Ar47CFWkZr15KF48Ar43Wrg_yoW7Wryrpr yrG3y7tw4kAwsrt343Xr4xXr13ZFy8XF43Jrn7t34jyw45WF409ryFqF9IvFyxAFs7XF4q qanFv393G3yDZFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07jk8n5UUUUU= X-Originating-IP: [210.21.223.3] X-CM-SenderInfo: p0ost0bj6rjloofrz/1tbiWxXov1PM8KDYowABss Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5715 Lines: 119 From: Xunlei Pang 1. TWO FIFO QUEUES We know, there are two main queues each cpu for RT SMP scheduler. Let's name them "run queue" and "pushable queue" respectively. "run queue" is from the intra-cpu's perspective, while "pushable queue" is from the inter-cpu's perspective. "pushable queue" also has a very close relationship with task affinity. The two queues separately maintain their FIFO order for tasks queued at the same priority. If we got a wrong FIFO order of any of the two queues, we say it broke FIFO. 2. THE WAY THEY WORK Currently, when an rt task gets queued, it can be put to the head or tail of its "run queue" at the same priority according to different scenarios. Further if it is migratable, it will also and always be put to the tail of its "pushable queue" at the same priority because "plist" is used to manage this queue in strict FIFO order. a) If a task got enqueued or dequeued, it got added to or removed from the "run queue", and also "pushable queue"(added to tail) if it is migratable. b) If a runnable task acquires cpu(current task), it stays in the front of the "run queue", but got removed from the "pushable queue". c) If a runnable task releases cpu, it stays in the "run queue"(if it releases cpu involuntarily, stays just behind new current in this queue; if it releases cpu voluntarily like yield, usually was requeued behind other tasks queued at the same priority in this queue), and got added to the tail of its "pushable queue" at the same priority if it is migratable. d) If a tasks's affinity is changed from one cpu to multiple cpus, it got added to the tail of "pushable queue" at the same priority. e) If a tasks's affinity is changed from multiple cpus to one cpu, it got removed from the "pushable queue". 3. PROBLEMATIC CASES Unfortunately, in current kernel implementations, as far as I know, c) described in "2. THE WAY THEY WORK" have some cases that broke FIFO of one of the two queues for tasks queued at the same priority. - Case 1: current task gets preempted by a higher priority task current will be enqueued behind other tasks with the same priority in the "pushable queue", while it actually stays ahead of these tasks in the "run queue". Afterwards, if there comes a pull from other cpu or a push from local cpu, the task behind current in the "run queue" will be removed from the "pushable queue" and gets running, as the global rt scheduler fetches tasks from the head of the "pushable queue" to do the pulling or pushing. The kernel broke FIFO in this case. current task gets preempted by an equal priority task: This is done in check_preempt_equal_prio(), and can be divided into 3 sub cases. - Case 2 : p is the only one has the same priority as current. p preempts current successfully, but the kernel still had the same problem as in Case 1 during the preempting. - Case 3 : p is not the only one has the same priority as current, let's say q is already queued before p. so, in this case we should choose q to do the preempting, not p. So the kernel broke FIFO in this case. - Case 4 : p is going to preempt current, but when doing the actual preempting, current isn't pushed away due to the change of system status. so eventually p becomes the new current and gets running, while current is queued back waiting in the same "run queue". Obviously, the kernel broke FIFO in this case. NOTE: when a task's affinity is changed and it becomes migratable(see d) described in "2. THE WAY THEY WORK"), in current kernel implementation it will be queued behind other tasks with the same priority in the "pushable queue". This may seems contratry to the corresponding order in the "run queue". But from "pushable queue"'s prospective, when a task's affinity is changed and further got removed from or added to the "pushable queue" because of that, it means requeuing. In my view, I don't think this broke FIFO of the "pushable queue". Any discussion? But for "case 1" and "case 2", the task eventually wasn't got removed from or added to the "pushable queue" due to affinity changing, so its "pushable queue" FIFO order should also stays unchanged, and must be the same as its "run queue" order. So we say "case 1" and "case 2" broke FIFO of its "pushable queue". 4. SUMMARY case 1 : broke its "pushable queue" FIFO order. case 2 : broke its "pushable queue" FIFO order. case 3 : broke its "run queue" FIFO order. case 4 : broke its "run queue" FIFO order. 5. PATCHSET Patch 1 : Fix "Case 3". Patch 2 : Prerequisite for Patch 3 - Provide plist_add_head() for "pushable queue". Patch 3 : Fix "Case 1" and "Case 2". Patch 4 : Fix "Case 4". Xunlei Pang (4): sched/rt: Modify check_preempt_equal_prio() for multiple tasks queued at the same priority lib/plist: Provide plist_add_head() for nodes with the same prio sched/rt: Fix wrong SMP scheduler behavior for equal prio cases sched/rt: Requeue p back if the preemption initiated by check_preempt_equal_prio_common() failed include/linux/plist.h | 34 ++++++- include/linux/sched.h | 5 + include/linux/sched/rt.h | 16 +++ kernel/sched/core.c | 6 +- kernel/sched/rt.c | 253 ++++++++++++++++++++++++++++++++++++++++------- lib/plist.c | 28 +++++- 6 files changed, 299 insertions(+), 43 deletions(-) -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/