From: Dongjiu Geng
Subject: [PATCH v2] sched: Add trace for task wake up latency and leave running time
Date: Sat, 22 Aug 2020 00:08:14 +0000
Message-ID: <20200822000814.33189-1-gengdongjiu@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

1) The perf event sched_switch is commonly used to record scheduling
traces, but recording every switch consumes a large amount of CPU time
and disk space. A mechanism is therefore needed to record only those
switches whose scheduling delay exceeds a given threshold; this patch
adds that capability.

For example, with a threshold of 10 ms, the following commands record
only the switches whose wake-up latency is greater than 10 ms
(10000000 ns):

echo 'time > 10000000' > /sys/kernel/debug/tracing/events/sched/sched_wakeup_latency/filter
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_latency/enable

2) Similarly, when tuning performance we often want to know which tasks
have been off the CPU for longer than a certain threshold. The following
commands make that easy to find:

echo 'time > 10000000' > /sys/kernel/debug/tracing/events/sched/sched_leave_running_time/filter
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_leave_running_time/enable

Signed-off-by: Dongjiu Geng
---
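Note (illustration only, not part of the patch): a minimal C sketch that
applies the same 10 ms filter to both new events programmatically instead
of via the echo commands above. It assumes tracefs is mounted at
/sys/kernel/debug/tracing (on some systems it is /sys/kernel/tracing) and
that the kernel carries this patch; the helper names are made up for the
example.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SCHED_EVT_DIR "/sys/kernel/debug/tracing/events/sched/"

/* Write a short string to a tracefs control file. */
static int write_str(const char *path, const char *val)
{
        int fd = open(path, O_WRONLY);

        if (fd < 0 || write(fd, val, strlen(val)) < 0) {
                perror(path);
                if (fd >= 0)
                        close(fd);
                return -1;
        }
        close(fd);
        return 0;
}

/* Set the event's filter, then enable the event. */
static int enable_filtered(const char *event, const char *filter)
{
        char path[256];

        snprintf(path, sizeof(path), SCHED_EVT_DIR "%s/filter", event);
        if (write_str(path, filter))
                return -1;

        snprintf(path, sizeof(path), SCHED_EVT_DIR "%s/enable", event);
        return write_str(path, "1\n");
}

int main(void)
{
        /* Only record events whose reported time exceeds 10 ms. */
        if (enable_filtered("sched_wakeup_latency", "time > 10000000\n"))
                return 1;
        if (enable_filtered("sched_leave_running_time", "time > 10000000\n"))
                return 1;
        return 0;
}

The filtered events can then be read from
/sys/kernel/debug/tracing/trace_pipe as usual.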
 include/linux/sched.h        |  7 ++++
 include/trace/events/sched.h | 66 ++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c          | 20 +++++++++++
 3 files changed, 93 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..edb622c40a90 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1324,6 +1324,13 @@ struct task_struct {
 	/* CPU-specific state of this task: */
 	struct thread_struct		thread;
 
+	/* Task wake up time stamp */
+	u64				ts_wakeup;
+	/* Previous task switch out time stamp */
+	u64				pre_ts_end;
+	/* Next task switch in time stamp */
+	u64				next_ts_start;
+	bool				wakeup_state;
 	/*
 	 * WARNING: on x86, 'thread_struct' contains a variable-sized
 	 * structure. It *MUST* be at the end of 'task_struct'.
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fec25b9cfbaf..e99c6d573a42 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -183,6 +183,72 @@ TRACE_EVENT(sched_switch,
 		__entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
 
+DECLARE_EVENT_CLASS(sched_latency_template,
+
+	TP_PROTO(bool preempt,
+		 struct task_struct *prev,
+		 struct task_struct *next,
+		 u64 time),
+
+	TP_ARGS(preempt, prev, next, time),
+
+	TP_STRUCT__entry(
+		__array(	char,	prev_comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	prev_pid			)
+		__field(	int,	prev_prio			)
+		__field(	long,	prev_state			)
+		__array(	char,	next_comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	next_pid			)
+		__field(	int,	next_prio			)
+		__field(	u64,	time				)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+		__entry->prev_pid	= prev->pid;
+		__entry->prev_prio	= prev->prio;
+		__entry->prev_state	= __trace_sched_switch_state(preempt, prev);
+		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+		__entry->next_pid	= next->pid;
+		__entry->next_prio	= next->prio;
+		__entry->time		= time;
+		/* XXX SCHED_DEADLINE */
+	),
+
+	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d passed time=%llu (ns)",
+		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
+
+		(__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
+		  __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
+				{ TASK_INTERRUPTIBLE, "S" },
+				{ TASK_UNINTERRUPTIBLE, "D" },
+				{ __TASK_STOPPED, "T" },
+				{ __TASK_TRACED, "t" },
+				{ EXIT_DEAD, "X" },
+				{ EXIT_ZOMBIE, "Z" },
+				{ TASK_PARKED, "P" },
+				{ TASK_DEAD, "I" }) :
+		  "R",
+
+		__entry->prev_state & TASK_REPORT_MAX ? "+" : "",
"+" : "", + __entry->next_comm, __entry->next_pid, __entry->next_prio, + __entry->time) +); + +DEFINE_EVENT(sched_latency_template, sched_wakeup_latency, + TP_PROTO(bool preempt, + struct task_struct *prev, + struct task_struct *next, + u64 time), + TP_ARGS(preempt, prev, next, time)); + +DEFINE_EVENT(sched_latency_template, sched_leave_running_time, + TP_PROTO(bool preempt, + struct task_struct *prev, + struct task_struct *next, + u64 time), + TP_ARGS(preempt, prev, next, time)); + /* * Tracepoint for a task being migrated: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8471a0f7eb32..b5a1928dc948 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2464,6 +2464,8 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, { check_preempt_curr(rq, p, wake_flags); p->state = TASK_RUNNING; + p->ts_wakeup = local_clock(); + p->wakeup_state = true; trace_sched_wakeup(p); #ifdef CONFIG_SMP @@ -2846,6 +2848,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) success = 1; trace_sched_waking(p); p->state = TASK_RUNNING; + p->ts_wakeup = local_clock(); + p->wakeup_state = true; trace_sched_wakeup(p); goto out; } @@ -3299,6 +3303,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) plist_node_init(&p->pushable_tasks, MAX_PRIO); RB_CLEAR_NODE(&p->pushable_dl_tasks); #endif + p->ts_wakeup = 0; + p->wakeup_state = false; + p->pre_ts_end = p->next_ts_start = 0; return 0; } @@ -3355,6 +3362,8 @@ void wake_up_new_task(struct task_struct *p) post_init_entity_util_avg(p); activate_task(rq, p, ENQUEUE_NOCLOCK); + p->ts_wakeup = local_clock(); + p->wakeup_state = true; trace_sched_wakeup_new(p); check_preempt_curr(rq, p, WF_FORK); #ifdef CONFIG_SMP @@ -4521,8 +4530,19 @@ static void __sched notrace __schedule(bool preempt) psi_sched_switch(prev, next, !task_on_rq_queued(prev)); + next->next_ts_start = prev->pre_ts_end = local_clock(); + trace_sched_switch(preempt, prev, next); + if (next->ts_wakeup && next->wakeup_state) { + trace_sched_wakeup_latency(preempt, prev, next, + next->next_ts_start - next->ts_wakeup); + next->wakeup_state = false; + } + + trace_sched_leave_running_time(preempt, prev, + next, next->next_ts_start - next->pre_ts_end); + /* Also unlocks the rq: */ rq = context_switch(rq, prev, next, &rf); } else { -- 2.17.1