From: Dongjiu Geng
Subject: [PATCH] sched: Add trace for task wake up latency and leave running time
Date: Fri, 21 Aug 2020 23:59:28 +0000
Message-ID: <20200821235928.32727-1-gengdongjiu@huawei.com>
X-Mailer: git-send-email 2.17.1
X-Mailing-List: linux-kernel@vger.kernel.org

1) The perf event sched_switch is used to record scheduling traces, which
consumes a large amount of CPU resources and disk space. A mechanism is
therefore needed to record only those events whose scheduling delay exceeds
a given threshold. This patch adds that mechanism.
For example, with a threshold of 10 ms, the following commands record only
those scheduling traces whose wake-up delay is greater than 10 ms:

echo 'time > 10000000' > /sys/kernel/debug/tracing/events/sched/sched_wakeup_latency/filter
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup_latency/enable

2) Similarly, when tuning performance we usually want to know which tasks
have not run for a long time. The following commands record that in the same
way, saving CPU overhead and storage space:

echo 'time > 10000000' > /sys/kernel/debug/tracing/events/sched/sched_leave_running_time/filter
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_leave_running_time/enable

Signed-off-by: Dongjiu Geng
---
 include/linux/sched.h        |  7 ++++
 include/trace/events/sched.h | 66 ++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c          | 20 +++++++++++
 3 files changed, 93 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..edb622c40a90 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1324,6 +1324,13 @@ struct task_struct {
 	/* CPU-specific state of this task: */
 	struct thread_struct		thread;
 
+	/* Task wake up time stamp */
+	u64				ts_wakeup;
+	/* Previous task switch out time stamp */
+	u64				pre_ts_end;
+	/* Next task switch in time stamp */
+	u64				next_ts_start;
+	bool				wakeup_state;
 	/*
 	 * WARNING: on x86, 'thread_struct' contains a variable-sized
 	 * structure. It *MUST* be at the end of 'task_struct'.
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fec25b9cfbaf..e99c6d573a42 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -183,6 +183,72 @@ TRACE_EVENT(sched_switch,
 		__entry->next_comm, __entry->next_pid, __entry->next_prio)
 );
 
+DECLARE_EVENT_CLASS(sched_latency_template,
+
+	TP_PROTO(bool preempt,
+		 struct task_struct *prev,
+		 struct task_struct *next,
+		 u64 time),
+
+	TP_ARGS(preempt, prev, next, time),
+
+	TP_STRUCT__entry(
+		__array(	char,	prev_comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	prev_pid			)
+		__field(	int,	prev_prio			)
+		__field(	long,	prev_state			)
+		__array(	char,	next_comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	next_pid			)
+		__field(	int,	next_prio			)
+		__field(	u64,	time				)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
+		__entry->prev_pid	= prev->pid;
+		__entry->prev_prio	= prev->prio;
+		__entry->prev_state	= __trace_sched_switch_state(preempt, prev);
+		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
+		__entry->next_pid	= next->pid;
+		__entry->next_prio	= next->prio;
+		__entry->time		= time;
+		/* XXX SCHED_DEADLINE */
+	),
+
+	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d passed time=%llu (ns)",
+		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
+
+		(__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
+		  __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
+				{ TASK_INTERRUPTIBLE, "S" },
+				{ TASK_UNINTERRUPTIBLE, "D" },
+				{ __TASK_STOPPED, "T" },
+				{ __TASK_TRACED, "t" },
+				{ EXIT_DEAD, "X" },
+				{ EXIT_ZOMBIE, "Z" },
+				{ TASK_PARKED, "P" },
+				{ TASK_DEAD, "I" }) :
+		  "R",
+
+		__entry->prev_state & TASK_REPORT_MAX ?
+		"+" : "",
+		__entry->next_comm, __entry->next_pid, __entry->next_prio,
+		__entry->time)
+);
+
+DEFINE_EVENT(sched_latency_template, sched_wakeup_latency,
+	     TP_PROTO(bool preempt,
+		      struct task_struct *prev,
+		      struct task_struct *next,
+		      u64 time),
+	     TP_ARGS(preempt, prev, next, time));
+
+DEFINE_EVENT(sched_latency_template, sched_leave_running_time,
+	     TP_PROTO(bool preempt,
+		      struct task_struct *prev,
+		      struct task_struct *next,
+		      u64 time),
+	     TP_ARGS(preempt, prev, next, time));
+
 /*
  * Tracepoint for a task being migrated:
  */

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8471a0f7eb32..b5a1928dc948 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2464,6 +2464,8 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 {
 	check_preempt_curr(rq, p, wake_flags);
 	p->state = TASK_RUNNING;
+	p->ts_wakeup = local_clock();
+	p->wakeup_state = true;
 	trace_sched_wakeup(p);
 
 #ifdef CONFIG_SMP
@@ -2846,6 +2848,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		success = 1;
 		trace_sched_waking(p);
 		p->state = TASK_RUNNING;
+		p->ts_wakeup = local_clock();
+		p->wakeup_state = true;
 		trace_sched_wakeup(p);
 		goto out;
 	}
@@ -3299,6 +3303,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	plist_node_init(&p->pushable_tasks, MAX_PRIO);
 	RB_CLEAR_NODE(&p->pushable_dl_tasks);
 #endif
+	p->ts_wakeup = 0;
+	p->wakeup_state = false;
+	p->pre_ts_end = p->next_ts_start = 0;
 	return 0;
 }
 
@@ -3355,6 +3362,8 @@ void wake_up_new_task(struct task_struct *p)
 	post_init_entity_util_avg(p);
 
 	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	p->ts_wakeup = local_clock();
+	p->wakeup_state = true;
 	trace_sched_wakeup_new(p);
 	check_preempt_curr(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
@@ -4521,8 +4530,19 @@ static void __sched notrace __schedule(bool preempt)
 
 		psi_sched_switch(prev, next, !task_on_rq_queued(prev));
 
+		next->next_ts_start = prev->pre_ts_end = local_clock();
+
 		trace_sched_switch(preempt, prev, next);
+		if (next->ts_wakeup && next->wakeup_state) {
+			trace_sched_wakeup_latency(preempt, prev, next,
+				next->next_ts_start - next->ts_wakeup);
+			next->wakeup_state = false;
+		}
+
+		trace_sched_leave_running_time(preempt, prev,
+			next, next->next_ts_start - next->pre_ts_end);
+
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
-- 
2.17.1
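
As a usage sketch (not part of the patch), the filter-and-enable steps from
the commit message can be scripted. This assumes a kernel built with this
patch and tracefs mounted at the usual /sys/kernel/debug/tracing location;
the script silently skips events that are not present, and the threshold
conversion is plain ms-to-ns arithmetic since the events' 'time' field is
in nanoseconds.

```shell
#!/bin/sh
# Illustrative helper (assumed setup, not part of the patch): arm both
# new trace events with a latency filter. The 'time' field is in
# nanoseconds, so convert the millisecond threshold first.
TRACING=/sys/kernel/debug/tracing
THRESHOLD_MS=10
THRESHOLD_NS=$((THRESHOLD_MS * 1000000))

for ev in sched_wakeup_latency sched_leave_running_time; do
	dir="$TRACING/events/sched/$ev"
	# Skip silently if this kernel does not provide the event.
	if [ -d "$dir" ]; then
		echo "time > $THRESHOLD_NS" > "$dir/filter"
		echo 1 > "$dir/enable"
	fi
done
echo "filter threshold: $THRESHOLD_NS ns"
```

Once the events fire, the filtered records can be read back from
/sys/kernel/debug/tracing/trace.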