Date: Thu, 3 Aug 2023 05:29:30 -0400
From: Steven Rostedt
To: Ze Gao
Cc: Adrian Hunter, Alexander Shishkin, Arnaldo Carvalho de Melo, Ian Rogers, Ingo Molnar, Jiri Olsa, Mark Rutland, Masami Hiramatsu, Namhyung Kim, Peter Zijlstra, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-trace-kernel@vger.kernel.org, Ze Gao
Subject: Re: [RFC PATCH v6 4/5] sched, tracing: add to report task state in symbolic chars
Message-ID: <20230803052930.33337082@gandalf.local.home>
In-Reply-To: <20230803083352.1585-5-zegao@tencent.com>
References: <20230803083352.1585-1-zegao@tencent.com> <20230803083352.1585-5-zegao@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 3 Aug 2023 04:33:51 -0400
Ze Gao wrote:

> Internal representations of task state are likely to be changed
> or ordered, and reporting them to userspace without exporting
> them as part of API is basically wrong, which can easily break
> a userspace observability tool as kernel evolves. For example,
> perf suffers from this and still reports wrong states as of this
> writing.
>
> OTOH, some masqueraded states like TASK_REPORT_IDLE and
> TASK_REPORT_MAX are also reported inadvertently, which confuses
> things even more and most userspace tools do not even take them
> into consideration.
>
> So add a new variable in company with the old raw value to
> report task state in symbolic chars, which are self-explaining
> and no further translation is needed. Of course this does not
> break any userspace tool.
>
> Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use
> the old conventions for the rest.

The above is actually good.

>
> Signed-off-by: Ze Gao
> Reviewed-by: Masami Hiramatsu (Google)
> Acked-by: Ian Rogers
> ---
>  include/trace/events/sched.h | 44 ++++++++++++++++++++++--------------
>  1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 43492daefa6f..ae5b486cc969 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -6,6 +6,7 @@
>  #define _TRACE_SCHED_H
>
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -214,6 +215,27 @@ static inline short __trace_sched_switch_state(bool preempt,
>
>  	return state ? (1 << (state - 1)) : state;
>  }
> +
> +static inline char __trace_sched_switch_state_char(bool preempt,
> +						   unsigned int prev_state,
> +						   struct task_struct *p)
> +{
> +	long state;
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +	WARN_ON_ONCE(p != current);
> +#endif /* CONFIG_SCHED_DEBUG */
> +
> +	/*
> +	 * For PREEMPT_ACTIVE, we introduce 'p' to report it and use the old
> +	 * conventions for the rest.
> +	 */
> +	if (preempt)
> +		return 'p';
> +
> +	state = __task_state_index(prev_state, p->exit_state);
> +	return task_index_to_char(state);
> +}
>  #endif /* CREATE_TRACE_POINTS */
>
>  /*
> @@ -236,6 +258,7 @@ TRACE_EVENT(sched_switch,
>  		__array( char, prev_comm, TASK_COMM_LEN )
>  		__array( char, next_comm, TASK_COMM_LEN )
>  		__field( short, prev_state )
> +		__field( char, prev_state_char )
>  	),
>
>  	TP_fast_assign(
> @@ -246,26 +269,13 @@ TRACE_EVENT(sched_switch,
>  		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
>  		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
>  		__entry->prev_state = __trace_sched_switch_state(preempt, prev_state, prev);
> +		__entry->prev_state_char = __trace_sched_switch_state_char(preempt, prev_state, prev);
>  		/* XXX SCHED_DEADLINE */
>  	),
>
> -	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d",
> -		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
> -
> -		(__entry->prev_state & (TASK_REPORT_MAX - 1)) ?
> -		  __print_flags(__entry->prev_state & (TASK_REPORT_MAX - 1), "|",
> -				{ TASK_INTERRUPTIBLE, "S" },
> -				{ TASK_UNINTERRUPTIBLE, "D" },
> -				{ __TASK_STOPPED, "T" },
> -				{ __TASK_TRACED, "t" },
> -				{ EXIT_DEAD, "X" },
> -				{ EXIT_ZOMBIE, "Z" },
> -				{ TASK_PARKED, "P" },
> -				{ TASK_DEAD, "I" }) :
> -		  "R",

I just realized, I have user space code that looks at this. As in the format
file we have:

print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, (REC->prev_state & ((((0x00000000 | 0x00000001 | 0x00000002 | 0x00000004 | 0x00000008 | 0x00000010 | 0x00000020 | 0x00000040) + 1) << 1) - 1)) ?
__print_flags(REC->prev_state & ((((0x00000000 | 0x00000001 | 0x00000002 | 0x00000004 | 0x00000008 | 0x00000010 | 0x00000020 | 0x00000040) + 1) << 1) - 1), "|", { 0x00000001, "S" }, { 0x00000002, "D" }, { 0x00000004, "T" }, { 0x00000008, "t" }, { 0x00000010, "X" }, { 0x00000020, "Z" }, { 0x00000040, "P" }, { 0x00000080, "I" }) : "R", REC->prev_state & (((0x00000000 | 0x00000001 | 0x00000002 | 0x00000004 | 0x00000008 | 0x00000010 | 0x00000020 | 0x00000040) + 1) << 1) ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio

And I have used this in applications to find out what values "S" and "D" are.

So, we need to keep that still. But we can add the prev_state_char to the
output too.

 "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s[%c] ==> next_comm=%s next_pid=%d next_prio=%d"

> -
> -		__entry->prev_state & TASK_REPORT_MAX ? "+" : "",
> -		__entry->next_comm, __entry->next_pid, __entry->next_prio)
> +	TP_printk("prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%c ==> next_comm=%s next_pid=%d next_prio=%d",
> +		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio, __entry->prev_state_char, __entry->next_comm,
> +		__entry->next_pid, __entry->next_prio)
>  );
>
>  /*

-- Steve
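
For illustration, the kind of user-space lookup described above (reading the
__print_flags() table in the format file to learn which bit "S" or "D" maps
to) could look roughly like the sketch below. This is not the actual
application code: flag_value_for_symbol() is a hypothetical helper, and it
assumes the table entries keep the { <number>, "<sym>" } shape seen in the
print fmt quoted in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): scan a "print fmt" string
 * for a __print_flags() entry of the form { <number>, "<sym>" } and return
 * the number, or -1 if no entry carries that symbol.
 */
static long flag_value_for_symbol(const char *fmt, const char *sym)
{
	const char *p;

	assert(fmt != NULL && sym != NULL);

	/* Try every '{' in the string as the start of a table entry. */
	for (p = strchr(fmt, '{'); p != NULL; p = strchr(p + 1, '{')) {
		char name[16];
		long val;

		/* %li accepts the 0x prefix used in the format file. */
		if (sscanf(p, "{ %li , \"%15[^\"]\" }", &val, name) == 2 &&
		    strcmp(name, sym) == 0)
			return val;
	}

	return -1;
}
```

With the table quoted above, looking up "S" yields 0x1 and "D" yields 0x2. A
tool built this way stops working if the __print_flags() table disappears
from TP_printk(), which is the reason for keeping it and only appending
[%c].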