Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp710738rwr; Thu, 20 Apr 2023 05:35:26 -0700 (PDT) X-Google-Smtp-Source: AKy350acrVcQjA6QN5fLP5Q5/HMgQIFiz46u4RRia1aGD7PArk7AIvhHXVNHCLBtkyav4IRWJFG0 X-Received: by 2002:a17:903:2309:b0:1a1:ee8c:eef5 with SMTP id d9-20020a170903230900b001a1ee8ceef5mr1781766plh.7.1681994126255; Thu, 20 Apr 2023 05:35:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681994126; cv=none; d=google.com; s=arc-20160816; b=uJNJC+b6/V7Gji5CKPtouEU4/RFRJrtR8Kc7HXrM7WXrjQWq4Z7D1RPvnwo1jkGtsb YVuzDIrKca15uGsE9PiJG/SVfQv7iwLK5hCmP6W3qSNN8iVcQc8yQLwtiTtytzOnXeha /XiX17szA+xPU1I3dMHXrzhOak7I9YWhqfqDxXoLokcyDLps5ob3PCJNmqTP+EVfgoVm 5JqHR/mwQnMd+ZiwszRPChOJ9ut92YQhiWUWBus+htwAYA7CHftCUVEgymsm4p+we47F enA8MCrVdw2FIwFq2oLdQr29QUmU5yMqoB4M6+YbR1sToERxueQBoB6Y/kdIPSBriBz/ Uo3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=Odalr8bRlKOlEX6GL0GR9fUjN3uRbZQSkln9xW9DeVA=; b=xkhjBuZkODeTUtlwDUXYjQSap7ntHf5gRKGk29DTTECSAVa6bskWC/b+bbF11J+Uv2 wSZ/URijHgDGmKm0MUnbViAuaAq6ndL3QgIfTy04U75JpaGbvUHymm4jbsfQVSJQbGFA 0eaxE8pycn278vfjB5gn7ltLciv1msja8NQNNcNqzheES3QfOx+IT5EztWYYe4uyWSMx k/xkKn/d93l6yMxv5iv0KTpQMYCOmajWNjALS+Jr9Qz714ZVRTa83csnknyRQ7w4a2Md SzooSFXEd+7CyJYb8kGwuRa0oq7lJx9aav5uEvcf29xFCzRh8vb2nRaVp8IKc1kFgSuj QjDQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b2-20020a170902650200b0019ea8e6213esi1680557plk.102.2023.04.20.05.35.06; Thu, 20 Apr 2023 05:35:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229775AbjDTMb0 (ORCPT + 99 others); Thu, 20 Apr 2023 08:31:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229913AbjDTMbZ (ORCPT ); Thu, 20 Apr 2023 08:31:25 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1C69E6A49 for ; Thu, 20 Apr 2023 05:31:02 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 78DC61480; Thu, 20 Apr 2023 05:31:45 -0700 (PDT) Received: from [192.168.1.3] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 359263F587; Thu, 20 Apr 2023 05:31:00 -0700 (PDT) Message-ID: <91ba66e7-737f-6526-a703-a755e114f9d4@arm.com> Date: Thu, 20 Apr 2023 13:30:59 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Subject: Re: [PATCH] perf cs-etm: Add support for coresight trace for any range of CPUs Content-Language: en-US To: Ganapatrao Kulkarni Cc: mathieu.poirier@linaro.org, acme@kernel.org, darren@os.amperecomputing.com, scott@os.amperecomputing.com, scclevenger@os.amperecomputing.com, linux-kernel@vger.kernel.org, coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org, mike.leach@linaro.org, suzuki.poulose@arm.com References: <20230419172101.78638-1-gankulkarni@os.amperecomputing.com> <84eb3363-2ef8-d3f1-4613-805959dbf334@os.amperecomputing.com> From: James Clark In-Reply-To: <84eb3363-2ef8-d3f1-4613-805959dbf334@os.amperecomputing.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 20/04/2023 12:47, Ganapatrao Kulkarni wrote: > > Hi James, > > On 20-04-2023 03:13 pm, James Clark wrote: >> >> >> On 19/04/2023 18:21, Ganapatrao Kulkarni wrote: >>> The current implementation supports coresight trace for a range of >>> CPUs, if the first CPU is CPU0. >>> >>> Adding changes to enable coresight trace for any range of CPUs by >>> decoding the first CPU also from the header. >>> Later, first CPU id is used instead of CPU0 across the decoder >>> functions. >>> >>> Signed-off-by: Ganapatrao Kulkarni >>> --- >>>   .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  4 +- >>>   .../perf/util/cs-etm-decoder/cs-etm-decoder.h |  3 +- >>>   tools/perf/util/cs-etm.c                      | 62 ++++++++++++------- >>>   3 files changed, 42 insertions(+), 27 deletions(-) >>> >>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c >>> b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c >>> index 82a27ab90c8b..41ab299b643b 100644 >>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c >>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c >>> @@ -724,7 +724,7 @@ cs_etm_decoder__create_etm_decoder(struct >>> cs_etm_decoder_params *d_params, >>>   } >>>     struct cs_etm_decoder * >>> -cs_etm_decoder__new(int decoders, struct cs_etm_decoder_params >>> *d_params, >>> +cs_etm_decoder__new(int first_decoder, int decoders, struct >>> cs_etm_decoder_params *d_params, >>>               struct cs_etm_trace_params t_params[]) >>>   { >>>       struct cs_etm_decoder *decoder; >>> @@ -769,7 +769,7 @@ cs_etm_decoder__new(int decoders, struct >>> cs_etm_decoder_params *d_params, >>>       /* init raw frame logging if required */ >>>       cs_etm_decoder__init_raw_frame_logging(d_params, decoder); >>>   -    for (i = 0; i < decoders; i++) { >>> +    for (i = first_decoder; i < decoders; i++) { >>>           ret = cs_etm_decoder__create_etm_decoder(d_params, >>>                                &t_params[i], >>>                                decoder); >>> diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h >>> b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h >>> index 92a855fbe5b8..b06193fc75b4 100644 >>> --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h >>> +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h >>> @@ -90,7 +90,8 @@ int cs_etm_decoder__process_data_block(struct >>> cs_etm_decoder *decoder, >>>                          size_t len, size_t *consumed); >>>     struct cs_etm_decoder * >>> -cs_etm_decoder__new(int num_cpu, >>> +cs_etm_decoder__new(int first_decoder, >>> +            int decoders, >>>               struct cs_etm_decoder_params *d_params, >>>               struct cs_etm_trace_params t_params[]); >>>   diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c >>> index 94e2d02009eb..2619513ae088 100644 >>> --- a/tools/perf/util/cs-etm.c >>> +++ b/tools/perf/util/cs-etm.c >>> @@ -55,6 +55,8 @@ struct cs_etm_auxtrace { >>>       u8 has_virtual_ts; /* Virtual/Kernel timestamps in the trace. */ >>>         int num_cpu; >>> +    int first_cpu; >>> +    int last_cpu; >>>       u64 latest_kernel_timestamp; >>>       u32 auxtrace_type; >>>       u64 branches_sample_type; >>> @@ -638,14 +640,13 @@ static void cs_etm__set_trace_param_ete(struct >>> cs_etm_trace_params *t_params, >>>   } >>>     static int cs_etm__init_trace_params(struct cs_etm_trace_params >>> *t_params, >>> -                     struct cs_etm_auxtrace *etm, >>> -                     int decoders) >>> +                     struct cs_etm_auxtrace *etm) >>>   { >>>       int i; >>>       u32 etmidr; >>>       u64 architecture; >>>   -    for (i = 0; i < decoders; i++) { >>> +    for (i = etm->first_cpu; i < etm->last_cpu; i++) { >>>           architecture = etm->metadata[i][CS_ETM_MAGIC]; >>>             switch (architecture) { >>> @@ -817,7 +818,7 @@ static void cs_etm__free(struct perf_session >>> *session) >>>       /* Then the RB tree itself */ >>>       intlist__delete(traceid_list); >>>   -    for (i = 0; i < aux->num_cpu; i++) >>> +    for (i = aux->first_cpu; i < aux->last_cpu; i++) >>>           zfree(&aux->metadata[i]); >>>         thread__zput(aux->unknown_thread); >>> @@ -921,7 +922,8 @@ static struct cs_etm_queue >>> *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, >>>        * Each queue can only contain data from one CPU when >>> unformatted, so only one decoder is >>>        * needed. >>>        */ >>> -    int decoders = formatted ? etm->num_cpu : 1; >>> +    int first_decoder = formatted ? etm->first_cpu : 0; >>> +    int decoders = first_decoder + (formatted ? etm->num_cpu : 1); >>>         etmq = zalloc(sizeof(*etmq)); >>>       if (!etmq) >>> @@ -937,7 +939,7 @@ static struct cs_etm_queue >>> *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, >>>       if (!t_params) >>>           goto out_free; >>>   -    if (cs_etm__init_trace_params(t_params, etm, decoders)) >>> +    if (cs_etm__init_trace_params(t_params, etm)) >>>           goto out_free; >>>         /* Set decoder parameters to decode trace packets */ >>> @@ -947,8 +949,7 @@ static struct cs_etm_queue >>> *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm, >>>                       formatted)) >>>           goto out_free; >>>   -    etmq->decoder = cs_etm_decoder__new(decoders, &d_params, >>> -                        t_params); >>> +    etmq->decoder = cs_etm_decoder__new(first_decoder, decoders, >>> &d_params, t_params); >>>         if (!etmq->decoder) >>>           goto out_free; >>> @@ -2959,11 +2960,11 @@ static int cs_etm__queue_aux_records(struct >>> perf_session *session) >>>    * Loop through the ETMs and complain if we find at least one where >>> ts_source != 1 (virtual >>>    * timestamps). >>>    */ >>> -static bool cs_etm__has_virtual_ts(u64 **metadata, int num_cpu) >>> +static bool cs_etm__has_virtual_ts(u64 **metadata, struct >>> cs_etm_auxtrace *etm) >>>   { >>>       int j; >>>   -    for (j = 0; j < num_cpu; j++) { >>> +    for (j = etm->first_cpu; j < etm->last_cpu; j++) { >>>           switch (metadata[j][CS_ETM_MAGIC]) { >>>           case __perf_cs_etmv4_magic: >>>               if (HAS_PARAM(j, ETMV4, TS_SOURCE) || >>> metadata[j][CS_ETMV4_TS_SOURCE] != 1) >>> @@ -2982,13 +2983,14 @@ static bool cs_etm__has_virtual_ts(u64 >>> **metadata, int num_cpu) >>>   } >>>     /* map trace ids to correct metadata block, from information in >>> metadata */ >>> -static int cs_etm__map_trace_ids_metadata(int num_cpu, u64 **metadata) >>> +static int cs_etm__map_trace_ids_metadata(struct cs_etm_auxtrace *etm) >>>   { >>>       u64 cs_etm_magic; >>> +    u64 **metadata = etm->metadata; >>>       u8 trace_chan_id; >>>       int i, err; >>>   -    for (i = 0; i < num_cpu; i++) { >>> +    for (i = etm->first_cpu; i < etm->last_cpu; i++) { >>>           cs_etm_magic = metadata[i][CS_ETM_MAGIC]; >>>           switch (cs_etm_magic) { >>>           case __perf_cs_etmv3_magic: >>> @@ -3015,12 +3017,13 @@ static int cs_etm__map_trace_ids_metadata(int >>> num_cpu, u64 **metadata) >>>    * If we found AUX_HW_ID packets, then set any metadata marked as >>> unused to the >>>    * unused value to reduce the number of unneeded decoders created. >>>    */ >>> -static int cs_etm__clear_unused_trace_ids_metadata(int num_cpu, u64 >>> **metadata) >>> +static int cs_etm__clear_unused_trace_ids_metadata(struct >>> cs_etm_auxtrace *etm) >>>   { >>>       u64 cs_etm_magic; >>> +    u64 **metadata = etm->metadata; >>>       int i; >>>   -    for (i = 0; i < num_cpu; i++) { >>> +    for (i = etm->first_cpu; i < etm->last_cpu; i++) { >>>           cs_etm_magic = metadata[i][CS_ETM_MAGIC]; >>>           switch (cs_etm_magic) { >>>           case __perf_cs_etmv3_magic: >>> @@ -3049,7 +3052,7 @@ int cs_etm__process_auxtrace_info_full(union >>> perf_event *event, >>>       int event_header_size = sizeof(struct perf_event_header); >>>       int total_size = auxtrace_info->header.size; >>>       int priv_size = 0; >>> -    int num_cpu; >>> +    int num_cpu, first_cpu = 0, last_cpu; >>>       int err = 0; >>>       int aux_hw_id_found; >>>       int i, j; >>> @@ -3068,22 +3071,31 @@ int cs_etm__process_auxtrace_info_full(union >>> perf_event *event, >>>       /* First the global part */ >>>       ptr = (u64 *) auxtrace_info->priv; >>>       num_cpu = ptr[CS_PMU_TYPE_CPUS] & 0xffffffff; >>> -    metadata = zalloc(sizeof(*metadata) * num_cpu); >>> + >>> +    /* Start parsing after the common part of the header */ >>> +    i = CS_HEADER_VERSION_MAX; >>> + >>> +    /*Get CPU id of first event */ >>> +    first_cpu = ptr[i + CS_ETM_CPU]; >>> +    last_cpu = first_cpu + num_cpu; >>> + >>> +    if (first_cpu > cpu__max_cpu().cpu || >>> +            last_cpu > cpu__max_cpu().cpu) >>> +        return -EINVAL; >>> + >>> +    metadata = zalloc(sizeof(*metadata) * last_cpu); >> >> Hi Ganapatrao, >> >> I think I see what the problem is, but I'm wondering if a better fix >> would be to further decouple the CPU ID from the index in the array. >> >> With your change it's not clear what happens with sparse recordings, for >> example 'perf record -e cs_etm// -C 1,3,5'. And it seems like there is > > This patch fixes for any range of CPUs. > Record with sparse list of CPUs will not work with current code still. > Is there a major issue that means sparse can't be done? I'm thinking it would be best to fix both issues with one change while we understand this part rather than a half fix that might have do be completely re-understood and re-done later anyway. Unless there is some big blocker but I can't see it? >> some wastage in the zalloc here for example if only CPU 256 is traced >> then we'd still make 256 decoders but 255 of them would be unused? >> >> I tried to test sparse recordings, but your change doesn't apply to the >> latest coresight/next branch. I did notice that 'perf report -D' doesn't >> work with them on coresight/next (it just quits), but I couldn't see if >> that's fixed with your change. > > My patch is rebased on 6.3-RC7 codebase with Mike's 3 perf patches > related to dynamic id [1] support(queued for 6.4). > > "perf report -D" works for me. I was referring to sparse CPU lists, which I think you mentioned above doesn't work even with this patch. > > [1] https://www.spinics.net/lists/linux-perf-users/msg27452.html > It should be based on the next branch here: git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git >> >> Would a better fix not be to keep the metadata loops from 0-N and >> instead save the CPU ID in cs_etm_decoder_params or the decoder. That >> way it would support both sparse and not starting from 0 cases? I think > > Yep, I though this initially, it got complicated due to for more > for-loops. I will try again and post V2. > I can't imagine it would need any extra for loops off the top of my head. Just saving the CPU ID in a few structs and using it wherever it's needed instead of the loop index. I imagine most of the loops would actually stay the same rather than be changed like you have in V1. >> the code would be better if it's worded like "i < recorded_cpus" rather >> than "i < cpu" to make it clear that i isn't actually the CPU ID it's >> just an index. > > Yes, makes sense to call it "recorded_cpus". > >> >> Also a new test for this scenario would probably be a good idea. >> >> Thanks >> James >> > Thanks, > Ganapat