From: Namhyung Kim
Date: Fri, 23 Jun 2023 17:03:12 -0700
Subject: Re: [PATCH 4/9] scripts: python: Implement parsing of input data in convertPerfScriptProfile
To: Anup Sharma
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
    linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <3772bce9068962f2a4c57672e919ebdf30edbc5c.1687375189.git.anupnewsmail@gmail.com>

Hi Anup,

On Wed, Jun 21, 2023 at 12:41 PM Anup Sharma wrote:
>
> The lines variable is created by splitting the profile string into individual
> lines. It allows for iterating over each line for processing.
>
> The line is considered the start of a sample. It is matched against a regular
> expression pattern to extract relevant information such as before_time_stamp,
> time_stamp, threadNamePidAndTidMatch, threadName, pid, and tid.
>
> The stack frames of the current sample are then parsed in a nested loop.
> Each stackFrameLine is matched against a regular expression pattern to
> extract rawFunc and mod information.
>
> Also fixed few checkpatch warnings.
>
> Signed-off-by: Anup Sharma
> ---
>  .../scripts/python/firefox-gecko-converter.py | 62 ++++++++++++++++++-
>  1 file changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/scripts/python/firefox-gecko-converter.py b/tools/perf/scripts/python/firefox-gecko-converter.py
> index 0ff70c0349c8..e5bc7a11c3e6 100644
> --- a/tools/perf/scripts/python/firefox-gecko-converter.py
> +++ b/tools/perf/scripts/python/firefox-gecko-converter.py
> @@ -1,4 +1,5 @@
>  #!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0

Please put this line in the first commit.

>  import re
>  import sys
>  import json
> @@ -14,13 +15,13 @@ def isPerfScriptFormat(profile):
>      firstLine = profile[:profile.index('\n')]
>      return bool(re.match(r'^\S.*?\s+(?:\d+/)?\d+\s+(?:\d+\d+\s+)?[\d.]+:', firstLine))
>
> -def convertPerfScriptProfile(profile):
> +def convertPerfScriptProfile(profile): 

You'd better configure your editor to warn or even fix the trailing
whitespace automatically.

Thanks,
Namhyung

>
>      def addSample(threadName, stackArray, time):
>          nonlocal name
>          if name != threadName:
>              name = threadName
> -        # TODO:
> +        # TODO: 
>          # get_or_create_stack will create a new stack if it doesn't exist, or return the existing stack if it does.
>          # get_or_create_frame will create a new frame if it doesn't exist, or return the existing frame if it does.
>          stack = reduce(lambda prefix, stackFrame: get_or_create_stack(get_or_create_frame(stackFrame), prefix), stackArray, None)
> @@ -54,3 +55,60 @@ def convertPerfScriptProfile(profile):
>              thread = _createtread(threadName, pid, tid)
>              threadMap[tid] = thread
>          thread['addSample'](threadName, stack, time_stamp)
> +
> +    lines = profile.split('\n')
> +
> +    line_index = 0
> +    startTime = 0
> +    while line_index < len(lines):
> +        line = lines[line_index]
> +        line_index += 1
> +        # perf script --header outputs header lines beginning with #
> +        if line == '' or line.startswith('#'):
> +            continue
> +
> +        sample_start_line = line
> +
> +        sample_start_match = re.match(r'^(.*)\s+([\d.]+):', sample_start_line)
> +        if not sample_start_match:
> +            print(f'Could not parse line as the start of a sample in the "perf script" profile format: "{sample_start_line}"')
> +            continue
> +
> +        before_time_stamp = sample_start_match[1]
> +        time_stamp = float(sample_start_match[2]) * 1000
> +        threadNamePidAndTidMatch = re.match(r'^(.*)\s+(?:(\d+)\/)?(\d+)\b', before_time_stamp)
> +
> +        if not threadNamePidAndTidMatch:
> +            print('Could not parse line as the start of a sample in the "perf script" profile format: "%s"' % sampleStartLine)
> +            continue
> +        threadName = threadNamePidAndTidMatch[1].strip()
> +        pid = int(threadNamePidAndTidMatch[2] or 0)
> +        tid = int(threadNamePidAndTidMatch[3] or 0)
> +        if startTime == 0:
> +            startTime = time_stamp
> +        # Parse the stack frames of the current sample in a nested loop.
> +        stack = []
> +        while line_index < len(lines):
> +            stackFrameLine = lines[line_index]
> +            line_index += 1
> +            if stackFrameLine.strip() == '':
> +                # Sample ends.
> +                break
> +            stackFrameMatch = re.match(r'^\s*(\w+)\s*(.+) \(([^)]*)\)', stackFrameLine)
> +            if stackFrameMatch:
> +                rawFunc = stackFrameMatch[2]
> +                mod = stackFrameMatch[3]
> +                rawFunc = re.sub(r'\+0x[\da-f]+$', '', rawFunc)
> +
> +                if rawFunc.startswith('('):
> +                    continue # skip process names
> +
> +                if mod:
> +                    # If we have a module name, provide it.
> +                    # The code processing the profile will search for
> +                    # "functionName (in libraryName)" using a regexp,
> +                    # and automatically create the library information.
> +                    rawFunc += f' (in {mod})'
> +
> +                stack.append(rawFunc)
> +
> --
> 2.34.1
>
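
For readers following the thread, the two parsing steps described in the
quoted commit message (sample header line, then stack frame lines) can be
tried in isolation. The sketch below is only an illustration: the helper
names parse_sample_header and parse_stack_frame and the sample input lines
are hypothetical and do not appear in the patch. Only the three regular
expressions are taken from the quoted diff.

```python
import re

# The three patterns are copied from the quoted diff; everything else here
# (function names, return shapes) is a hypothetical illustration.
SAMPLE_START_RE = re.compile(r'^(.*)\s+([\d.]+):')
THREAD_RE = re.compile(r'^(.*)\s+(?:(\d+)\/)?(\d+)\b')
STACK_FRAME_RE = re.compile(r'^\s*(\w+)\s*(.+) \(([^)]*)\)')


def parse_sample_header(line):
    """Return (thread_name, pid, tid, time_ms), or None if the line does not match."""
    start = SAMPLE_START_RE.match(line)
    if not start:
        return None
    before_time_stamp = start[1]
    time_ms = float(start[2]) * 1000  # seconds to milliseconds, as in the patch
    thread = THREAD_RE.match(before_time_stamp)
    if not thread:
        return None
    # The pid group is optional ("pid/tid" or just "tid"); default it to 0.
    return (thread[1].strip(), int(thread[2] or 0), int(thread[3]), time_ms)


def parse_stack_frame(line):
    """Return 'func (in module)' for one stack frame line, or None to skip it."""
    frame = STACK_FRAME_RE.match(line)
    if not frame:
        return None
    # Drop a trailing "+0x<offset>" from the symbol name.
    raw_func = re.sub(r'\+0x[\da-f]+$', '', frame[2])
    if raw_func.startswith('('):
        return None  # process names are skipped, as in the patch
    mod = frame[3]
    if mod:
        raw_func += f' (in {mod})'
    return raw_func


if __name__ == '__main__':
    print(parse_sample_header('firefox 1000/1001 2.5: cycles:'))
    print(parse_stack_frame('    7f0a1b2c3d4e main+0x1a (/usr/bin/app)'))
```

Returning None instead of printing keeps the helpers easy to test; the
patch itself prints a message and continues with the next line.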