From: Ian Rogers
Date: Thu, 30 May 2024 10:01:47 -0700
Subject: Re: [PATCH v1] tools api io: Move filling the io buffer to its own function
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
References: <20240519181716.4088459-1-irogers@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 30, 2024 at 9:44 AM Namhyung Kim wrote:
>
> On Thu, May 23, 2024 at 9:47 PM Ian Rogers wrote:
> >
> > On Thu, May 23, 2024 at 4:25 PM Namhyung Kim wrote:
> > >
> > > On Sun, May 19, 2024 at 11:17 AM Ian Rogers wrote:
> > > >
> > > > In general a read fills 4kb so filling the buffer is a 1 in 4096
> > > > operation, move it out of the io__get_char function to avoid some
> > > > checking overhead and to better hint the function is good to inline.
> > > >
> > > > For perf's IO intensive internal (non-rigorous) benchmarks there's a
> > > > near 8% improvement to kallsyms-parsing with a default build.
> > >
> > > Oh, is it just from removing the io->eof check? Otherwise I don't
> > > see any difference.
> >
> > I was hoping that by moving the code out-of-line then the hot part of
> > the function could be inlined into things like reading the hex
> > character. I didn't see that, presumably there are too many callers
> > and so that made the inliner think sharing would be best even though
> > the hot code is a compare, pointer dereference and an increment. I
> > tried forcing inlining but it didn't seem to win over just having the
> > code out-of-line. The eof check should be very well predicted. The
> > out-of-line code was branched over forward, which should be 1
> > mispredict but again not a huge deal. I didn't do a more thorough
> > analysis as I still prefer to have the cold code out-of-line.
>
> Ok, I don't see much difference with this change. But the change itself
> looks fine.
>
> Thanks,
> Namhyung
>
>
> Before:
>
> # Running internals/synthesize benchmark...
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 237.274 usec (+- 0.066 usec)
>   Average num. events: 24.000 (+- 0.000)
>   Average time per event 9.886 usec
>   Average data synthesis took: 241.126 usec (+- 0.087 usec)
>   Average num. events: 128.000 (+- 0.000)
>   Average time per event 1.884 usec
>
> # Running internals/kallsyms-parse benchmark...
>   Average kallsyms__parse took: 184.374 ms (+- 0.022 ms)
>
> # Running internals/inject-build-id benchmark...
>   Average build-id injection took: 20.096 msec (+- 0.115 msec)
>   Average time per event: 1.970 usec (+- 0.011 usec)
>   Average memory usage: 11574 KB (+- 29 KB)
>   Average build-id-all injection took: 13.477 msec (+- 0.100 msec)
>   Average time per event: 1.321 usec (+- 0.010 usec)
>   Average memory usage: 11160 KB (+- 0 KB)
>
> # Running internals/evlist-open-close benchmark...
>   Number of cpus: 64
>   Number of threads: 1
>   Number of events: 1 (64 fds)
>   Number of iterations: 100
> evlist__open: Permission denied
>
> # Running internals/pmu-scan benchmark...
> Computing performance of sysfs PMU event scan for 100 times
>   Average core PMU scanning took: 135.880 usec (+- 0.249 usec)
>   Average PMU scanning took: 816.745 usec (+- 48.293 usec)
>
>
> After:
>
> # Running internals/synthesize benchmark...
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
>   Average synthesis took: 235.711 usec (+- 0.067 usec)
>   Average num. events: 24.000 (+- 0.000)
>   Average time per event 9.821 usec
>   Average data synthesis took: 240.992 usec (+- 0.058 usec)
>   Average num. events: 128.000 (+- 0.000)
>   Average time per event 1.883 usec
>
> # Running internals/kallsyms-parse benchmark...
>   Average kallsyms__parse took: 179.664 ms (+- 0.043 ms)

So this is still 2%. I was building without options like DEBUG=1
enabled, so perhaps that'd explain the difference. Anyway, if you're
more comfortable with a commit message saying a 2% performance win I
don't mind it being updated or I can upload a v2. It's likely this is
being over-thought given the change :-)

Thanks,
Ian

> # Running internals/inject-build-id benchmark...
>   Average build-id injection took: 19.901 msec (+- 0.117 msec)
>   Average time per event: 1.951 usec (+- 0.011 usec)
>   Average memory usage: 12163 KB (+- 10 KB)
>   Average build-id-all injection took: 13.627 msec (+- 0.086 msec)
>   Average time per event: 1.336 usec (+- 0.008 usec)
>   Average memory usage: 11160 KB (+- 0 KB)
>
> # Running internals/evlist-open-close benchmark...
>   Number of cpus: 64
>   Number of threads: 1
>   Number of events: 1 (64 fds)
>   Number of iterations: 100
> evlist__open: Permission denied
>
> # Running internals/pmu-scan benchmark...
> Computing performance of sysfs PMU event scan for 100 times
>   Average core PMU scanning took: 136.540 usec (+- 0.294 usec)
>   Average PMU scanning took: 819.415 usec (+- 48.437 usec)
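
For readers following the thread, the hot/cold split under discussion can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: the sketch_io struct, its fields, and
the sketch_io__* function names are all hypothetical, the noinline attribute is
a GCC/Clang extension, and error handling is simplified (any failed read() is
treated as end of input). The point it tries to show is that the refill and the
eof check live in a separate out-of-line function, so the common path of the
getter is just a compare, a pointer dereference and an increment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical reader state; the names are illustrative, not the perf API. */
struct sketch_io {
	int fd;         /* file descriptor being read */
	char *buf;      /* caller-provided buffer */
	size_t buf_len; /* capacity of buf in bytes */
	char *data;     /* next unread byte */
	char *end;      /* one past the last valid byte */
	bool eof;       /* set once read() returns 0 or fails */
};

static void sketch_io__init(struct sketch_io *io, int fd,
			    char *buf, size_t buf_len)
{
	assert(buf_len > 0);
	io->fd = fd;
	io->buf = buf;
	io->buf_len = buf_len;
	io->data = buf;
	io->end = buf; /* empty buffer: the first get_char triggers a fill */
	io->eof = false;
}

/*
 * Cold path, kept out of line: refill the buffer with one read().
 * Returns 0 when new data is available, -1 on end of input.
 * Simplification (assumption): a read() error is treated like EOF.
 */
static __attribute__((noinline)) int sketch_io__fill_buffer(struct sketch_io *io)
{
	ssize_t n;

	if (io->eof)
		return -1;

	n = read(io->fd, io->buf, io->buf_len);
	if (n <= 0) {
		io->eof = true;
		return -1;
	}
	io->data = io->buf;
	io->end = io->buf + n;
	return 0;
}

/*
 * Hot path: a compare, a pointer dereference and an increment.
 * The refill is only reached when the buffer has been fully consumed.
 * Returns the next byte as a non-negative int, or -1 on end of input.
 */
static inline int sketch_io__get_char(struct sketch_io *io)
{
	if (io->data == io->end && sketch_io__fill_buffer(io) < 0)
		return -1;
	return (unsigned char)*io->data++;
}
```

With a 4 KB buffer the fill function would be entered roughly once per 4096
calls to the getter, which is the "1 in 4096" ratio mentioned in the original
commit message. Whether the compiler actually inlines the small getter into its
callers is a separate question, as the thread notes.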