From: Namhyung Kim
Date: Thu, 30 May 2024 09:44:02 -0700
Subject: Re: [PATCH v1] tools api io: Move filling the io buffer to its own function
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org

On Thu, May 23, 2024 at 9:47 PM Ian Rogers wrote:
>
> On Thu, May 23, 2024 at 4:25 PM Namhyung Kim wrote:
> >
> > On Sun, May 19, 2024 at 11:17 AM Ian Rogers wrote:
> > >
> > > In general a read fills 4kb so filling the buffer is a 1 in 4096
> > > operation, move it out of the io__get_char function to avoid some
> > > checking overhead and to better hint the function is good to inline.
> > >
> > > For perf's IO intensive internal (non-rigorous) benchmarks there's a
> > > near 8% improvement to kallsyms-parsing with a default build.
> >
> > Oh, is it just from removing the io->eof check?  Otherwise I don't
> > see any difference.
>
> I was hoping that by moving the code out-of-line then the hot part of
> the function could be inlined into things like reading the hex
> character. I didn't see that, presumably there are too many callers
> and so that made the inliner think sharing would be best even though
> the hot code is a compare, pointer dereference and an increment. I
> tried forcing inlining but it didn't seem to win over just having the
> code out-of-line.
> The eof check should be very well predicted. The
> out-of-line code was branched over forward, which should be 1
> mispredict but again not a huge deal. I didn't do a more thorough
> analysis as I still prefer to have the cold code out-of-line.

Ok, I don't see much difference with this change.  But the change
itself looks fine.

Thanks,
Namhyung


Before:

# Running internals/synthesize benchmark...
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 237.274 usec (+- 0.066 usec)
  Average num. events: 24.000 (+- 0.000)
  Average time per event 9.886 usec
  Average data synthesis took: 241.126 usec (+- 0.087 usec)
  Average num. events: 128.000 (+- 0.000)
  Average time per event 1.884 usec

# Running internals/kallsyms-parse benchmark...
  Average kallsyms__parse took: 184.374 ms (+- 0.022 ms)

# Running internals/inject-build-id benchmark...
  Average build-id injection took: 20.096 msec (+- 0.115 msec)
  Average time per event: 1.970 usec (+- 0.011 usec)
  Average memory usage: 11574 KB (+- 29 KB)
  Average build-id-all injection took: 13.477 msec (+- 0.100 msec)
  Average time per event: 1.321 usec (+- 0.010 usec)
  Average memory usage: 11160 KB (+- 0 KB)

# Running internals/evlist-open-close benchmark...
  Number of cpus: 64
  Number of threads: 1
  Number of events: 1 (64 fds)
  Number of iterations: 100
evlist__open: Permission denied

# Running internals/pmu-scan benchmark...
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 135.880 usec (+- 0.249 usec)
  Average PMU scanning took: 816.745 usec (+- 48.293 usec)


After:

# Running internals/synthesize benchmark...
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 235.711 usec (+- 0.067 usec)
  Average num. events: 24.000 (+- 0.000)
  Average time per event 9.821 usec
  Average data synthesis took: 240.992 usec (+- 0.058 usec)
  Average num. events: 128.000 (+- 0.000)
  Average time per event 1.883 usec

# Running internals/kallsyms-parse benchmark...
  Average kallsyms__parse took: 179.664 ms (+- 0.043 ms)

# Running internals/inject-build-id benchmark...
  Average build-id injection took: 19.901 msec (+- 0.117 msec)
  Average time per event: 1.951 usec (+- 0.011 usec)
  Average memory usage: 12163 KB (+- 10 KB)
  Average build-id-all injection took: 13.627 msec (+- 0.086 msec)
  Average time per event: 1.336 usec (+- 0.008 usec)
  Average memory usage: 11160 KB (+- 0 KB)

# Running internals/evlist-open-close benchmark...
  Number of cpus: 64
  Number of threads: 1
  Number of events: 1 (64 fds)
  Number of iterations: 100
evlist__open: Permission denied

# Running internals/pmu-scan benchmark...
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 136.540 usec (+- 0.294 usec)
  Average PMU scanning took: 819.415 usec (+- 48.437 usec)
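For readers following the hot/cold discussion above, here is a minimal sketch of the split, assuming a POSIX read(2) source and GCC/Clang attributes. The names (struct sketch_io, sketch_io__init, sketch_io__fill_buffer, sketch_io__get_char) are hypothetical. This is not the actual tools/lib/api/io.h code and not the patch itself. The refill, including the eof check, lives in a noinline, cold function. That leaves an inlineable fast path of just the compare, dereference and increment Ian describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, simplified buffered reader; not perf's struct io. */
struct sketch_io {
	int fd;          /* file descriptor being read */
	char *buf;       /* caller-provided storage */
	size_t buf_len;  /* size of buf in bytes */
	char *data;      /* next unread byte */
	char *end;       /* one past the last valid byte */
	bool eof;        /* set on end of file or read error */
};

static void sketch_io__init(struct sketch_io *io, int fd, char *buf,
			    size_t buf_len)
{
	io->fd = fd;
	io->buf = buf;
	io->buf_len = buf_len;
	io->data = buf;
	io->end = buf;   /* empty buffer: the first get_char triggers a fill */
	io->eof = false;
}

/*
 * Cold path: runs roughly once per buf_len bytes. noinline/cold ask
 * GCC/Clang to keep it out of callers and lay it out away from hot code.
 */
__attribute__((noinline, cold))
static int sketch_io__fill_buffer(struct sketch_io *io)
{
	ssize_t n;

	assert(io->data == io->end);  /* only called once the buffer is drained */
	if (io->eof)
		return -1;
	n = read(io->fd, io->buf, io->buf_len);
	if (n <= 0) {                 /* 0 = end of file, < 0 = error */
		io->eof = true;
		return -1;
	}
	io->data = io->buf;
	io->end = io->buf + n;
	return 0;
}

/*
 * Hot path: a compare, a dereference and an increment. The eof flag is
 * only consulted on the cold path, because after eof data == end stays true.
 */
static inline int sketch_io__get_char(struct sketch_io *io)
{
	if (__builtin_expect(io->data == io->end, 0) &&
	    sketch_io__fill_buffer(io) < 0)
		return -1;
	/* unsigned char so bytes >= 0x80 never collide with -1 */
	return (unsigned char)*io->data++;
}
```

A caller such as a hex-number parser would simply loop on sketch_io__get_char(). Only one call in every buf_len bytes takes the forward branch into the out-of-line refill. As Ian notes, whether the compiler actually inlines the fast path into each caller remains a heuristic decision.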