From: Shunsuke Mie
Date: Wed, 28 Dec 2022 11:24:10 +0900
Subject: Re: [RFC PATCH 4/9] vringh: unify the APIs for all accessors
References: <20221227022528.609839-1-mie@igel.co.jp>
 <20221227022528.609839-5-mie@igel.co.jp>
 <20221227020007-mutt-send-email-mst@kernel.org>
 <20221227075332-mutt-send-email-mst@kernel.org>
In-Reply-To: <20221227075332-mutt-send-email-mst@kernel.org>
To: "Michael S. Tsirkin"
Tsirkin" Cc: Jason Wang , Rusty Russell , kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2022=E5=B9=B412=E6=9C=8827=E6=97=A5(=E7=81=AB) 23:37 Michael S. Tsirkin : > > On Tue, Dec 27, 2022 at 07:22:36PM +0900, Shunsuke Mie wrote: > > 2022=E5=B9=B412=E6=9C=8827=E6=97=A5(=E7=81=AB) 16:49 Shunsuke Mie : > > > > > > 2022=E5=B9=B412=E6=9C=8827=E6=97=A5(=E7=81=AB) 16:04 Michael S. Tsirk= in : > > > > > > > > On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote: > > > > > Each vringh memory accessors that are for user, kern and iotlb ha= s own > > > > > interfaces that calls common code. But some codes are duplicated = and that > > > > > becomes loss extendability. > > > > > > > > > > Introduce a struct vringh_ops and provide a common APIs for all a= ccessors. > > > > > It can bee easily extended vringh code for new memory accessor an= d > > > > > simplified a caller code. > > > > > > > > > > Signed-off-by: Shunsuke Mie > > > > > --- > > > > > drivers/vhost/vringh.c | 667 +++++++++++------------------------= ------ > > > > > include/linux/vringh.h | 100 +++--- > > > > > 2 files changed, 225 insertions(+), 542 deletions(-) > > > > > > > > > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c > > > > > index aa3cd27d2384..ebfd3644a1a3 100644 > > > > > --- a/drivers/vhost/vringh.c > > > > > +++ b/drivers/vhost/vringh.c > > > > > @@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(c= onst char *fmt, ...) > > > > > } > > > > > > > > > > /* Returns vring->num if empty, -ve on error. */ > > > > > -static inline int __vringh_get_head(const struct vringh *vrh, > > > > > - int (*getu16)(const struct vrin= gh *vrh, > > > > > - u16 *val, const _= _virtio16 *p), > > > > > - u16 *last_avail_idx) > > > > > +static inline int __vringh_get_head(const struct vringh *vrh, u1= 6 *last_avail_idx) > > > > > { > > > > > u16 avail_idx, i, head; > > > > > int err; > > > > > > > > > > - err =3D getu16(vrh, &avail_idx, &vrh->vring.avail->idx); > > > > > + err =3D vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail-= >idx); > > > > > if (err) { > > > > > vringh_bad("Failed to access avail idx at %p", > > > > > &vrh->vring.avail->idx); > > > > > > > > I like that this patch removes more lines of code than it adds. > > > > > > > > However one of the design points of vringh abstractions is that the= y were > > > > carefully written to be very low overhead. > > > > This is why we are passing function pointers to inline functions - > > > > compiler can optimize that out. > > > > > > > > I think that introducing ops indirect functions calls here is going= to break > > > > these assumptions and hurt performance. > > > > Unless compiler can somehow figure it out and optimize? > > > > I don't see how it's possible with ops pointer in memory > > > > but maybe I'm wrong. > > > I think your concern is correct. I have to understand the compiler > > > optimization and redesign this approach If it is needed. > > > > Was any effort taken to test effect of these patches on performance= ? 
> > > I just tested vringh_test and already saw a small performance
> > > reduction. I have to investigate that, as you said.
> > I attempted to test with perf. I found that the performance of the
> > patched code is almost the same as the upstream one. However, I have
> > to investigate why this patch leads to this result, and the profiling
> > should be run on more powerful machines too.
> >
> > environment:
> > $ grep 'model name' /proc/cpuinfo
> > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
> > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
> > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
> > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
> >
> > results:
> > * for patched code
> > Performance counter stats for 'nice -n -20 ./vringh_test_patched
> > --parallel --eventidx --fast-vringh --indirect --virtio-1' (20 runs):
> >
> >       3,028.05 msec task-clock        #  0.995 CPUs utilized     ( +- 0.12% )
> >         78,150      context-switches  # 25.691 K/sec             ( +- 0.00% )
> >              5      cpu-migrations    #  1.644 /sec              ( +- 3.33% )
> >            190      page-faults       # 62.461 /sec              ( +- 0.41% )
> >  6,919,025,222      cycles            #  2.275 GHz               ( +- 0.13% )
> >  8,990,220,160      instructions      #  1.29 insn per cycle     ( +- 0.04% )
> >  1,788,326,786      branches          # 587.899 M/sec            ( +- 0.05% )
> >      4,557,398      branch-misses     #  0.25% of all branches   ( +- 0.43% )
> >
> >        3.04359 +- 0.00378 seconds time elapsed  ( +- 0.12% )
> >
> > * for upstream code
> > Performance counter stats for 'nice -n -20 ./vringh_test_base
> > --parallel --eventidx --fast-vringh --indirect --virtio-1' (10 runs):
> >
> >       3,058.41 msec task-clock        #  0.999 CPUs utilized     ( +- 0.14% )
> >         78,149      context-switches  # 25.545 K/sec             ( +- 0.00% )
> >              5      cpu-migrations    #  1.634 /sec              ( +- 2.67% )
> >            194      page-faults       # 63.414 /sec              ( +- 0.43% )
> >  6,988,713,963      cycles            #  2.284 GHz               ( +- 0.14% )
> >  8,512,533,269      instructions      #  1.22 insn per cycle     ( +- 0.04% )
> >  1,638,375,371      branches          # 535.549 M/sec            ( +- 0.05% )
> >      4,428,866      branch-misses     #  0.27% of all branches   ( +- 22.57% )
> >
> >        3.06085 +- 0.00420 seconds time elapsed  ( +- 0.14% )
> >
> How you compiled it also matters. ATM we don't enable retpolines,
> and it did not matter since we didn't have indirect calls, but we
> should. I didn't yet investigate how to do that for the virtio tools.
I think the retpolines certainly affect performance. Thank you for
pointing that out. I'd like to start investigating how to apply
retpolines to the virtio tools.
> > Thank you for your comments.
>
> Thanks!
>
> > > Best,
> > > Shunsuke.
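
For illustration, here is a minimal sketch of the two call styles the thread
compares. Only the getu16 member and its use from __vringh_get_head are taken
from the patch above; the helper names and the surrounding struct layout are
assumptions made for the example, not the actual kernel code.

	/* Sketch: why the old style can be "free" while the new one is not. */
	#include <stdint.h>

	typedef uint16_t u16;
	typedef uint16_t __virtio16;
	struct vringh;	/* forward declaration suffices for pointer params */

	struct vringh_ops {
		/* Loaded from memory at run time: a true indirect call that
		 * the compiler generally cannot inline or devirtualize. */
		int (*getu16)(const struct vringh *vrh, u16 *val,
			      const __virtio16 *p);
	};

	struct vringh {
		struct vringh_ops ops;
		/* ... vring pointers, indices, etc. ... */
	};

	/* Old style: the function pointer is a compile-time constant at each
	 * call site of this inline helper, so after inlining the compiler
	 * emits a direct call (or inlines the accessor body entirely). */
	static inline int get_u16_old(const struct vringh *vrh,
				      int (*getu16)(const struct vringh *,
						    u16 *, const __virtio16 *),
				      u16 *val, const __virtio16 *p)
	{
		return getu16(vrh, val, p);
	}

	/* New style (this patch): one shared helper for all accessors, at
	 * the cost of an indirect call through vrh->ops per invocation. */
	static inline int get_u16_new(const struct vringh *vrh,
				      u16 *val, const __virtio16 *p)
	{
		return vrh->ops.getu16(vrh, val, p);
	}

With retpolines enabled, the indirect call in the second helper is additionally
routed through a retpoline thunk, which is the cost the discussion above is
concerned with.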
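For reference, userspace builds can opt into retpolines via compiler flags; a
sketch of what that might look like for a tools test build follows. The
variable name and placement are assumptions, not taken from the actual
tools/virtio makefile.

	# GCC: emit retpoline thunks for all indirect branches (x86)
	CFLAGS += -mindirect-branch=thunk -mindirect-branch-register
	# Clang equivalent:
	# CFLAGS += -mretpoline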