References: <20230623222016.3742145-1-evan@rivosinc.com> <20230623222016.3742145-2-evan@rivosinc.com> <64F2D853-61E5-49CF-BAB5-AAFB8697683E@jrtc27.com>
In-Reply-To: <64F2D853-61E5-49CF-BAB5-AAFB8697683E@jrtc27.com>
From: Evan Green
Date: Tue, 27 Jun 2023 12:11:40 -0700
Subject: Re: [PATCH 1/2] RISC-V: Probe for unaligned access speed
To: Jessica Clarke
Cc: Palmer Dabbelt, linux-doc@vger.kernel.org, Yangyu Chen, Conor Dooley, Guo Ren, Jisheng Zhang, linux-riscv, Jonathan Corbet, Xianting Tian, Masahiro Yamada, Greentime Hu, Simon Hosie, Li Zhengyu, Andrew Jones, Albert Ou, Alexandre Ghiti, Ley Foon Tan, Paul Walmsley, Heiko Stuebner, Anup Patel, linux-kernel@vger.kernel.org, Sia Jee Heng, Palmer Dabbelt, Andy Chiu
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 26, 2023 at 2:42 PM Jessica Clarke wrote:
>
> On 23 Jun 2023, at 23:20, Evan Green wrote:
> >
> > Rather than deferring misaligned access speed determinations to a vendor
> > function, let's probe them and find out how fast they are. If we
> > determine that a misaligned word access is faster than N byte accesses,
> > mark the hardware's misaligned access as "fast".
>
> How sure are you that your measurements can be extrapolated and aren’t
> an artefact of the testing process? For example, off the top of my head:
>
> * The first run will potentially be penalised by data cache misses,
> untrained prefetchers, TLB misses, branch predictors, etc. compared
> with later runs. You have one warmup, but who knows how many
> iterations it will take to converge?

I'd expect the cache penalties to be reasonably covered by a single
warmup. You're right about branch prediction, which is why I tried to
use a large-ish buffer size, minimize the ratio of conditionals to
loads/stores, and do the test for a decent number of iterations (on my
THead, about 1800 and 400 for words and bytes). When I ran the test a
handful of times, I did see variation on the order of ~5%. But the
comparison of the two numbers doesn't seem to be anywhere near that
margin (THead C906 was ~4x faster doing misaligned word accesses,
others with slow misaligned accesses also reporting numbers not
anywhere close to each other).

>
> * The code being benchmarked isn’t the code being run, so differences
> in access patterns, loop unrolling, loop alignment, etc. may cause the
> real code to behave differently (and perhaps change which is better).

I'm not trying to make statements about memcpy specifically, but (only)
about misaligned accesses, which is why I tried to write loops that
isolated that element as much as possible.

>
> The non-determinism that could in theory result from this also seems
> like a not great idea to have.

This is fair, if we have machines where this waffles from boot to boot
that's not great. In theory if misaligned word accesses come out to
being almost exactly equal to N byte accesses, then it doesn't matter
which you choose, though of course it could still make a difference in
practice. The alternative though of providing no info just pushes the
same problem out into userspace, which seems worse.

-Evan
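
As a rough illustration of the comparison discussed above (a userspace
sketch, not the code from the patch): do one untimed warmup copy, count
how many whole-buffer copies finish in a fixed time window using word
accesses at a misaligned address, do the same with byte accesses, and
call misaligned access "fast" if the word count is higher. All names
here (BUF_SIZE, copy_words, count_copies, classify, within_noise, probe)
are hypothetical, and so are the buffer size and the noise check. A real
probe would need hand-written loads and stores, because a C compiler is
free to change the access width of both loops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE 16384  /* hypothetical "large-ish" buffer size */
#define MISALIGN 1      /* byte offset applied to word-aligned storage */

typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

enum misaligned_speed { MISALIGNED_SLOW, MISALIGNED_FAST };

/* Baseline: one byte per access, so alignment never matters. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* One machine word per access. memcpy() of a fixed size keeps this
 * defined C; whether it becomes a single misaligned load/store depends
 * on the compiler and target (assumption: it does on the machine of
 * interest). */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + sizeof(unsigned long) <= n; i += sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, src + i, sizeof(w));
        memcpy(dst + i, &w, sizeof(w));
    }
    for (; i < n; i++)  /* leftover tail bytes */
        dst[i] = src[i];
}

/* Count whole-buffer copies completed in a fixed window of CPU time,
 * after one untimed warmup pass. */
static unsigned long count_copies(copy_fn fn, unsigned char *dst,
                                  const unsigned char *src, size_t n,
                                  unsigned window_ms)
{
    assert(fn != NULL && n > 0);
    unsigned long iters = 0;
    clock_t budget = (clock_t)((double)window_ms * CLOCKS_PER_SEC / 1000.0);

    fn(dst, src, n);    /* warmup: touch caches and TLB once */
    clock_t start = clock();
    do {
        fn(dst, src, n);
        iters++;
    } while (clock() - start < budget);
    return iters;
}

/* "Fast" means more copies fit in the same window with word accesses. */
static enum misaligned_speed classify(unsigned long word_copies,
                                      unsigned long byte_copies)
{
    return word_copies > byte_copies ? MISALIGNED_FAST : MISALIGNED_SLOW;
}

/* Hypothetical check: are the two counts within pct percent of each
 * other, i.e. close enough that the answer could flip between boots? */
static int within_noise(unsigned long a, unsigned long b, unsigned long pct)
{
    unsigned long hi = a > b ? a : b;
    unsigned long lo = a > b ? b : a;
    return (hi - lo) * 100 <= hi * pct;
}

/* Run both measurements on buffers offset from word alignment. */
static enum misaligned_speed probe(unsigned window_ms)
{
    static unsigned long src_store[BUF_SIZE / sizeof(unsigned long) + 1];
    static unsigned long dst_store[BUF_SIZE / sizeof(unsigned long) + 1];
    unsigned char *src = (unsigned char *)src_store + MISALIGN;
    unsigned char *dst = (unsigned char *)dst_store + MISALIGN;

    unsigned long words = count_copies(copy_words, dst, src, BUF_SIZE, window_ms);
    unsigned long bytes = count_copies(copy_bytes, dst, src, BUF_SIZE, window_ms);
    return classify(words, bytes);
}
```

With the counts quoted above for the THead C906 (about 1800 word copies
against about 400 byte copies), classify() reports fast, and
within_noise(1800, 400, 5) is false, so a ~5% run-to-run variation would
not change the result. Counts such as 1000 and 980 would fall inside
that band, which is the case where the answer could differ from boot to
boot.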