From: Evan Green
To: Palmer Dabbelt
Cc: David Laight, Simon Hosie, Evan Green, Albert Ou, Alexandre Ghiti,
    Andrew Jones, Andy Chiu, Anup Patel, Björn Töpel, Conor Dooley,
    Greentime Hu, Guo Ren, Heiko Stuebner, Jisheng Zhang, Jonathan Corbet,
    Ley Foon Tan, Marc Zyngier, Masahiro Yamada, Palmer Dabbelt,
    Paul Walmsley, Randy Dunlap, Samuel Holland, Sia Jee Heng, Sunil V L,
    Xianting Tian, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org
Subject: [PATCH v4 0/2] RISC-V: Probe for misaligned access speed
Date: Fri, 18 Aug 2023 12:41:34 -0700
Message-Id: <20230818194136.4084400-1-evan@rivosinc.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

The hwprobe bit indicating misaligned access speed is currently set by a
vendor-specific feature probe function. This is essentially a per-SoC
table we have to maintain on behalf of each vendor going forward. Let's
convert that instead to something we detect at runtime.

We have two assembly routines at the heart of our probe: one that does a
bunch of word-sized accesses (without aligning its input buffer), and
the other that does byte accesses. If misaligned word accesses move more
bytes than byte accesses do in the same amount of time, then we can
declare misaligned accesses as "fast".

The tradeoff for reducing this maintenance burden is boot time. We spend
4-6 jiffies per core doing this measurement (0-2 on jiffy edge
alignment, and 4 on measurement). The timing loop was based on
raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number
of algorithms). Because only the fastest iteration out of all attempts
is used in the comparison, variance between runs is very low. On my
THead C906, it looks like this:

[    0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast

Several others have chimed in with results on slow machines with the
older algorithm, which took all runs into account, including noise like
interrupts.
Even with this variation, results indicate that in all cases (fast,
slow, and emulated) the measured numbers are nowhere near each other
(always multiple factors apart).

Changes in v4:
 - Avoid the bare 64-bit divide, which fails to link on 32-bit systems;
   use div_u64() (Palmer, buildrobot)

Changes in v3:
 - Fix documentation indentation (Conor)
 - Rename __copy_..._unaligned() to __riscv_copy_..._unaligned() (Conor)
 - Renamed c0,c1 to start_cycles, end_cycles (Conor)
 - Renamed j0,j1 to start_jiffies, now
 - Renamed check_unaligned_access0() to
   check_unaligned_access_boot_cpu() (Conor)

Changes in v2:
 - Explain more in the commit message (Conor)
 - Use a new algorithm that looks for the fastest run (David)
 - Clarify documentation further (David and Conor)
 - Unify around a single word, "unaligned" (Conor)
 - Align asm operands, and other misc whitespace changes (Conor)

Evan Green (2):
  RISC-V: Probe for unaligned access speed
  RISC-V: alternative: Remove feature_probe_func

 Documentation/riscv/hwprobe.rst      |  11 ++-
 arch/riscv/errata/thead/errata.c     |   8 ---
 arch/riscv/include/asm/alternative.h |   5 --
 arch/riscv/include/asm/cpufeature.h  |   2 +
 arch/riscv/kernel/Makefile           |   1 +
 arch/riscv/kernel/alternative.c      |  19 -----
 arch/riscv/kernel/copy-unaligned.S   |  71 ++++++++++++++++++
 arch/riscv/kernel/copy-unaligned.h   |  13 ++++
 arch/riscv/kernel/cpufeature.c       | 104 +++++++++++++++++++++++++++
 arch/riscv/kernel/smpboot.c          |   3 +-
 10 files changed, 198 insertions(+), 39 deletions(-)
 create mode 100644 arch/riscv/kernel/copy-unaligned.S
 create mode 100644 arch/riscv/kernel/copy-unaligned.h

-- 
2.34.1
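
For readers who want the comparison from the cover letter in concrete
form, here is a rough userspace sketch. It is not the code from the
patches: the function names are hypothetical, and it assumes both copy
routines move the same number of bytes, so the one with the lower best
cycle count is the faster one. The ratio is scaled by 100 so it can be
printed with two decimals (as in the 4.34 above) without floating point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration, not the patch. */

/*
 * Return the smallest (fastest) cycle count among the samples, so that
 * runs disturbed by interrupts or other noise do not skew the result.
 * Returns UINT64_MAX when n is 0.
 */
static uint64_t fastest_run(const uint64_t *cycles, size_t n)
{
	uint64_t best = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		if (cycles[i] < best)
			best = cycles[i];
	}
	return best;
}

/*
 * Ratio of byte-copy time to word-copy time, scaled by 100.
 * In the kernel a plain 64-bit '/' does not link on 32-bit builds,
 * which is why v4 switches to div_u64(); userspace C can divide
 * directly. Returns 0 if word_cycles is 0.
 */
static uint64_t ratio_x100(uint64_t byte_cycles, uint64_t word_cycles)
{
	if (word_cycles == 0)
		return 0;
	return byte_cycles * 100 / word_cycles;
}

/* Unaligned word accesses count as "fast" if they beat the byte copy. */
static int unaligned_is_fast(uint64_t byte_cycles, uint64_t word_cycles)
{
	return word_cycles < byte_cycles;
}
```

With made-up best times of 120 cycles for the word copy and 521 for the
byte copy, ratio_x100() gives 434, which would be printed as
ratio / 100 and ratio % 100 to get "4.34".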