Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp412179ybh; Mon, 20 Jul 2020 21:05:28 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzyeMv6FtXOtvezB30fCeyrQYKkBA6BFbg5tVDQ8SfgRJ5sdmH1V0wplI0gHkOUK5TClqr5 X-Received: by 2002:a50:d7c1:: with SMTP id m1mr24686155edj.217.1595304328170; Mon, 20 Jul 2020 21:05:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1595304328; cv=none; d=google.com; s=arc-20160816; b=y9zyi5W1MDRi2joPvEo/aJPSHKIjTD6D7qIHIPQCMVIFzSAjt15wS0SMRNCj5EExK0 avK1Z7SNi5VGWwjiygXWquHRvTxfoDPR6rTENfR2CyxpDX+wb86n7YPetO868abop00J 67pNWg9BmiflqpbcRkWVGI4/G+1NrVHcH3DuPDuIo0PGGiR9EsGPDDSda3xFVhUVR9kl kTiusOhso7LKh/xf+A2hLAJJIlsnVnbro3x27D8MipcGaB1dqnW7QgYTGw5sZlnyz4GB Elo7RCt70/6V5rnyNsgrV+FUIBv1ST3oPeb4btVFxSFJotrw361VgBekTXycIyY4lOY0 UwRw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:to:from:cc:in-reply-to:subject:date:dkim-signature; bh=8WVvQB40RK6deZ4Cy8t+mvWJCvDXakAnuO2sCFnXikw=; b=ae/O/XOp0Y8NSj6g2LtBWYhm4BSVBK6d2uEwt+ZzVvDB2APKtyLOaZL6ZlQJMc6La1 G8HfTd4LPpVwJva9kBmMpQkBh8aBg8zqy6sT3O2nFRrcK6yAgBnKjVikhGM/Ud/k3l13 FV6zIjwSzUHXmAchRKHnf4rrTgCUjzh9DFu9LWJAsVfLDrg5D/oxj6OXyUC+EmoR1FFA DXbEC9ndY/ZN+5cAIg2Z6bFvX5WQEI+hCvG3idwR/8IY49aQSflOaaLf7BJ9Hk8HoyCD TcUGB9lEfzYvs1EPOhTZWuYyrfD1bcfCGxtzR/FdjNabo7eLNyHCByZlT06N6SQDTuMu K49g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@dabbelt-com.20150623.gappssmtp.com header.s=20150623 header.b=b1p8qScJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z4si12300552edr.144.2020.07.20.21.05.04; Mon, 20 Jul 2020 21:05:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@dabbelt-com.20150623.gappssmtp.com header.s=20150623 header.b=b1p8qScJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726029AbgGUEEr (ORCPT + 99 others); Tue, 21 Jul 2020 00:04:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725796AbgGUEEr (ORCPT ); Tue, 21 Jul 2020 00:04:47 -0400 Received: from mail-pf1-x444.google.com (mail-pf1-x444.google.com [IPv6:2607:f8b0:4864:20::444]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 19628C061794 for ; Mon, 20 Jul 2020 21:04:47 -0700 (PDT) Received: by mail-pf1-x444.google.com with SMTP id t11so10083709pfq.11 for ; Mon, 20 Jul 2020 21:04:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dabbelt-com.20150623.gappssmtp.com; s=20150623; h=date:subject:in-reply-to:cc:from:to:message-id:mime-version :content-transfer-encoding; bh=8WVvQB40RK6deZ4Cy8t+mvWJCvDXakAnuO2sCFnXikw=; b=b1p8qScJ8GMb4ibmmM5cZtXTLkFKOD3JrCeBI1OZKIm858b3i43FB7wdkJwsC0b6NR iZBQBCgY9/ko7mN2IszymALJ/LJ5iXQkaE8iA01ymARxxPXFRDqEzyAS8iaE3B1eI0DX HTEB3LrHTCXWnXgP5O2rbjCnWowYP78cMmmCUSEfIA4712jGTww+BNEVuqSmfhzKJwMU 0sf4DWKT0Jf+WaPh9UfcAr4/T7pNn2I2ehMEI9tI0+BLnIrOD8Qzu2NwPb/5AtEMyR54 p6aSUNCweq1bhauXUf+xb1AOsUa5jrRQ1AAXIT34vapUZD7KNWrCRyCvRPnFtvPGYUEW IzGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:subject:in-reply-to:cc:from:to:message-id :mime-version:content-transfer-encoding; bh=8WVvQB40RK6deZ4Cy8t+mvWJCvDXakAnuO2sCFnXikw=; b=sMhHbxTQeSovURBb3fJxpsGeb9hu4ykl61LuFjKCoZeVHkoc9WdPhuqaPEjmfPW1C9 XSWwO67ZM1g6UbnYFrOSqRLmGrAy4wRz15yW/jPsG5fUDxIYGt2hyeuelFmJbjWJPSdu Z+k8lyiNoPOAH/LrM10sSSAEN0lCuFQ2O9RvVaRhbXN7/mUwJIF/BrLo1Ep35QNz2AFL stRvEK8kTGRUqpyZemBJxrE/a2RkF2dq7YTo+ZNL62UcSwvHW9AAqJ00CIx6hQoFfilK CEM9g1kAtsCv3HB693YkljfikEaVqQqzx1cf4eN4HlQu1ns6VP+ljmakLGt1TNkM6OD3 YONw== X-Gm-Message-State: AOAM533O5ABO2G9HFYR1pTu9Y0/3lixb3XYxY3EbFTZ4G56dK7XD22Rm K0KJ3UYOzuNTBZaIQqux8Lwu/blzk5Q= X-Received: by 2002:a62:6047:: with SMTP id u68mr21561090pfb.238.1595304286185; Mon, 20 Jul 2020 21:04:46 -0700 (PDT) Received: from localhost (76-210-143-223.lightspeed.sntcca.sbcglobal.net. [76.210.143.223]) by smtp.gmail.com with ESMTPSA id r204sm18981148pfc.134.2020.07.20.21.04.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Jul 2020 21:04:45 -0700 (PDT) Date: Mon, 20 Jul 2020 21:04:45 -0700 (PDT) X-Google-Original-Date: Mon, 20 Jul 2020 21:04:43 PDT (-0700) Subject: Re: [PATCH] riscv: Select ARCH_HAS_DEBUG_VM_PGTABLE In-Reply-To: <0d7d0c38-5b67-1793-47d7-b8a7714838ee@arm.com> CC: kernel@esmil.dk, linux-riscv@lists.infradead.org, Paul Walmsley , linux-kernel@vger.kernel.org From: Palmer Dabbelt To: anshuman.khandual@arm.com Message-ID: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 14 Jul 2020 20:20:54 PDT (-0700), anshuman.khandual@arm.com wrote: > > > On 07/15/2020 02:56 AM, Emil Renner Berthing wrote: >> This allows the pgtable tests to be built. >> >> Signed-off-by: Emil Renner Berthing >> --- >> >> The tests seem to succeed both in Qemu and on the HiFive Unleashed >> >> Both with and without the recent additions in >> https://lore.kernel.org/linux-riscv/1594610587-4172-1-git-send-email-anshuman.khandual@arm.com/ > > That's great, thanks for testing. Actually, looking at this I'm not sure it actually helps us any. This changes the behavior of two functions. Pulling out the relevant sections, I see: unsigned int __sw_hweight32(unsigned int w) { #ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER w -= (w >> 1) & 0x55555555; w = (w & 0x33333333) + ((w >> 2) & 0x33333333); w = (w + (w >> 4)) & 0x0f0f0f0f; return (w * 0x01010101) >> 24; #else unsigned int res = w - ((w >> 1) & 0x55555555); res = (res & 0x33333333) + ((res >> 2) & 0x33333333); res = (res + (res >> 4)) & 0x0F0F0F0F; res = res + (res >> 8); return (res + (res >> 16)) & 0x000000FF; #endif } and unsigned long memchr_inv(unsigned long value64) { #if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64 value64 *= 0x0101010101010101ULL; #elif defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) value64 *= 0x01010101; value64 |= value64 << 32; #else value64 |= value64 << 8; value64 |= value64 << 16; value64 |= value64 << 32; #endif return value64; } GCC optimizer the multiplication out of the first one: __sw_hweight32: li a4,1431654400 srliw a5,a0,1 addi a4,a4,1365 and a5,a5,a4 subw a0,a0,a5 li a5,858992640 srliw a4,a0,2 addi a5,a5,819 and a0,a5,a0 and a5,a5,a4 addw a5,a0,a5 srliw a0,a5,4 addw a0,a0,a5 li a5,252645376 addi a5,a5,-241 and a5,a5,a0 srliw a0,a5,8 addw a5,a0,a5 srliw a0,a5,16 addw a0,a0,a5 andi a0,a0,0xff ret __sw_hweight32: li a5,1431654400 srliw a4,a0,1 addi a5,a5,1365 and a5,a5,a4 subw a0,a0,a5 li a5,858992640 srliw a4,a0,2 addi a5,a5,819 and a0,a5,a0 and a5,a5,a4 addw a5,a0,a5 srliw a0,a5,4 addw a5,a0,a5 li a0,252645376 addi a0,a0,-241 and a5,a0,a5 slliw a0,a5,8 addw a0,a0,a5 slliw a5,a0,16 addw a0,a0,a5 srliw a0,a0,24 ret so that doesn't matter. The second one is really a wash: memchr_inv: ld a5,.LC0 mul a0,a0,a5 ret .rodata .LC0: .dword 72340172838076673 vs memchr_inv: slli a5,a0,8 or a5,a5,a0 slli a0,a5,16 or a0,a0,a5 slli a5,a0,32 or a0,a5,a0 ret It's unlikely that load ends up relaxed, so it's going to be two instructions. That means we have 4 cycles to forward the load and multiply, for a cache hit. IIRC the multiplier on the existing hardware isn't that fast -- GCC lists it as imul as 10 cycles, but I remember it being more like 5 so that might just be an architecture-inaccurate tuning in the generic pipeline model. This is out of the inner loop, so it's probably not all that important anyway. The result isn't used for a while so on a bigger machine it's probably worth picking the smaller code path, but it seems like a very small thing to optimize for either way. I'm actually a bit surprised about this. Do you have benchmarks that indicate ARCH_HAS_FAST_MULTIPLIER is actually faster? Otherwise I guess I'm going to reject this, as it's really more ARCH_HAS_FAST_MULTIPLIER_AND_FAST_LARGE_CONSTANTS than just ARCH_HAS_FAST_MULTIPLIER.