From: Huacai Chen
Date: Thu, 14 Dec 2023 21:10:27 +0800
Subject: Re: [PATCH] LoongArch: Micro-optimize sc_save_fcc and sc_restore_fcc for LA464
To: Xi Ruoyao
Cc: WANG Xuerui, loongarch@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <20231214130206.21219-1-xry111@xry111.site>

Emmm, I want to keep the code simpler. :)

Huacai

On Thu, Dec 14, 2023 at 9:02 PM Xi Ruoyao wrote:
>
> On LA464 movcf2gr is 7 times slower than movcf2fr + movfr2gr, and
> movgr2cf is 15 times (!) slower than movgr2fr + movfr2cf.
>
> On LA664 movcf2fr + movfr2gr has a similar performance with movcf2gr,
> and movgr2fr + movfr2cf has a similar performance with movgr2cf.
>
> To use FP registers in sc_save_fcc and sc_restore_fcc we need to save
> FP/LSX/LASX registers before sc_save_fcc, and restore FP/LSX/LASX
> registers after sc_restore_fcc.
>
> Signed-off-by: Xi Ruoyao
> ---
>  arch/loongarch/kernel/fpu.S | 94 +++++++++++++++++++++----------------
>  1 file changed, 54 insertions(+), 40 deletions(-)
>
> diff --git a/arch/loongarch/kernel/fpu.S b/arch/loongarch/kernel/fpu.S
> index d53ab10f4644..ecb127f9a673 100644
> --- a/arch/loongarch/kernel/fpu.S
> +++ b/arch/loongarch/kernel/fpu.S
> @@ -96,43 +96,57 @@
>  	.endm
>
>  	.macro sc_save_fcc base, tmp0, tmp1
> -	movcf2gr	\tmp0, $fcc0
> -	move		\tmp1, \tmp0
> -	movcf2gr	\tmp0, $fcc1
> -	bstrins.d	\tmp1, \tmp0, 15, 8
> -	movcf2gr	\tmp0, $fcc2
> -	bstrins.d	\tmp1, \tmp0, 23, 16
> -	movcf2gr	\tmp0, $fcc3
> -	bstrins.d	\tmp1, \tmp0, 31, 24
> -	movcf2gr	\tmp0, $fcc4
> -	bstrins.d	\tmp1, \tmp0, 39, 32
> -	movcf2gr	\tmp0, $fcc5
> -	bstrins.d	\tmp1, \tmp0, 47, 40
> -	movcf2gr	\tmp0, $fcc6
> -	bstrins.d	\tmp1, \tmp0, 55, 48
> -	movcf2gr	\tmp0, $fcc7
> -	bstrins.d	\tmp1, \tmp0, 63, 56
> -	EX	st.d	\tmp1, \base, 0
> +	movcf2fr	ft0, $fcc0
> +	movcf2fr	ft1, $fcc1
> +	movfr2gr.s	\tmp0, ft0
> +	movfr2gr.s	\tmp1, ft1
> +	EX	st.b	\tmp0, \base, 0
> +	EX	st.b	\tmp0, \base, 8
> +	movcf2fr	ft0, $fcc2
> +	movcf2fr	ft1, $fcc3
> +	movfr2gr.s	\tmp0, ft0
> +	movfr2gr.s	\tmp1, ft1
> +	EX	st.b	\tmp0, \base, 16
> +	EX	st.b	\tmp0, \base, 24
> +	movcf2fr	ft0, $fcc3
> +	movcf2fr	ft1, $fcc4
> +	movfr2gr.s	\tmp0, ft0
> +	movfr2gr.s	\tmp1, ft1
> +	EX	st.b	\tmp0, \base, 32
> +	EX	st.b	\tmp0, \base, 40
> +	movcf2fr	ft0, $fcc5
> +	movcf2fr	ft1, $fcc6
> +	movfr2gr.s	\tmp0, ft0
> +	movfr2gr.s	\tmp1, ft1
> +	EX	st.b	\tmp0, \base, 48
> +	EX	st.b	\tmp0, \base, 56
>  	.endm
>
>  	.macro sc_restore_fcc base, tmp0, tmp1
> -	EX	ld.d	\tmp0, \base, 0
> -	bstrpick.d	\tmp1, \tmp0, 7, 0
> -	movgr2cf	$fcc0, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 15, 8
> -	movgr2cf	$fcc1, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 23, 16
> -	movgr2cf	$fcc2, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 31, 24
> -	movgr2cf	$fcc3, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 39, 32
> -	movgr2cf	$fcc4, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 47, 40
> -	movgr2cf	$fcc5, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 55, 48
> -	movgr2cf	$fcc6, \tmp1
> -	bstrpick.d	\tmp1, \tmp0, 63, 56
> -	movgr2cf	$fcc7, \tmp1
> +	EX	ld.b	\tmp0, \base, 0
> +	EX	ld.b	\tmp1, \base, 8
> +	movgr2fr.w	ft0, \tmp0
> +	movgr2fr.w	ft1, \tmp1
> +	movfr2cf	$fcc0, ft0
> +	movfr2cf	$fcc1, ft1
> +	EX	ld.b	\tmp0, \base, 16
> +	EX	ld.b	\tmp1, \base, 24
> +	movgr2fr.w	ft0, \tmp0
> +	movgr2fr.w	ft1, \tmp1
> +	movfr2cf	$fcc2, ft0
> +	movfr2cf	$fcc3, ft1
> +	EX	ld.b	\tmp0, \base, 32
> +	EX	ld.b	\tmp1, \base, 40
> +	movgr2fr.w	ft0, \tmp0
> +	movgr2fr.w	ft1, \tmp1
> +	movfr2cf	$fcc4, ft0
> +	movfr2cf	$fcc5, ft1
> +	EX	ld.b	\tmp0, \base, 48
> +	EX	ld.b	\tmp1, \base, 56
> +	movgr2fr.w	ft0, \tmp0
> +	movgr2fr.w	ft1, \tmp1
> +	movfr2cf	$fcc6, ft0
> +	movfr2cf	$fcc7, ft1
>  	.endm
>
>  	.macro sc_save_fcsr base, tmp0
> @@ -449,9 +463,9 @@ SYM_FUNC_END(_init_fpu)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_save_fp_context)
> -	sc_save_fcc a1 t1 t2
>  	sc_save_fcsr a2 t1
>  	sc_save_fp a0
> +	sc_save_fcc a1 t1 t2
>  	li.w	a0, 0					# success
>  	jr	ra
>  SYM_FUNC_END(_save_fp_context)
> @@ -462,8 +476,8 @@ SYM_FUNC_END(_save_fp_context)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_restore_fp_context)
> -	sc_restore_fp a0
>  	sc_restore_fcc a1 t1 t2
> +	sc_restore_fp a0
>  	sc_restore_fcsr a2 t1
>  	li.w	a0, 0					# success
>  	jr	ra
> @@ -475,9 +489,9 @@ SYM_FUNC_END(_restore_fp_context)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_save_lsx_context)
> -	sc_save_fcc a1, t0, t1
>  	sc_save_fcsr a2, t0
>  	sc_save_lsx a0
> +	sc_save_fcc a1, t0, t1
>  	li.w	a0, 0					# success
>  	jr	ra
>  SYM_FUNC_END(_save_lsx_context)
> @@ -488,8 +502,8 @@ SYM_FUNC_END(_save_lsx_context)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_restore_lsx_context)
> -	sc_restore_lsx a0
>  	sc_restore_fcc a1, t1, t2
> +	sc_restore_lsx a0
>  	sc_restore_fcsr a2, t1
>  	li.w	a0, 0					# success
>  	jr	ra
> @@ -501,9 +515,9 @@ SYM_FUNC_END(_restore_lsx_context)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_save_lasx_context)
> -	sc_save_fcc a1, t0, t1
>  	sc_save_fcsr a2, t0
>  	sc_save_lasx a0
> +	sc_save_fcc a1, t0, t1
>  	li.w	a0, 0					# success
>  	jr	ra
>  SYM_FUNC_END(_save_lasx_context)
> @@ -514,8 +528,8 @@ SYM_FUNC_END(_save_lasx_context)
>   * a2: fcsr
>   */
>  SYM_FUNC_START(_restore_lasx_context)
> -	sc_restore_lasx a0
>  	sc_restore_fcc a1, t1, t2
> +	sc_restore_lasx a0
>  	sc_restore_fcsr a2, t1
>  	li.w	a0, 0					# success
>  	jr	ra
> --
> 2.43.0
>
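
As a reading aid for the quoted macros, here is a minimal Python sketch of the layout used by the pre-patch sc_save_fcc and sc_restore_fcc: the eight condition flags are packed into one 64-bit word, one byte per flag, with bstrins.d, and picked apart again with bstrpick.d. The names pack_fcc, unpack_fcc and stored_bytes are made up for illustration, and the model assumes a little-endian 64-bit store; it is not kernel code.

```python
# Illustrative model only: these helpers are hypothetical and do not
# exist in the kernel. They mirror the removed ("-") lines of the diff.

def pack_fcc(flags):
    """Model movcf2gr + bstrins.d: flag i lands in bits [8*i+7 : 8*i]."""
    if len(flags) != 8:
        raise ValueError("expected exactly 8 condition flags")
    word = 0
    for i, flag in enumerate(flags):
        # bstrins.d inserts the low 8 bits of the source register.
        word |= (flag & 0xFF) << (8 * i)
    return word


def unpack_fcc(word):
    """Model bstrpick.d + movgr2cf: extract bits [8*i+7 : 8*i] per flag."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def stored_bytes(word):
    """Bytes that a 64-bit store (st.d) writes, assuming little-endian order."""
    return word.to_bytes(8, "little")


if __name__ == "__main__":
    flags = [1, 0, 1, 1, 0, 0, 1, 0]   # fcc0 .. fcc7
    word = pack_fcc(flags)
    print(hex(word))                   # packed 64-bit value
    print(list(stored_bytes(word)))    # memory image, one byte per flag
    print(unpack_fcc(word) == flags)   # round trip
```

Under this model, flag i ends up in the byte at offset i from the base address, so the packed word and eight consecutive byte stores describe the same memory image.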