References: <0e65c165e3d54a38cbba01603f325dca727274de.1631121222.git.greentime.hu@sifive.com>
From: Greentime Hu
Date: Tue, 5 Oct 2021 23:46:00 +0800
Subject: Re: [RFC PATCH v8 09/21] riscv: Add task switch support for vector
To: Ley Foon Tan
Cc: Darius Rad, linux-riscv, Linux Kernel Mailing List, Albert Ou, Palmer Dabbelt, Paul Walmsley, Vincent Chen
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 5, 2021 at 10:12 AM Ley Foon Tan wrote:
>
> On Mon, Oct 4, 2021 at 8:41 PM Greentime Hu wrote:
> >
> > On Fri, Oct 1, 2021 at 10:46 AM Ley Foon Tan wrote:
> > >
> > > On Wed, Sep 29, 2021 at 11:54 PM Darius Rad wrote:
> > > >
> > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > On Mon, Sep 13, 2021 at 8:21 PM Darius Rad wrote:
> > > > > > > [....]
> > > > > > > >
> > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > instructions are used?
> > > > > >
> > > > >
> > > > > Hi Darius,
> > > > >
> > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > choose to enable and allocate memory for user space program is because
> > > > > we also implement some common functions in the glibc such as memcpy
> > > > > vector version and it is called very often by every process. So that
> > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > would like to use vector by default. If we disable it by default and
> > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > almost every process will use vector glibc memcpy or something like
> > > > > that.
> > > >
> > > > Do you have any evidence to support the assertion that almost every process
> > > > would use vector operations?  One could easily argue that the converse is
> > > > true: no existing software uses the vector extension now, so most likely a
> > > > process will not be using it.
> > > >
> > > > >
> > > > > > Given the size of the vector state and potential power and performance
> > > > > > implications of enabling the vector engine, it seems like this should
> > > > > > treated similarly to Intel AMX on x86.
> > > > > > The full discussion of that is
> > > > > > here:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/CALCETrW2QHa2TLvnUuVxAAheqcbSZ-5_WRXtDSAGcbG8N+gtdQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org/
> > > > > >
> > > > > > The cover letter for recent Intel AMX patches has a summary of the x86
> > > > > > implementation:
> > > > > >
> > > > > > https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
> > > > > >
> > > > > > If RISC-V were to adopt a similar approach, I think the significant
> > > > > > points are:
> > > > > >
> > > > > >   1. A process (or thread) must specifically request the desire to use
> > > > > >      vector extensions (perhaps with some new arch_prctl() API),
> > > > > >
> > > > > >   2. The kernel is free to deny permission, perhaps based on
> > > > > >      administrative rules or for other reasons, and
> > > > > >
> > > > > >   3. If a process attempts to use vector extensions before doing the
> > > > > >      above, the process will die due to an illegal instruction.
> > > > >
> > > > > Thank you for sharing this, but I am not sure if we should treat
> > > > > vector like AMX on x86. IMHO, compiler might generate code with vector
> > > > > instructions automatically someday, maybe we should treat vector
> > > > > extensions like other extensions.
> > > > > If user knows the vector extension is supported in this CPU and he
> > > > > would like to use it, it seems we should let user use it directly just
> > > > > like other extensions.
> > > > > If user don't know it exists or not, user should use the library API
> > > > > transparently and let glibc or other library deal with it. The glibc
> > > > > ifunc feature or multi-lib should be able to choose the correct
> > > > > implementation.
> > > >
> > > > What makes me think that the vector extension should be treated like AMX is
> > > > that they both (1) have a significant amount of architectural state, and
> > > > (2) likely have a significant power and/or area impact on (non-emulated)
> > > > designs.
> > > >
> > > > For example, I think it is possible, maybe even likely, that vector
> > > > implementations will have one or more of the following behaviors:
> > > >
> > > >   1. A single vector unit shared among two or more harts,
> > > >
> > > >   2. Additional power consumption when the vector unit is enabled and idle
> > > >      versus not being enabled at all,
> > > >
> > > >   3. For a system which supports variable operating frequency, a reduction
> > > >      in the maximum frequency when the vector unit is enabled, and/or
> > > >
> > > >   4. The inability to enter low power states and/or delays to low power
> > > >      states transitions when the vector unit is enabled.
> > > >
> > > > None of the above constraints apply to more ordinary extensions like
> > > > compressed or the various bit manipulation extensions.
> > > >
> > > > The discussion I linked to has some well reasoned arguments on why
> > > > substantial extensions should have a mechanism to request using them by
> > > > user space.  The discussion was in the context of Intel AMX, but applies to
> > > > further x86 extensions, and I think should also apply to similar extensions
> > > > on RISC-V, like vector here.
> > > >
> > > There is possible use case where not all cores support vector
> > > extension due to size, area and power.
> > > Perhaps can have the mechanism or flow to determine the
> > > application/thread require vector extension or it specifically request
> > > the desire to use
> > > vector extensions. Then this app/thread run on cpu with vector
> > > extension capability only.
> > >
> >
> > IIRC, we assume all harts has the same ability in Linux because of SMP
> > assumption.
> > If we have more information of hw capability and we may use this
> > information for scheduler to switch the correct process to the correct
> > CPU.
> > Do you have any idea how to implement it in Linux kernel? Maybe we can
> > list in the TODO list.
> I think we can refer to other arch implementations as reference:
>
> 1. ARM64 supports 32-bit thread on asymmetric AArch32 systems. There
> is a flag in ELF to check, then start the thread on the core that
> supports 32-bit execution. This patchset is merged to mainline 5.15.
> https://lore.kernel.org/linux-arm-kernel/20210730112443.23245-8-will@kernel.org/T/

Wow! This is useful for AMP.

>
> 2. Link shared by Darius, on-demand request implementation on Intel AMX
> https://lore.kernel.org/lkml/20210825155413.19673-1-chang.seok.bae@intel.com/
>
> glibc support optimized library functions with vector, this is enabled
> by default if compiler is with vector extension enabled? If yes, then
> most of the app required vector core.

As I mentioned earlier, glibc ifunc will solve this issue. Linux and
glibc can run on platforms with or without vector. glibc takes the
information it gets from the Linux kernel and uses ifunc to decide
whether to use the vector version of a function. This means that even
if your toolchain has vector glibc support, when the Linux kernel tells
glibc that the platform doesn't support vector, the ifunc mechanism
will choose the non-vector versions.
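
To make the ifunc point concrete, here is a minimal sketch in C of the
selection step. It is not glibc's code: a plain function pointer stands
in for a real ifunc resolver, HYPOTHETICAL_HWCAP_V is an assumed bit
position for the vector extension, and copy_vector() is only a
placeholder that defers to memcpy() rather than using vector
instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: one hwcap bit per single-letter ISA extension, so 'V'
 * sits at bit ('v' - 'a'). The RFC series may expose this differently. */
#define HYPOTHETICAL_HWCAP_V (1UL << ('v' - 'a'))

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline byte-by-byte copy that is safe on any hart. */
void *copy_scalar(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Placeholder for a vector-optimized copy. A real one would use vector
 * instructions; this sketch just calls memcpy() so it builds anywhere. */
void *copy_vector(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Stand-in for an ifunc resolver: choose an implementation from the
 * capability bits that the kernel reported to user space. */
copy_fn select_copy(unsigned long hwcap)
{
    return (hwcap & HYPOTHETICAL_HWCAP_V) ? copy_vector : copy_scalar;
}
```

On Linux the hwcap word would typically come from getauxval(AT_HWCAP)
in <sys/auxv.h>. With a real ifunc the choice is made once, when the
dynamic linker resolves the symbol, instead of on every call.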