Received: by 2002:ab2:3350:0:b0:1f4:6588:b3a7 with SMTP id o16csp382405lqe; Sat, 6 Apr 2024 05:30:51 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVd86bREMm0q0q+omYkHapLAsqlT2g8dyL8kIAa96PZkOO/dIotPzw/7t22Uqit/d7no/0nEtlGSaqBcxn7rCGKsWqESQCIXyU0pj+Pnw== X-Google-Smtp-Source: AGHT+IEqYUo5IMogHkl65YafGvZyOCNiXpx18YoKQVlktJFJJ/L+tLecmno964gnSLuFi8FYuiso X-Received: by 2002:a05:6a00:2da7:b0:6ea:dfbf:13d4 with SMTP id fb39-20020a056a002da700b006eadfbf13d4mr4270948pfb.18.1712406651367; Sat, 06 Apr 2024 05:30:51 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712406651; cv=pass; d=google.com; s=arc-20160816; b=hOvCWNYoFJoUciff7TrSnCqEDIptWFLJUGCJskj2PM48W2B4phroRss18H6hxSHfS4 Ni+quaMC5/FlvrarIe/ZPfxuBlLZ2jlkBsr9SMkU7qMJvNvVVCbLWSXRNnzz8IhYCr21 V+/N0iQtsRoxJwx2WfPaGkKYfdyq9g4h7dFAp6T4BdJ30jdnrZ4CvYcT3kIVEQ5EG0En jJKNlepNUdn2nx49jH4iCLtlm7WWMrEf+7X9pkLN/RhuyDl5H2hsBXxxdMJPISwBJWsc bwK0sUmykhxNhgFwFZPXq3Tn2gHpEd1RbhJl5VkMcCPdJ3SygZjCUhAJorR7gk1ojn1/ vquA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=W5khUc4+pP3T2YCJkmv/x98K4zoP9YhPJ4PuWSN9uFI=; fh=IEJ0K0X3PJlkm1fJKb4iezXCM0B/ZXpMkTcrRzM94/E=; b=lbTbryuuYtT68RToZr3cAqQGnSV5CMZlM3eHw2mbcpI0t5WxgQXdpQN4B5Xq8QvPYk mYXGN4Qw1dZp+2p88lLWZAuvGYixcwgZ3TRTVTm8wJhBlJ/ZUFJpf0gjO/G19tKInadA OEcoaQ+OWOACxbRdgCE79DH1NephZZRDylwKcxCymQrh+u9AtgQ97s6ach0oQfsBKxFQ dDv0f4jxr3vPXDmlczDXGjen5Vs/wrfsh3s7P8d4e7m7g1Eu0Y7hdAbiWrDVq57DdwUm UGE+fh2NbN1xl7wox7sd9bwMiojXzdGH3zMrRIc5Te+Lp4p5PEoyR7wyrq2i5CdlT3wx UfHg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=IulJYl6T; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-133918-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-133918-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id du19-20020a056a002b5300b006e558b4e179si2970746pfb.235.2024.04.06.05.30.51 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 06 Apr 2024 05:30:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-133918-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=IulJYl6T; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-133918-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-133918-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 08DC7281F35 for ; Sat, 6 Apr 2024 12:30:51 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1859426AD8; Sat, 6 Apr 2024 12:30:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IulJYl6T" Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com [209.85.208.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FD55652 for ; Sat, 6 Apr 2024 12:30:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.175 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712406645; cv=none; b=omdGI+znPL2NlXRUeXYlc42kwEnfQg9Scw6c/6DLu7PtoQsWGDLD+GfHE3QyGxf5U2+JKS+XOdCzqpSjh8zPrc5ROuqsML2Uyz8M7xNxIIq/kM6g4wEMVLuxrqHtwI633nSfXgMfs79afL0ceDwGHRdJ0pXReHWBd/HjyOn6TZI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712406645; c=relaxed/simple; bh=NZj5iiy3z4PFAT+LpVhjkATLvy3m1FXWC4HJtwGQ8Sg=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=L6LF3zuMeuxXf5wfIbA2dr1mJKvpAQyPZbVH59tSfTOwhCM0yogk82RGXOssMz7BQiAQblMM0waCPTFCpp9dtfZZ8fq5bhewCCXB6HZ3wafdmyXYyWy+2MlI3K7ReVECBYE7CWhGouI6l2wSU3+4Nl8aEf+VKCmhQYEmDH+0c9o= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IulJYl6T; arc=none smtp.client-ip=209.85.208.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-f175.google.com with SMTP id 38308e7fff4ca-2d718efedb2so53101211fa.0 for ; Sat, 06 Apr 2024 05:30:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712406641; x=1713011441; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=W5khUc4+pP3T2YCJkmv/x98K4zoP9YhPJ4PuWSN9uFI=; b=IulJYl6TqRtePBEm6XE1+IhtkH2WAlo7Ejh9QCkRG2+6mauB1QTITr3pHV7US61ph2 YdkOJvFgB1A59CMKtbCyMj73YSW/03OER4wi+GVAmgI1Wy3x6TOm3e8Rx2EliCU4XgMy T1qOzGi2iDx5ST+6tq1Oi7ALZqMbI+0Quv0kNSHgpioc2FijN25kn0IDt2U6Zpsy8LXE 8/ypDNAObKwS26WVL52IgAooaHGtsmP1wSez4m03nZ+XjOTzFZYiuwm4I50tw/xZoCYF C1Vj5iSx/p5xEsQYzh3VLzyDgrvPHrhmPba8kauMtCe94mpZdU4uvSxHFnqUu09vbN0u qDog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712406641; x=1713011441; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W5khUc4+pP3T2YCJkmv/x98K4zoP9YhPJ4PuWSN9uFI=; b=HJxZgYzsAps9F9JwcAgkab2ExWeoRxcar3b+ryK7c+sWYUYBQmV4XLVJHAzeY9JkMF 460bEOKWVY1gYDkMJu8EU10q0ZothBaS2RM05iwXjMJMoidrIE6XYJFB4J5XR4EY7Hvk qD2VIi+LL8SrqruxifEJcNcCsUMmHqXno7WtBeiXP39nixxFF8U/0RMMAbqmxpyPw34k YVNjGjrfK/d8K61hcPW1QlwEnm8xLHHs/yJo0S+E/Aipet7yZq6+XlXqIDPdEAnS+BKW yjakQhJZh5ri+Ssnm75aJ5qx6ns/b11hP5xi7XZvPS8LUDNJP2Sqi17BeAQXZLhEU0t4 JEgg== X-Forwarded-Encrypted: i=1; AJvYcCXKk7c+dkmjDiO+WXma8TjG5yZGkw2lv9/d47aPvJiKfVlLVBSRXR1A1VTK9cErVqHEJvzSsTTbywdYSZWpcXZOQ3ZTkjIsZpkU0tAv X-Gm-Message-State: AOJu0Yx9GVzqWo2Xi5S2GIlu11iJ0ZPuwMngJzZR0E60PtumSLgOKbuN AykhosDuAnZzkFttXv4IRv80vmhO9XQtwStYltwm638F4grZtGFfy0npl64Qt7xPqAAunxpf/6V 4Btc/roC85qnloOYf9GfPLXx5/FIaJqGQu1h40Q== X-Received: by 2002:a05:651c:49d:b0:2d6:f793:3434 with SMTP id s29-20020a05651c049d00b002d6f7933434mr3495904ljc.2.1712406640903; Sat, 06 Apr 2024 05:30:40 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: In-Reply-To: From: Uros Bizjak Date: Sat, 6 Apr 2024 14:30:38 +0200 Message-ID: Subject: Re: More annoying code generation by clang To: Ingo Molnar Cc: Linus Torvalds , Nick Desaulniers , Nathan Chancellor , Thomas Gleixner , Peter Anvin , "the arch/x86 maintainers" , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Apr 6, 2024 at 12:56=E2=80=AFPM Ingo Molnar wrot= e: > > > [ I've Cc:-ed a few more people who might be interested in this. ] > [ Copy of Linus's email below. ] > > * Linus Torvalds wrote: > > > So this doesn't really matter in any real life situation, but it > > really grated on me. > > > > Clang has this nasty habit of taking our nice asm constraints, and > > turning them into worst-case garbage. It's been reported a couple of > > times where we use "g" to tell the compiler that pretty much any > > source to the asm works, and then clang takes that to mean "I will > > take that to use 'memory'" even when that makes no sense what-so-ever. > > > > See for example > > > > https://lore.kernel.org/all/CAHk-=3DwgobnShg4c2yyMbk2p=3DU-wmnOmX_0= =3Db3ZY_479Jjey2xw@mail.gmail.com/ > > > > where I was ranting about clang just doing pointlessly stupid things. > > > > However, I found a case where yes, clang does pointlessly stupid > > things, but it's at least _partly_ our fault, and gcc can't generate > > optimal code either. > > > > We have this fairly critical code in __fget_files_rcu() to look up a > > 'struct file *' from an fd, and it does this: > > > > /* Mask is a 0 for invalid fd's, ~0 for valid ones */ > > nospec_mask =3D array_index_mask_nospec(fd, fdt->max_fd= s); > > > > and clang makes a *horrid* mess of it, generating this code: > > > > movl %edi, %r14d > > movq 32(%rbx), %rdx > > movl (%rdx), %eax > > movq %rax, 8(%rsp) > > cmpq 8(%rsp), %r14 > > sbbq %rcx, %rcx > > > > which is just crazy. Notice how it does that "move rax to stack, then > > do the compare against the stack", instead of just using %rax. > > > > In fact, that function shouldn't have a stack frame at all, and the > > only reason it is generated is because of this whole oddity. > > > > All clang's fault, right? > > > > Yeah, mostly. But it turns out that what really messes with clangs > > little head is that the x86 array_index_mask_nospec() function is > > being a bit annoying. > > > > This is what we do: > > > > static __always_inline unsigned long > > array_index_mask_nospec(unsigned long index, > > unsigned long size) > > { > > unsigned long mask; > > > > asm volatile ("cmp %1,%2; sbb %0,%0;" > > :"=3Dr" (mask) > > :"g"(size),"r" (index) > > :"cc"); > > return mask; > > } > > > > and look at the use again: > > > > nospec_mask =3D array_index_mask_nospec(fd, fdt->max_fds); > > > > here all the values are actually 'unsigned int'. So what happens is > > that clang can't just use the fdt->max_fds value *directly* from > > memory, because it needs to be expanded from 32-bit to 64-bit because > > we've made our array_index_mask_nospec() function only work on 64-bit > > 'unsigned long' values. > > > > So it turns out that by massaging this a bit, and making it just be a > > macro - so that the asm can decide that "I can do this in 32-bit" - I > > can get clang to generate much better code. > > > > Clang still absolutely hates the "g" constraint, so to get clang to > > really get this right I have to use "ir" instead of "g". Which is > > wrong. Because gcc does this right, and could use the memory op > > directly. But even gcc cannot do that with our *current* function, > > because of that "the memory value is 32-bit, we require a 64-bit > > value" > > > > Anyway, I can get gcc to generate the right code: > > > > movq 32(%r13), %rdx > > cmp (%rdx),%ebx > > sbb %esi,%esi > > > > which is basically the right code for the six crazy instructions clang > > generates. And if I make the "g" be "ir", I can get clang to generate > > > > movq 32(%rdi), %rcx > > movl (%rcx), %eax > > cmpl %eax, %esi > > sbbl %esi, %esi > > > > which is the same thing, but with that (pointless) load to a register. > > > > And now clang doesn't generate that stack frame at all. > > > > Anyway, this was a long email to explain the odd attached patch. > > > > Comments? Note that this patch is *entirely* untested, I have done > > this purely by looking at the code generation in fs/file.c. FYI, please note that gcc-12 is able to synthesize carry-flag compares on its own: --cut here-- unsigned long foo (unsigned long index, unsigned long size) { return (index <=3D size) ? 0 : -1; } extern unsigned int size; unsigned long bar (unsigned int index) { return (index <=3D size) ? 0 : -1; } --cut here-- gcc-12 -O2: foo: cmpq %rdi, %rsi sbbq %rax, %rax ret bar: cmpl %edi, size(%rip) sbbq %rax, %rax ret Uros. > > > > Linus > > > arch/x86/include/asm/barrier.h | 23 +++++++++-------------- > > 1 file changed, 9 insertions(+), 14 deletions(-) > > > > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barr= ier.h > > index 66e57c010392..6159d2cbbfde 100644 > > --- a/arch/x86/include/asm/barrier.h > > +++ b/arch/x86/include/asm/barrier.h > > @@ -33,20 +33,15 @@ > > * Returns: > > * 0 - (index < size) > > */ > > -static __always_inline unsigned long array_index_mask_nospec(unsigned = long index, > > - unsigned long size) > > -{ > > - unsigned long mask; > > - > > - asm volatile ("cmp %1,%2; sbb %0,%0;" > > - :"=3Dr" (mask) > > - :"g"(size),"r" (index) > > - :"cc"); > > - return mask; > > -} > > - > > -/* Override the default implementation from linux/nospec.h. */ > > -#define array_index_mask_nospec array_index_mask_nospec > > +#define array_index_mask_nospec(idx,sz) ({ \ > > + typeof((idx)+(sz)) __idx =3D (idx); \ > > + typeof(__idx) __sz =3D (sz); \ > > + typeof(__idx) __mask; \ > > + asm volatile ("cmp %1,%2; sbb %0,%0" \ > > + :"=3Dr" (__mask) \ > > + :"ir"(__sz),"r" (__idx) \ > > + :"cc"); \ > > + __mask; }) > > > > /* Prevent speculative execution past this barrier. */ > > #define barrier_nospec() asm volatile("lfence":::"memory") >