Date: Tue, 5 Apr 2022 13:51:30 +0100
From: Mark Rutland
To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org, gcc@gcc.gnu.org, catalin.marinas@arm.com,
    will@kernel.org, marcan@marcan.st, maz@kernel.org, szabolcs.nagy@arm.com,
    f.fainelli@gmail.com, opendmb@gmail.com, Andrew Pinski, Ard Biesheuvel,
    Peter Zijlstra, x86@kernel.org, andrew.cooper3@citrix.com, Jeremy Linton
Subject: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)
References: <20220401164406.61583-1-jeremy.linton@arm.com>

Hi all,

[adding kernel folk who work on asm stuff]

As a heads-up, GCC 12 (not yet released) appears to erroneously optimize
away calls to functions with volatile asm. Szabolcs has raised an issue on
the GCC bugzilla:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

... which is a P1 release blocker, and is currently being investigated.

Jeremy originally reported this as an issue with {readl,writel}_relaxed(),
but the underlying problem doesn't have anything to do with those
specifically.

I'm dumping a bunch of info here largely for posterity / archival, and to
find out who (from the kernel side) is willing and able to test proposed
compiler fixes, once those are available. I'm happy to do so for aarch64;
Peter, I assume you'd be happy to look at the x86 side?
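For context, the {readl,writel}_relaxed() accessors from Jeremy's report
boil down to volatile asm wrappers around a single load or store, with no
"memory" clobber. The following is a simplified, illustrative sketch of
that shape (not the kernel's actual implementation; the names are made up
for this example):

| /*
|  * Illustrative sketch only: a relaxed 32-bit MMIO read/write modelled as
|  * a bare volatile asm load/store, similar in shape to the affected
|  * kernel accessors.
|  */
| static inline unsigned int io_read32_relaxed(const volatile void *addr)
| {
|         unsigned int val;
|
|         /* No "memory" clobber: only the volatile keeps this access alive. */
|         asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr));
|         return val;
| }
|
| static inline void io_write32_relaxed(unsigned int val, volatile void *addr)
| {
|         asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
| }

If the compiler drops calls to functions whose only side effects are such
volatile asm accesses, MMIO reads and writes silently disappear, which is
why this matters well beyond the sysreg test case below.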
This is a generic issue, and I wrote test cases for aarch64 and x86_64.
Those are inline later in this mail, and currently you can see them on
compiler explorer:

  aarch64: https://godbolt.org/z/vMczqjYvs
  x86_64:  https://godbolt.org/z/cveff9hq5

My aarch64 test case is:

| #define sysreg_read(regname)                          \
| ({                                                     \
|         unsigned long __sr_val;                        \
|         asm volatile(                                  \
|         "mrs %0, " #regname "\n"                       \
|         : "=r" (__sr_val));                            \
|                                                        \
|         __sr_val;                                      \
| })
|
| #define sysreg_write(regname, __sw_val)                \
| do {                                                   \
|         asm volatile(                                  \
|         "msr " #regname ", %0\n"                       \
|         :                                              \
|         : "r" (__sw_val));                             \
| } while (0)
|
| #define isb()                                          \
| do {                                                   \
|         asm volatile(                                  \
|         "isb"                                          \
|         :                                              \
|         :                                              \
|         : "memory");                                   \
| } while (0)
|
| static unsigned long sctlr_read(void)
| {
|         return sysreg_read(sctlr_el1);
| }
|
| static void sctlr_write(unsigned long val)
| {
|         sysreg_write(sctlr_el1, val);
| }
|
| static void sctlr_rmw(void)
| {
|         unsigned long val;
|
|         val = sctlr_read();
|         val |= 1UL << 7;
|         sctlr_write(val);
| }
|
| void sctlr_read_multiple(void)
| {
|         sctlr_read();
|         sctlr_read();
|         sctlr_read();
|         sctlr_read();
| }
|
| void sctlr_write_multiple(void)
| {
|         sctlr_write(0);
|         sctlr_write(0);
|         sctlr_write(0);
|         sctlr_write(0);
|         sctlr_write(0);
| }
|
| void sctlr_rmw_multiple(void)
| {
|         sctlr_rmw();
|         sctlr_rmw();
|         sctlr_rmw();
|         sctlr_rmw();
| }
|
| void function(void)
| {
|         sctlr_read_multiple();
|         sctlr_write_multiple();
|         sctlr_rmw_multiple();
|
|         isb();
| }

Per compiler explorer (https://godbolt.org/z/vMczqjYvs) GCC trunk currently
compiles this as:

| sctlr_rmw:
|         mrs x0, sctlr_el1
|         orr x0, x0, 128
|         msr sctlr_el1, x0
|         ret
| sctlr_read_multiple:
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         ret
| sctlr_write_multiple:
|         mov x0, 0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         ret
| sctlr_rmw_multiple:
|         ret
| function:
|         isb
|         ret

Whereas GCC 11.2 compiles this as:

| sctlr_rmw:
|         mrs x0, sctlr_el1
|         orr x0, x0, 128
|         msr sctlr_el1, x0
|         ret
| sctlr_read_multiple:
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         mrs x0, sctlr_el1
|         ret
| sctlr_write_multiple:
|         mov x0, 0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         msr sctlr_el1, x0
|         ret
| sctlr_rmw_multiple:
|         stp x29, x30, [sp, -16]!
|         mov x29, sp
|         bl sctlr_rmw
|         bl sctlr_rmw
|         bl sctlr_rmw
|         bl sctlr_rmw
|         ldp x29, x30, [sp], 16
|         ret
| function:
|         stp x29, x30, [sp, -16]!
|         mov x29, sp
|         bl sctlr_read_multiple
|         bl sctlr_write_multiple
|         bl sctlr_rmw_multiple
|         isb
|         ldp x29, x30, [sp], 16
|         ret

My x86_64 test case is:

| unsigned long rdmsr(unsigned long reg)
| {
|         unsigned int lo, hi;
|
|         asm volatile(
|         "rdmsr"
|         : "=d" (hi), "=a" (lo)
|         : "c" (reg)
|         );
|
|         return ((unsigned long)hi << 32) | lo;
| }
|
| void wrmsr(unsigned long reg, unsigned long val)
| {
|         unsigned int lo = val;
|         unsigned int hi = val >> 32;
|
|         asm volatile(
|         "wrmsr"
|         :
|         : "d" (hi), "a" (lo), "c" (reg)
|         );
| }
|
| void msr_rmw_set_bits(unsigned long reg, unsigned long bits)
| {
|         unsigned long val;
|
|         val = rdmsr(reg);
|         val |= bits;
|         wrmsr(reg, val);
| }
|
| void func_with_msr_side_effects(unsigned long reg)
| {
|         msr_rmw_set_bits(reg, 1UL << 0);
|         msr_rmw_set_bits(reg, 1UL << 1);
|         msr_rmw_set_bits(reg, 1UL << 2);
|         msr_rmw_set_bits(reg, 1UL << 3);
| }

Per compiler explorer (https://godbolt.org/z/cveff9hq5) GCC trunk currently
compiles this as:

| msr_rmw_set_bits:
|         mov rcx, rdi
|         rdmsr
|         sal rdx, 32
|         mov eax, eax
|         or rax, rsi
|         or rax, rdx
|         mov rdx, rax
|         shr rdx, 32
|         wrmsr
|         ret
| func_with_msr_side_effects:
|         ret

While GCC 11.2 compiles that as:

| msr_rmw_set_bits:
|         mov rcx, rdi
|         rdmsr
|         sal rdx, 32
|         mov eax, eax
|         or rax, rsi
|         or rax, rdx
|         mov rdx, rax
|         shr rdx, 32
|         wrmsr
|         ret
| func_with_msr_side_effects:
|         push rbp
|         push rbx
|         mov rbx, rdi
|         mov rbp, rsi
|         call msr_rmw_set_bits
|         mov rsi, rbp
|         mov rdi, rbx
|         call msr_rmw_set_bits
|         mov rsi, rbp
|         mov rdi, rbx
|         call msr_rmw_set_bits
|         mov rsi, rbp
|         mov rdi, rbx
|         call msr_rmw_set_bits
|         pop rbx
|         pop rbp
|         ret

Thanks,
Mark.