Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp2694973imw; Sun, 17 Jul 2022 14:56:44 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sYmSjxnnHktEnGt6oO3Bk2Ql47a+bdG6XAYm9QyWcUmq0hJf/8Cr2p/zY2emx+Br+owxVm X-Received: by 2002:a63:85c6:0:b0:412:a94a:4ffc with SMTP id u189-20020a6385c6000000b00412a94a4ffcmr22399152pgd.618.1658095004020; Sun, 17 Jul 2022 14:56:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658095004; cv=none; d=google.com; s=arc-20160816; b=UFTLsNOwQE3NDVHfITp2gqnqv0Vl+19ahdfso5rwI5JYRCx2nCcOLuZ3yGQvtUQyg/ /k+4BPf0q74eaUMnALpJ2o0IeRTn2I1HG4wShPYj4SdLiUYe0M7SBrVaGe0gDfmp0tNO bH/ahNojc8e0o3vLHxCgby0nA0xMfulvz9LKoM5DavjsEFwvJm88wDDVVgptEB6rL94R HWp4dNZIlW4Z9C9Z/R4a47LekUyDkWQJndbNWY48JYyYpaW7sFaFVnjy4sGDot9K/siN 8x+SM1QzzeFQE0MKBzorBjgFqu/kKlyD8SRy3Callwh9uaU3jTWjvVV6a5FO9CHXf/Mc cvXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=kRzPJGFbY+ihEhq17hmTH+dK8kmlJtap/eTytjWRCs0=; b=dJIhiXbSSTkHQQAfqK1y0hL7wVxGaKIYEuxEB33z8bSDoODVSMtb/bt7G4F3YtAD5r x/m3VQiN7YGdMbH65wFJ6FD0OAmiahZczJ+IUzPOK2oLLwveFKKG3JD3qvS/w3zZovhO pNbqp+wAf8zzPIPspkQLKLnNe5flamNP6SEYRZzdSXO1WBhlacGiwp819ZoUvVeZtn0h XdhUZGUY/k36kzzeH6npjyyb1TIyPvTive6qY8gCV0qk36MB1+odRgESJeLaxbGsO+Sj IbANA7CSHaiIVA7s9WlTbnBoLvCMq1lV9NpjdiNPCTRut87tZ9s/1EO/w/8x7QeOjZPp ajgg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=z+z7LGsU; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f7-20020a056a00238700b0052b36d3eaecsi11948135pfc.240.2022.07.17.14.56.30; Sun, 17 Jul 2022 14:56:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=z+z7LGsU; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233000AbiGQVyZ (ORCPT + 99 others); Sun, 17 Jul 2022 17:54:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231547AbiGQVyV (ORCPT ); Sun, 17 Jul 2022 17:54:21 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 656CB63C5 for ; Sun, 17 Jul 2022 14:54:20 -0700 (PDT) From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1658094857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=kRzPJGFbY+ihEhq17hmTH+dK8kmlJtap/eTytjWRCs0=; b=z+z7LGsUDkLV++luahZ4DC4XH9e1TUis2cVmFWadaWkg45d246ZQMzQdsfpNhM4w9DZpo1 GczqdROqeFDQFVPzykDyen1zN3FNKWqeGEP4nNOdT6qyGSlGhARUIE3MozgYyjDWKkSn/+ SOAN1rSbw7RULwisRAUR1D65h6pS8EOOMQSoRoYi1riE7qGjYoAgC3X5T1HnWWL5xN+xx4 lCR2UdpBw6R/Xt655pX8Sdb7FTkL4INVMdstLr2R03zMUImGvqPoZwocLccklQDDDDGbRO 62mSQD1hPwyL/rL0xhdZ/MQbEDwSfaTSvVVidcxdLYI8tug3ZipZkoI5Z1Z+pg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1658094857; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=kRzPJGFbY+ihEhq17hmTH+dK8kmlJtap/eTytjWRCs0=; b=JPPZsryaaXeA4jTfNogGPo2zUBWFxTEe5Jn6oUXVA3Pdb5bo2CjVSjiQvnZmxoGYRyi2Te JOtOSIFBUlJuc1Cw== To: Andrew Cooper , LKML Cc: "x86@kernel.org" , Linus Torvalds , Tim Chen , Josh Poimboeuf , Pawan Gupta , Johannes Wikner , Alyssa Milburn , Jann Horn , "H.J. Lu" , Joao Moreira , Joseph Nuzman , Steven Rostedt , Juergen Gross Subject: Re: [patch 02/38] x86/cpu: Use native_wrmsrl() in load_percpu_segment() In-Reply-To: <87r12jwl9l.ffs@tglx> References: <20220716230344.239749011@linutronix.de> <20220716230952.787452088@linutronix.de> <0bec8fe2-d1e3-f01c-6e52-06ab542efdd8@citrix.com> <87zgh7wo91.ffs@tglx> <87tu7fwlhr.ffs@tglx> <87r12jwl9l.ffs@tglx> Date: Sun, 17 Jul 2022 23:54:16 +0200 Message-ID: <87o7xnwgl3.ffs@tglx> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jul 17 2022 at 22:13, Thomas Gleixner wrote: > On Sun, Jul 17 2022 at 22:08, Thomas Gleixner wrote: >> On Sun, Jul 17 2022 at 21:08, Thomas Gleixner wrote: >> loadsegment_simple() was a red herring. The gs segment is already zero. >> >> So what explodes here is the early boot when switching from early per >> CPU to the real per CPU area. >> >> start_kernel() >> ..... >> setup_per_cpu_areas(); >> smp_prepare_boot_cpu() > > Bah. switch_to_new_gdt() is already invoked from setup_per_cpu_areas() > and then again in smp_prepare_boot_cpu() and once more in cpu_init(), > > What a mess. So the below builds and boots at least on 64bit. I'll stare at it some more tomorrow. I have no idea whether native_load_gdt() works with XEN_PV. It should, but what do I know. Thanks, tglx --- --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -205,7 +205,7 @@ static inline void native_set_ldt(const } } -static inline void native_load_gdt(const struct desc_ptr *dtr) +static __always_inline void native_load_gdt(const struct desc_ptr *dtr) { asm volatile("lgdt %0"::"m" (*dtr)); } --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -670,10 +670,9 @@ extern int sysenter_setup(void); /* Defined in head.S */ extern struct desc_ptr early_gdt_descr; -extern void switch_to_new_gdt(int); +extern void switch_to_real_gdt(int); extern void load_direct_gdt(int); extern void load_fixmap_gdt(int); -extern void load_percpu_segment(int); extern void cpu_init(void); extern void cpu_init_secondary(void); extern void cpu_init_exception_handling(void); --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -7,20 +7,24 @@ ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_common.o = -pg CFLAGS_REMOVE_perf_event.o = -pg +CFLAGS_REMOVE_switch_gdt.o = -pg endif # If these files are instrumented, boot hangs during the first second. KCOV_INSTRUMENT_common.o := n KCOV_INSTRUMENT_perf_event.o := n +KCOV_INSTRUMENT_switch_gdt.o := n # As above, instrumenting secondary CPU boot code causes boot hangs. KCSAN_SANITIZE_common.o := n +KCSAN_SANITIZE_switch_gdt.o := n -# Make sure load_percpu_segment has no stackprotector -CFLAGS_common.o := -fno-stack-protector +# Make sure that switching the GDT and the per CPU segment +# does not have stack protector enabled. +CFLAGS_switch_gdt.o := -fno-stack-protector obj-y := cacheinfo.o scattered.o topology.o -obj-y += common.o +obj-y += common.o switch_gdt.o obj-y += rdrand.o obj-y += match.o obj-y += bugs.o --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -701,16 +701,6 @@ static const char *table_lookup_model(st __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long)); __u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long)); -void load_percpu_segment(int cpu) -{ -#ifdef CONFIG_X86_32 - loadsegment(fs, __KERNEL_PERCPU); -#else - __loadsegment_simple(gs, 0); - wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu)); -#endif -} - #ifdef CONFIG_X86_32 /* The 32-bit entry code needs to find cpu_entry_area. */ DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); @@ -738,18 +728,6 @@ void load_fixmap_gdt(int cpu) } EXPORT_SYMBOL_GPL(load_fixmap_gdt); -/* - * Current gdt points %fs at the "master" per-cpu area: after this, - * it's on the real one. - */ -void switch_to_new_gdt(int cpu) -{ - /* Load the original GDT */ - load_direct_gdt(cpu); - /* Reload the per-cpu base */ - load_percpu_segment(cpu); -} - static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {}; static void get_model_name(struct cpuinfo_x86 *c) @@ -2228,12 +2206,6 @@ void cpu_init(void) boot_cpu_has(X86_FEATURE_TSC) || boot_cpu_has(X86_FEATURE_DE)) cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE); - /* - * Initialize the per-CPU GDT with the boot GDT, - * and set up the GDT descriptor: - */ - switch_to_new_gdt(cpu); - if (IS_ENABLED(CONFIG_X86_64)) { loadsegment(fs, 0); memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8); --- /dev/null +++ b/arch/x86/kernel/cpu/switch_gdt.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include + +/* + * Invoked during early boot to switch from early GDT and early per CPU + * (%fs on 32bit, GS_BASE on 64bit) to the real GDT and the runtime per CPU + * area. + * + * This has to be done atomic because after switching from early GDT to + * the real one any per cpu variable access is going to fault because + * %fs resp. GS_BASE is not yet pointing to the real per CPU data. + * + * As a consequence this uses the native variants of load_gdt() and + * wrmsrl(). So XEN_PV has to take the fault and emulate. + */ +void __init switch_to_real_gdt(int cpu) +{ + struct desc_ptr gdt_descr; + + gdt_descr.address = (long)get_cpu_gdt_rw(cpu); + gdt_descr.size = GDT_SIZE - 1; + native_load_gdt(&gdt_descr); + +#ifdef CONFIG_X86_32 + loadsegment(fs, __KERNEL_PERCPU); +#else + native_wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu)); +#endif +} --- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -211,7 +211,7 @@ void __init setup_per_cpu_areas(void) * area. Reload any changed state for the boot CPU. */ if (!cpu) - switch_to_new_gdt(cpu); + switch_to_real_gdt(cpu); } /* indicate the early static arrays will soon be gone */ --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1457,7 +1457,11 @@ void arch_thaw_secondary_cpus_end(void) void __init native_smp_prepare_boot_cpu(void) { int me = smp_processor_id(); - switch_to_new_gdt(me); + + /* SMP invokes this from setup_per_cpu_areas() */ + if (!IS_ENABLED(CONFIG_SMP)) + switch_to_real_gdt(me); + /* already set me in cpu_online_mask in boot_cpu_init() */ cpumask_set_cpu(me, cpu_callout_mask); cpu_set_state_online(me); --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1164,7 +1164,7 @@ static void __init xen_setup_gdt(int cpu pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry_boot; pv_ops.cpu.load_gdt = xen_load_gdt_boot; - switch_to_new_gdt(cpu); + switch_to_real_gdt(cpu); pv_ops.cpu.write_gdt_entry = xen_write_gdt_entry; pv_ops.cpu.load_gdt = xen_load_gdt;