Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp44490ybx; Wed, 6 Nov 2019 12:58:24 -0800 (PST) X-Google-Smtp-Source: APXvYqw4DnLt6UeIwI+RJifAGtlmZ/S0LNXMWoPwXQNoNLS0ZDtxbUuneNxKXX6ksNlJzham2gNn X-Received: by 2002:a17:906:79c9:: with SMTP id m9mr35655047ejo.297.1573073904376; Wed, 06 Nov 2019 12:58:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573073904; cv=none; d=google.com; s=arc-20160816; b=JE40QegoMb1aSEaocteScsgL0YQsWq5/BhIea5ejr1pdOwFrxKlOqhCOf4Ab509w8X ubZsvXbL8ZlgxVy025DqIDuQDiUo7UyHqB5NqhfnOL2vUM+ckW/ef9UMnv5IQwIo3TSU Qj/gZUfspCY9N/6p79dR6fQiRukHs64BADxM2OPwXqAGNjkc5t3wDGCeUfFEWXSP9RPu AFILJs0Q1WFq/3B6u/QAKXpbzUMOMIhBpmTDJYnBalS0c+h4qSJl7Q8Qn9hDw7LXOjCP yZ6me2/IMDzsPDAeE7yziHpuAu/cob8xov17kupBbp/WfKdQtSijg+7+E8Nb7Fqc7+/d 9/Tg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:references:subject:cc:to :from:date:user-agent:message-id; bh=HPP8SmLfEs66S70lZWSMR30lQkjZqoTSGS5Tet9MdCM=; b=SyYGSpn4ox07R7Sd9WrJk5g2GN+mRVSXmkKwRgbRl1aSZwAQSYPVUW91CJ4hTdv/aM 1/QspKW053c2UYHgvFVNoXoBDbIkzstbCVYB1tFth6Q0nOGL3dijdcEguUEKfg4HH+iZ kzw2XhL/bYY9wMSf4kB8woT922xKqC3TsLnufOPsRRmzOA8epf5xzkLd1C2UXCOm061l ZN5N305Dsn/4NCO8qvIOegtF/b7IjMZPKN40WkmD7PBoqLcoal1bxgFaNFGHEsDrNUcu 6zawg4vqKJepI3wqJSB9JrJMWHIbaFCqK7mN9WTeQdYStqiiQaxBz/DfS9v5Koa5rt+e l+Vg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e15si13526793eda.106.2019.11.06.12.57.46; Wed, 06 Nov 2019 12:58:24 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732581AbfKFU4v (ORCPT + 99 others); Wed, 6 Nov 2019 15:56:51 -0500 Received: from Galois.linutronix.de ([193.142.43.55]:45264 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732114AbfKFU4s (ORCPT ); Wed, 6 Nov 2019 15:56:48 -0500 Received: from localhost ([127.0.0.1] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1iSSM8-00032l-BO; Wed, 06 Nov 2019 21:56:44 +0100 Message-Id: <20191106202806.133597409@linutronix.de> User-Agent: quilt/0.65 Date: Wed, 06 Nov 2019 20:35:03 +0100 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, Stephen Hemminger , Willy Tarreau , Juergen Gross , Sean Christopherson , Linus Torvalds , "H. Peter Anvin" Subject: [patch 4/9] x86/io: Speedup schedule out of I/O bitmap user References: <20191106193459.581614484@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org There is no requirement to update the TSS I/O bitmap when a thread using it is scheduled out and the incoming thread does not use it. For the permission check based on the TSS I/O bitmap the CPU calculates the memory location of the I/O bitmap by the address of the TSS and the io_bitmap_base member of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the offset to an address outside of the TSS limit. If an I/O instruction is issued from user space the TSS limit causes #GP to be raised in the same was as valid I/O bitmap with all bits set to 1 would do. This removes the extra work when an I/O bitmap using task is scheduled out and puts the burden on the rare I/O bitmap users when they are scheduled in. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/processor.h | 38 +++++++++++++++++-------- arch/x86/kernel/cpu/common.c | 3 + arch/x86/kernel/doublefault.c | 2 - arch/x86/kernel/process.c | 59 +++++++++++++++++++++------------------ 4 files changed, 61 insertions(+), 41 deletions(-) --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -330,8 +330,23 @@ struct x86_hw_tss { #define IO_BITMAP_BITS 65536 #define IO_BITMAP_BYTES (IO_BITMAP_BITS/8) #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) -#define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) -#define INVALID_IO_BITMAP_OFFSET 0x8000 + +#define IO_BITMAP_OFFSET_VALID \ + (offsetof(struct tss_struct, io_bitmap) - \ + offsetof(struct tss_struct, x86_tss)) + +/* + * sizeof(unsigned long) coming from an extra "long" at the end + * of the iobitmap. + * + * -1? seg base+limit should be pointing to the address of the + * last valid byte + */ +#define __KERNEL_TSS_LIMIT \ + (IO_BITMAP_OFFSET_VALID + IO_BITMAP_BYTES + sizeof(unsigned long) - 1) + +/* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */ +#define IO_BITMAP_OFFSET_INVALID (__KERNEL_TSS_LIMIT + 1) struct entry_stack { unsigned long words[64]; @@ -350,6 +365,15 @@ struct tss_struct { struct x86_hw_tss x86_tss; /* + * Store the dirty size of the last io bitmap offender. The next + * one will have to do the cleanup as the switch out to a non + * io bitmap user will just set x86_tss.io_bitmap_base to a value + * outside of the TSS limit. So for sane tasks there is no need + * to actually touch the io_bitmap at all. + */ + unsigned int io_bitmap_prev_max; + + /* * The extra 1 is there because the CPU will access an * additional byte beyond the end of the IO permission * bitmap. The extra byte must be all 1 bits, and must @@ -360,16 +384,6 @@ struct tss_struct { DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); -/* - * sizeof(unsigned long) coming from an extra "long" at the end - * of the iobitmap. - * - * -1? seg base+limit should be pointing to the address of the - * last valid byte - */ -#define __KERNEL_TSS_LIMIT \ - (IO_BITMAP_OFFSET + IO_BITMAP_BYTES + sizeof(unsigned long) - 1) - /* Per CPU interrupt stacks */ struct irq_stack { char stack[IRQ_STACK_SIZE]; --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1863,7 +1863,8 @@ void cpu_init(void) /* Initialize the TSS. */ tss_setup_ist(tss); - tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET; + tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID; + tss->io_bitmap_prev_max = 0; memset(tss->io_bitmap, 0xff, sizeof(tss->io_bitmap)); set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); --- a/arch/x86/kernel/doublefault.c +++ b/arch/x86/kernel/doublefault.c @@ -54,7 +54,7 @@ struct x86_hw_tss doublefault_tss __cach .sp0 = STACK_START, .ss0 = __KERNEL_DS, .ldt = 0, - .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, + .io_bitmap_base = IO_BITMAP_OFFSET_INVALID, .ip = (unsigned long) doublefault_fn, /* 0x2 bit is always set */ --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -72,18 +72,9 @@ #ifdef CONFIG_X86_32 .ss0 = __KERNEL_DS, .ss1 = __KERNEL_CS, - .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, #endif + .io_bitmap_base = IO_BITMAP_OFFSET_INVALID, }, -#ifdef CONFIG_X86_32 - /* - * Note that the .io_bitmap member must be extra-big. This is because - * the CPU will access an additional byte beyond the end of the IO - * permission bitmap. The extra byte must be all 1 bits, and must - * be within the limit. - */ - .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 }, -#endif }; EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); @@ -112,18 +103,18 @@ void exit_thread(struct task_struct *tsk struct thread_struct *t = &tsk->thread; unsigned long *bp = t->io_bitmap_ptr; struct fpu *fpu = &t->fpu; + struct tss_struct *tss; if (bp) { - struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu()); + preempt_disable(); + tss = this_cpu_ptr(&cpu_tss_rw); t->io_bitmap_ptr = NULL; - clear_thread_flag(TIF_IO_BITMAP); - /* - * Careful, clear this in the TSS too: - */ - memset(tss->io_bitmap, 0xff, t->io_bitmap_max); t->io_bitmap_max = 0; - put_cpu(); + clear_thread_flag(TIF_IO_BITMAP); + /* Invalidate the io bitmap base in the TSS */ + tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID; + preempt_enable(); kfree(bp); } @@ -363,29 +354,43 @@ void arch_setup_new_exec(void) } } -static inline void switch_to_bitmap(struct thread_struct *prev, - struct thread_struct *next, +static inline void switch_to_bitmap(struct thread_struct *next, unsigned long tifp, unsigned long tifn) { struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw); if (tifn & _TIF_IO_BITMAP) { /* - * Copy the relevant range of the IO bitmap. - * Normally this is 128 bytes or less: + * Copy at least the size of the incoming tasks bitmap + * which covers the last permitted I/O port. + * + * If the previous task which used an io bitmap had more + * bits permitted, then the copy needs to cover those as + * well so they get turned off. */ memcpy(tss->io_bitmap, next->io_bitmap_ptr, - max(prev->io_bitmap_max, next->io_bitmap_max)); + max(tss->io_bitmap_prev_max, next->io_bitmap_max)); + + /* Store the new max and set io_bitmap_base valid */ + tss->io_bitmap_prev_max = next->io_bitmap_max; + tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID; + /* - * Make sure that the TSS limit is correct for the CPU - * to notice the IO bitmap. + * Make sure that the TSS limit is covering the io bitmap. + * It might have been cut down by a VMEXIT to 0x67 which + * would cause a subsequent I/O access from user space to + * trigger a #GP because tbe bitmap is outside the TSS + * limit. */ refresh_tss_limit(); } else if (tifp & _TIF_IO_BITMAP) { /* - * Clear any possible leftover bits: + * Do not touch the bitmap. Let the next bitmap using task + * deal with the mess. Just make the io_bitmap_base invalid + * by moving it outside the TSS limit so any subsequent I/O + * access from user space will trigger a #GP. */ - memset(tss->io_bitmap, 0xff, prev->io_bitmap_max); + tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID; } } @@ -599,7 +604,7 @@ void __switch_to_xtra(struct task_struct tifn = READ_ONCE(task_thread_info(next_p)->flags); tifp = READ_ONCE(task_thread_info(prev_p)->flags); - switch_to_bitmap(prev, next, tifp, tifn); + switch_to_bitmap(next, tifp, tifn); propagate_user_return_notify(prev_p, next_p);