From: Andy Lutomirski
Date: Fri, 26 Jan 2018 14:42:40 -0800
Subject: Re: selftests/x86/fsgsbase_64 test problem
To: Andy Lutomirski
Cc: Borislav Petkov, "H. Peter Anvin", Dan Rue, Shuah Khan, Ingo Molnar, Dmitry Safonov, "open list:KERNEL SELFTEST FRAMEWORK", LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 26, 2018 at 2:38 PM, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 11:46 AM, Andy Lutomirski wrote:
>> On Fri, Jan 26, 2018 at 10:59 AM, Andy Lutomirski wrote:
>>> On Fri, Jan 26, 2018 at 8:22 AM, Andy Lutomirski wrote:
>>>> On Fri, Jan 26, 2018 at 7:36 AM, Dan Rue wrote:
>>>>>
>>>>> We've noticed that fsgsbase_64 can fail intermittently with the
>>>>> following error:
>>>>>
>>>>> [RUN] ARCH_SET_GS(0x0) and clear gs, then schedule to 0x1
>>>>> Before schedule, set selector to 0x1
>>>>> other thread: ARCH_SET_GS(0x1) -- sel is 0x0
>>>>> [FAIL] GS/BASE changed from 0x1/0x0 to 0x0/0x0
>>>>>
>>>>> This can be reliably reproduced by running fsgsbase_64 in a loop, i.e.:
>>>>>
>>>>> for i in $(seq 1 10000); do ./fsgsbase_64 || break; done
>>>>>
>>>>> This problem isn't new - I've reproduced it on latest mainline and every
>>>>> release going back to v4.12 (I did not try earlier). This was tested on
>>>>> a Supermicro board with a Xeon E3-1220 as well as an Intel NUC with an
>>>>> i3-5010U.
>>>>>
>>>>
>>>> Hmm, I can reproduce it, too. I'll look in a bit.
>>>
>>> I'm triggering a different error, and I think what's going on is that
>>> the kernel doesn't currently re-save GSBASE when a task switches out
>>> and that task has saved gsbase != 0 and in-register GS == 0. This is
>>> arguably a bug, but it's not an infoleak, and fixing it could be a wee
>>> bit expensive. I'm not sure what, if anything, to do about this. I
>>> suppose I could add some gross perf hackery to the test to detect this
>>> case and suppress the error.
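A toy model of the switch-out rule described above may make the failure mode easier to see. This is a hypothetical sketch, not the kernel's code: struct cpu_gs, struct task_gs, switch_out() and switch_in() are made-up names, descriptor-table bases for nonzero selectors are ignored, and the only behavior modeled is the one stated here (GSBASE is not re-saved when the saved base is nonzero and the live GS selector is 0). If the live base changes while the selector stays 0, the stale saved value comes back on the next switch-in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model only; names and layout are illustrative. */
struct cpu_gs {                 /* live CPU state */
	uint16_t sel;
	uint64_t base;
};

struct task_gs {                /* state saved in the task on switch-out */
	uint16_t sel;
	uint64_t saved_base;
};

/* Switch-out as described above: skip re-reading GSBASE when the
 * live selector is 0 and a nonzero base is already saved. */
void switch_out(struct task_gs *t, const struct cpu_gs *cpu)
{
	t->sel = cpu->sel;
	if (cpu->sel == 0 && t->saved_base != 0)
		return;         /* keep the previously saved (possibly stale) base */
	t->saved_base = cpu->base;
}

/* Switch-in: restore the selector and whatever base was saved. */
void switch_in(const struct task_gs *t, struct cpu_gs *cpu)
{
	cpu->sel = t->sel;
	cpu->base = t->saved_base;
}
```

For example, with a saved base of 0x1000 and a live selector of 0, if the live base has since become 0, switch_out() keeps 0x1000 and switch_in() puts 0x1000 back, so the task sees its base change underneath it.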
>>>
>>> I can also trigger the problem you're seeing, and I don't know what's
>>> up. It may be related to an old problem I've seen that causes signal
>>> delivery to sometimes corrupt %gs. It's deterministic, but it depends
>>> in some odd way on register state. I can currently reproduce that
>>> issue 100% of the time, and I'm trying to see if I can figure out
>>> what's happening.
>>
>> I think it's a CPU bug, and I'm a bit mystified. I can trigger the
>> following, plausibly related issue:
>>
>> Write a program that sets %gs to 1.
>> Run that program under gdb.
>> Break at a point where %gs == 1.
>> display/x $gs
>> si
>>
>> Under QEMU TCG, gs stays equal to 1. On native or KVM, on Skylake, it
>> changes to 0.
>>
>> On KVM or native, I do not observe do_debug getting called with
>> %gs == 1. On TCG, I do. I don't think that's precisely the problem
>> that's causing the test to fail, since the test doesn't use TF or
>> ptrace, but I wouldn't be shocked if it's related.
>>
>> hpa, any insight?
>>
>> (NB: if you want to play with this as I've described it, you may need
>> to make invalid_selector() in ptrace.c always return false. The
>> current implementation is too strict and causes problems.)
>
> Much simpler test. Run the attached program (gs1). It more or less
> just sets %gs to 1 and spins until it stops being 1. Do it on a
> kernel with the attached patch applied. I see stuff like this:
>
> # ./gs1
> PID = 129
> [ 15.703015] pid 129 saved gs = 1
> [ 15.703517] pid 129 loaded gs = 1
> [ 15.703973] pid 129 prepare_exit_to_usermode: gs = 1
> ax = 0, cx = 0, dx = 0
>
> So we're interrupting the program, switching out, switching back in,
> setting %gs to 1, observing that %gs is *still* 1 in
> prepare_exit_to_usermode(), returning to usermode, and observing
> %gs == 0.
>
> Presumably what's happening is that the IRET microcode matches the
> SDM's pseudocode, which says:
>
> RETURN-TO-OUTER-PRIVILEGE-LEVEL:
>     ...
>     FOR each SegReg in (ES, FS, GS, and DS)
>     DO
>         tempDesc ← descriptor cache for SegReg (* hidden part of segment register *)
>         IF tempDesc(DPL) < CPL AND tempDesc(Type) is data or non-conforming code
>         THEN (* Segment register invalid *)
>             SegReg ← NULL;
>         FI;
>     OD;
>
> But this is very odd. The actual permission checks (in the docs for MOV) are:
>
>     IF DS, ES, FS, or GS is loaded with non-NULL selector
>     THEN
>         IF segment selector index is outside descriptor table limits
>         or segment is not a data or readable code segment
>         or ((segment is a data or nonconforming code segment)
>         or ((RPL > DPL) and (CPL > DPL))
>             THEN #GP(selector); FI;
>
> ^^^^
> This makes no sense. This says that the data segments cannot be
> loaded with MOV. Empirically, it seems like MOV works if CPL <= DPL
> and RPL <= DPL, but I haven't checked that hard.

Surely Intel meant:

    ... or ((segment is a data segment or nonconforming code segment)
    and ((RPL > DPL) or (CPL > DPL)))

This would be consistent with the AMD APM #GP condition of "The DS,
ES, FS, or GS register was loaded and the segment pointed to was a
data or non-conforming code segment, but the RPL or CPL was greater
than the DPL."

>
>         IF segment not marked present
>             THEN #NP(selector);
>         ELSE
>             SegmentRegister ← segment selector;
>             SegmentRegister ← segment descriptor; FI;
>     FI;
>
>     IF DS, ES, FS, or GS is loaded with NULL selector
>     THEN
>         SegmentRegister ← segment selector;
>         SegmentRegister ← segment descriptor;
> ^^^^
> wtf? There is no "segment descriptor". Presumably what actually
> gets written to segment.DPL is nonsense.
>     FI;

I think the bug is here. I think that, when writing a NULL selector to
DS, ES, FS, or GS, Intel CPUs incorrectly set DPL == RPL, whereas they
should set DPL to 3.

>
> Anyway, I think it's nonsense that user code can load a selector using
> MOV that is, in turn, rejected by IRET. I don't suppose Intel would
> consider fixing this going forward.
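The two readings of the MOV check, and the IRET step quoted above, can be written as small predicates to compare them side by side. This is a simplified sketch, not Intel's or AMD's definition: descriptor-table limits, the present bit and readability are ignored, and enum seg_type, struct hidden_desc and the function names are made up for illustration. iret_nulls_segreg() takes the hidden DPL as an input, so the hypothesis above (a NULL selector load leaving DPL == RPL instead of 3) has to be plugged in by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: only the fields the quoted checks read. */
enum seg_type { SEG_DATA, SEG_CODE_NONCONFORMING, SEG_CODE_CONFORMING };

struct hidden_desc {            /* hidden (cached) part of a segment register */
	enum seg_type type;
	unsigned int dpl;       /* 0..3 */
};

bool data_or_nonconforming(enum seg_type t)
{
	return t == SEG_DATA || t == SEG_CODE_NONCONFORMING;
}

/* MOV #GP privilege condition as the SDM text literally reads:
 * (data or nonconforming code) OR ((RPL > DPL) AND (CPL > DPL)).
 * True means #GP, so every data segment faults regardless of privilege. */
bool mov_gp_as_written(enum seg_type t, unsigned int rpl,
		       unsigned int cpl, unsigned int dpl)
{
	return data_or_nonconforming(t) || (rpl > dpl && cpl > dpl);
}

/* Presumably intended condition, matching the AMD APM wording:
 * (data or nonconforming code) AND ((RPL > DPL) OR (CPL > DPL)). */
bool mov_gp_intended(enum seg_type t, unsigned int rpl,
		     unsigned int cpl, unsigned int dpl)
{
	return data_or_nonconforming(t) && (rpl > dpl || cpl > dpl);
}

/* One iteration of IRET's RETURN-TO-OUTER-PRIVILEGE-LEVEL loop for
 * DS/ES/FS/GS. new_cpl is the CPL being returned to (3 for user mode).
 * True means the segment register is forced to NULL. */
bool iret_nulls_segreg(struct hidden_desc d, unsigned int new_cpl)
{
	assert(d.dpl <= 3 && new_cpl <= 3);
	return d.dpl < new_cpl && data_or_nonconforming(d.type);
}
```

With a DPL-3 data segment loaded at CPL 3 and RPL 3, mov_gp_as_written() returns true (a #GP for an ordinary user data segment) while mov_gp_intended() returns false. For the IRET step, a hidden descriptor with DPL 3 survives a return to CPL 3, whereas one with DPL 1 (what the DPL == RPL hypothesis would give for selector 0x1, assuming the hidden type still counts as data) is nulled, which would match %gs going from 1 to 0.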
>
> Borislav, any chance you could run the attached program on an AMD
> machine to see what it does?