MIME-Version: 1.0
References: <20180126153631.ha7yc33fj5uhitjo@xps>
From: Andy Lutomirski
Date: Fri, 26 Jan 2018 14:38:28 -0800
Subject: Re: selftests/x86/fsgsbase_64 test problem
To: Andy Lutomirski, Borislav Petkov
Cc: "H. Peter Anvin", Dan Rue, Shuah Khan, Ingo Molnar, Dmitry Safonov, "open list:KERNEL SELFTEST FRAMEWORK", LKML
Content-Type: multipart/mixed; boundary="001a113fd3785e0f640563b58f7e"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

--001a113fd3785e0f640563b58f7e
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Fri, Jan 26, 2018 at 11:46 AM, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 10:59 AM, Andy Lutomirski wrote:
>> On Fri, Jan 26, 2018 at 8:22 AM, Andy Lutomirski wrote:
>>> On Fri, Jan 26, 2018 at 7:36 AM, Dan Rue wrote:
>>>>
>>>> We've noticed that fsgsbase_64 can fail intermittently with the
>>>> following error:
>>>>
>>>>   [RUN] ARCH_SET_GS(0x0) and clear gs, then schedule to 0x1
>>>>   Before schedule, set selector to 0x1
>>>>   other thread: ARCH_SET_GS(0x1) -- sel is 0x0
>>>>   [FAIL] GS/BASE changed from 0x1/0x0 to 0x0/0x0
>>>>
>>>> This can be reliably reproduced by running fsgsbase_64 in a loop. i.e.
>>>>
>>>>   for i in $(seq 1 10000); do ./fsgsbase_64 || break; done
>>>>
>>>> This problem isn't new - I've reproduced it on latest mainline and every
>>>> release going back to v4.12 (I did not try earlier). This was tested on
>>>> a Supermicro board with a Xeon E3-1220 as well as an Intel Nuc with an
>>>> i3-5010U.
>>>>
>>>
>>> Hmm, I can reproduce it, too. I'll look in a bit.
>>
>> I'm triggering a different error, and I think what's going on is that
>> the kernel doesn't currently re-save GSBASE when a task switches out
>> and that task has saved gsbase != 0 and in-register GS == 0. This is
>> arguably a bug, but it's not an infoleak, and fixing it could be a wee
>> bit expensive. I'm not sure what, if anything, to do about this. I
>> suppose I could add some gross perf hackery to the test to detect this
>> case and suppress the error.
>>
>> I can also trigger the problem you're seeing, and I don't know what's
>> up. It may be related to an old problem I've seen that causes signal
>> delivery to sometimes corrupt %gs. It's deterministic, but it depends
>> in some odd way on register state. I can currently reproduce that
>> issue 100% of the time, and I'm trying to see if I can figure out
>> what's happening.
>
> I think it's a CPU bug, and I'm a bit mystified. I can trigger the
> following, plausibly related issue:
>
> Write a program that writes %gs = 1.
> Run that program under gdb
> break in which %gs == 1
> display/x $gs
> si
>
> Under QEMU TCG, gs stays equal to 1. On native or KVM, on Skylake, it
> changes to 0.
>
> On KVM or native, I do not observe do_debug getting called with %gs ==
> 1. On TCG, I do. I don't think that's precisely the problem that's
> causing the test to fail, since the test doesn't use TF or ptrace, but
> I wouldn't be shocked if it's related.
>
> hpa, any insight?
>
> (NB: if you want to play with this as I've described it, you may need
> to make invalid_selector() in ptrace.c always return false. The
> current implementation is too strict and causes problems.)

Much simpler test. Run the attached program (gs1). It more or less
just sets %gs to 1 and spins until it stops being 1. Do it on a
kernel with the attached patch applied. I see stuff like this:

# ./gs1
PID = 129
[   15.703015] pid 129 saved gs = 1
[   15.703517] pid 129 loaded gs = 1
[   15.703973] pid 129 prepare_exit_to_usermode: gs = 1 ax = 0, cx = 0, dx = 0

So we're interrupting the program, switching out, switching back in,
setting %gs to 1, observing that %gs is *still* 1 in
prepare_exit_to_usermode(), returning to usermode, and observing %gs
== 0.

Presumably what's happening is that the IRET microcode matches the
SDM's pseudocode, which says:

RETURN-TO-OUTER-PRIVILEGE-LEVEL:
...
    FOR each SegReg in (ES, FS, GS, and DS)
        DO
            tempDesc ← descriptor cache for SegReg (* hidden part of segment register *)
            IF tempDesc(DPL) < CPL AND tempDesc(Type) is data or non-conforming code
                THEN (* Segment register invalid *)
                    SegReg ← NULL;
            FI;
        OD;

But this is very odd. The actual permission checks (in the docs for MOV) are:

IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
    IF segment selector index is outside descriptor table limits
    or segment is not a data or readable code segment
    or ((segment is a data or nonconforming code segment)
    or ((RPL > DPL) and (CPL > DPL))
        THEN #GP(selector); FI;

^^^^ This makes no sense. This says that the data segments cannot be
loaded with MOV. Empirically, it seems like MOV works if CPL <= DPL
and RPL <= DPL, but I haven't checked that hard.

    IF segment not marked present
        THEN #NP(selector);
    ELSE
        SegmentRegister ← segment selector;
        SegmentRegister ← segment descriptor;
    FI;
FI;

IF DS, ES, FS, or GS is loaded with NULL selector
THEN
    SegmentRegister ← segment selector;
    SegmentRegister ← segment descriptor;

^^^^ wtf? There is no "segment descriptor". Presumably what actually
gets written to segment.DPL is nonsense.

FI;

Anyway, I think it's nonsense that user code can load a selector using
MOV that is, in turn, rejected by IRET. I don't suppose Intel would
consider fixing this going forward.

Borislav, any chance you could run the attached program on an AMD
machine to see what it does?
--001a113fd3785e0f640563b58f7e
Content-Type: text/x-csrc; charset="US-ASCII"; name="gs1.c"
Content-Disposition: attachment; filename="gs1.c"
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_jcwhz7ox0

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
	unsigned short ax, cx, dx;
	printf("PID = %d\n", (int)getpid());
	asm volatile ("mov %[one], %%gs\n\t"
		      "1:\n\t"
		      "mov %%gs, %%eax\n\t"
		      "mov %%gs, %%ecx\n\t"
		      "mov %%gs, %%edx\n\t"
		      "cmpw $1, %%ax\n\tjne 2f\n\t"
		      "cmpw $1, %%cx\n\tjne 2f\n\t"
		      "cmpw $1, %%dx\n\tjne 2f\n\t"
		      "jmp 1b\n\t"
		      "2:"
		      : "=a" (ax), "=c" (cx), "=d" (dx)
		      : [one] "rm" ((unsigned short)1));
	printf("ax = %hx, cx = %hx, dx = %hx\n", ax, cx, dx);
	return 0;
}

--001a113fd3785e0f640563b58f7e--