From: Andy Lutomirski
Date: Fri, 26 Jan 2018 11:46:19 -0800
Subject: Re: selftests/x86/fsgsbase_64 test problem
To: Andy Lutomirski, "H. Peter Anvin"
Cc: Dan Rue, Shuah Khan, Ingo Molnar, Dmitry Safonov, Borislav Petkov, "open list:KERNEL SELFTEST FRAMEWORK", LKML

On Fri, Jan 26, 2018 at 10:59 AM, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 8:22 AM, Andy Lutomirski wrote:
>> On Fri, Jan 26, 2018 at 7:36 AM, Dan Rue wrote:
>>>
>>> We've noticed that fsgsbase_64 can fail intermittently with the
>>> following error:
>>>
>>> [RUN] ARCH_SET_GS(0x0) and clear gs, then schedule to 0x1
>>> Before schedule, set selector to 0x1
>>> other thread: ARCH_SET_GS(0x1) -- sel is 0x0
>>> [FAIL] GS/BASE changed from 0x1/0x0 to 0x0/0x0
>>>
>>> This can be reliably reproduced by running fsgsbase_64 in a loop, i.e.:
>>>
>>> for i in $(seq 1 10000); do ./fsgsbase_64 || break; done
>>>
>>> This problem isn't new: I've reproduced it on latest mainline and every
>>> release going back to v4.12 (I did not try earlier). This was tested on
>>> a Supermicro board with a Xeon E3-1220 as well as an Intel NUC with an
>>> i3-5010U.
>>>
>>
>> Hmm, I can reproduce it, too. I'll look in a bit.
>
> I'm triggering a different error, and I think what's going on is that
> the kernel doesn't currently re-save GSBASE when a task switches out
> and that task has saved gsbase != 0 and in-register GS == 0. This is
> arguably a bug, but it's not an infoleak, and fixing it could be a wee
> bit expensive. I'm not sure what, if anything, to do about this. I
> suppose I could add some gross perf hackery to the test to detect this
> case and suppress the error.
>
> I can also trigger the problem you're seeing, and I don't know what's
> up. It may be related to an old problem I've seen that causes signal
> delivery to sometimes corrupt %gs. It's deterministic, but it depends
> in some odd way on register state.
> I can currently reproduce that issue 100% of the time, and I'm trying
> to see if I can figure out what's happening. I think it's a CPU bug,
> and I'm a bit mystified.

I can trigger the following, plausibly related issue. Write a program
that sets %gs = 1, run it under gdb, stop it at a point where %gs == 1,
and then:

  (gdb) display/x $gs
  (gdb) si

Under QEMU TCG, %gs stays equal to 1. On native or KVM, on Skylake, it
changes to 0.

On KVM or native, I do not observe do_debug getting called with
%gs == 1. On TCG, I do.

I don't think that's precisely the problem that's causing the test to
fail, since the test doesn't use TF or ptrace, but I wouldn't be shocked
if it's related.

hpa, any insight?

(NB: if you want to play with this as I've described it, you may need
to make invalid_selector() in ptrace.c always return false. The current
implementation is too strict and causes problems.)
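For reference, the selector values in this thread decode as follows under the architectural x86 selector layout: RPL in bits 1..0, TI in bit 2, and the descriptor index in bits 15..3. The C sketch below is only an illustration and is not kernel code. It shows that 0x1 is a null selector (GDT index 0) whose RPL is 1. strict_filter_accepts() is a hypothetical rule, assumed here for illustration, that accepts only 0 or RPL-3 selectors. A rule of that shape rejects 0x1 even though 0x1 is null, which is plausibly the kind of strictness the NB refers to. All helper names are made up and are not taken from ptrace.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural fields of an x86 segment selector. */
struct selector_fields {
	uint16_t index; /* bits 15..3: descriptor table index */
	bool ldt;       /* bit 2 (TI): false = GDT, true = LDT */
	uint8_t rpl;    /* bits 1..0: requested privilege level */
};

static inline struct selector_fields decode_selector(uint16_t sel)
{
	struct selector_fields f;

	f.index = (uint16_t)(sel >> 3);
	f.ldt = (sel >> 2) & 1;
	f.rpl = (uint8_t)(sel & 3);
	return f;
}

/* Null selector: GDT index 0 with any RPL, i.e. 0x0, 0x1, 0x2 or 0x3. */
static inline bool is_null_selector(uint16_t sel)
{
	return (sel & 0xfffc) == 0;
}

/*
 * Hypothetical strict filter, for illustration only (NOT the kernel's
 * invalid_selector()): accept 0, or any selector whose RPL is 3.
 */
static inline bool strict_filter_accepts(uint16_t sel)
{
	return sel == 0 || (sel & 3) == 3;
}

/* Spells out the example values; plain assert(), so gone under NDEBUG. */
static inline void selector_self_check(void)
{
	struct selector_fields one = decode_selector(0x1);

	assert(one.index == 0 && !one.ldt && one.rpl == 1);
	assert(is_null_selector(0x1));       /* 0x1 is a null selector ... */
	assert(!strict_filter_accepts(0x1)); /* ... yet a strict rule rejects it */
	assert(strict_filter_accepts(0x0));
	assert(strict_filter_accepts(0x2b)); /* index 5, GDT, RPL 3 */
	assert(!is_null_selector(0x2b));
}
```

Note that a selector with TI set and index 0 (for example 0x4) is not a null selector. That is why is_null_selector() masks off only the RPL bits.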