From: Andy Lutomirski
Subject: Re: Random shadow stack pointer corruption
Date: Tue, 28 Jul 2020 17:56:20 -0700
Message-Id: <9CB08BDB-0245-474A-BE03-C529F659581F@amacapital.net>
To: "H.J. Lu"
Cc: Yu-cheng Yu, Dave Hansen, Andy Lutomirski, LKML, X86 ML, Borislav Petkov, Dave Hansen, Ingo Molnar, "Ravi V. Shankar", Sebastian Andrzej Siewior, Tony Luck, Thomas Gleixner, Peter Zijlstra, Weijiang Yang
X-Mailing-List: linux-kernel@vger.kernel.org

> On Jul 28, 2020, at 5:36 PM, H.J.
Lu wrote:
>
> On Sat, Jul 18, 2020 at 4:35 PM Yu-cheng Yu wrote:
>>
>>> On Sat, 2020-07-18 at 15:41 -0700, Dave Hansen wrote:
>>> On 7/18/20 11:24 AM, Yu-cheng Yu wrote:
>>>> On Sat, 2020-07-18 at 11:00 -0700, Andy Lutomirski wrote:
>>>>> On Sat, Jul 18, 2020 at 10:58 AM Yu-cheng Yu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> My shadow stack tests start to have random shadow stack pointer corruption after
>>>>>> v5.7 (excluding). The symptom looks like some locking issue or the kernel is
>>>>>> confused about which CPU a task is on. In later tip/master, this can be
>>>>>> triggered by creating two tasks and each does continuous
>>>>>> pthread_create()/pthread_join(). If the kernel has max_cpus=1, the issue goes
>>>>>> away. I also checked XSAVES/XRSTORS, but this does not seem to be an issue
>>>>>> coming from there.
>>>>>
>>>>> What do you mean "shadow stack pointer corruption"? Is SSP itself
>>>>> corrupt while running in the kernel? Is one of the MSRs getting
>>>>> corrupted? Is the memory to which the shadow stack points getting
>>>>> corrupted? Is the CPU rejecting an attempt to change SSP?
>>>>
>>>> What I see is, a new thread after ret_from_fork() and iret back to ring-3,
>>>> its shadow stack pointer (MSR_IA32_PL3_SSP) is corrupted.
>>>
>>> Does corrupt mean random? Or is it a valid stack address, just not for
>>> _this_ thread? Or NULL? Or is it a kernel address? Have you tried
>>> tracing *ALL* the WRMSR's and XRSTOR's that write to the MSR?
>>
>> When a shadow stack address is changed, the address appears to be other task's.
>> I traced all WRMSR's and XRSTOR's. I also verified there have not been any
>> XRSTORS from a wrong buffer. When rc6 is tagged, I will re-base, test, and
>> share current patches.
>>
>
> We have identified that
>
> commit 91eeafea1e4b7c95cc4f38af186d7d48fceef89a
> Author: Thomas Gleixner
> Date: Thu May 21 22:05:28 2020 +0200
>
>     x86/entry: Switch page fault exception to IDTENTRY_RAW
>
>     Convert page fault exceptions to IDTENTRY_RAW:
>
>     - Implement the C entry point with DEFINE_IDTENTRY_RAW
>     - Add the CR2 read into the exception handler
>     - Add the idtentry_enter/exit_cond_rcu() invocations in
>       the regular page fault handler and in the async PF
>       part.
>     - Emit the ASM stub with DECLARE_IDTENTRY_RAW
>     - Remove the ASM idtentry in 64-bit
>     - Remove the CR2 read from 64-bit
>     - Remove the open coded ASM entry code in 32-bit
>     - Fix up the XEN/PV code
>     - Remove the old prototypes
>
>     No functional change.
>
> triggered the shadow stack corruption when the process returned from syscall.
> SSP MSR somehow was changed between setting SSP MSR and IRET. Could
> there be a page fault between setting SSP MSR and IRET?

Not upstream because there’s no SSP MSR.

>
> --
> H.J.