Subject: Random shadow stack pointer corruption
From: Yu-cheng Yu
To: LKML, x86@kernel.org, Andy Lutomirski, Borislav Petkov, Dave Hansen,
    "H.J. Lu", Ingo Molnar, "Ravi V. Shankar", Sebastian Andrzej Siewior,
    Tony Luck, Thomas Gleixner, Peter Zijlstra, Weijiang Yang
Date: Sat, 18 Jul 2020 10:57:27 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

My shadow stack tests started showing random shadow stack pointer
corruption after v5.7 (v5.7 itself is not affected). The symptom looks
like a locking issue, or like the kernel is confused about which CPU a
task is on. In later tip/master, this can be triggered by creating two
tasks that each call pthread_create()/pthread_join() continuously. If
the kernel is booted with maxcpus=1, the issue goes away. I also checked
XSAVES/XRSTORS, but the issue does not seem to come from there.

The tests I run take a long time to complete, and some commit points in
the bisect do not show failures right away. However, the issue becomes
easier to trigger from this commit onward:

    d77290507ab2 x86/entry/32: Convert IRET exception to IDTENTRY_SW

Can anyone help me find places to look at?

Thanks,
Yu-cheng
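
For reference, the trigger described above (two tasks, each looping on pthread_create()/pthread_join()) could be sketched roughly as follows. This is only an illustration, not the actual test: the helper names churn_threads() and run_two_tasks() and the iteration count are made up here, and the sketch does not enable shadow stack itself, so it assumes the binary and kernel are already set up for that.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: return immediately so the loop is dominated by
 * thread creation and teardown. */
void *noop(void *arg)
{
	return arg;
}

/* Hypothetical helper: create and join one thread at a time.
 * Returns 0 on success, -1 if pthread_create()/pthread_join() fails. */
int churn_threads(long iterations)
{
	for (long i = 0; i < iterations; i++) {
		pthread_t t;

		if (pthread_create(&t, NULL, noop, NULL) != 0)
			return -1;
		if (pthread_join(t, NULL) != 0)
			return -1;
	}
	return 0;
}

/* Hypothetical helper: fork two tasks that each run churn_threads(),
 * then wait for them. Returns 0 only if both children were started
 * and both exited normally with status 0; a child killed by a signal
 * is reported as a failure. */
int run_two_tasks(long iterations)
{
	pid_t pids[2];
	int started = 0;
	int ret = 0;

	for (int i = 0; i < 2; i++) {
		pids[i] = fork();
		if (pids[i] < 0)
			break;
		if (pids[i] == 0)
			_exit(churn_threads(iterations) == 0 ? 0 : 1);
		started++;
	}

	for (int i = 0; i < started; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) < 0 ||
		    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			ret = -1;
	}

	return started == 2 ? ret : -1;
}
```

Calling run_two_tasks() from main() with a large iteration count (the right value is a guess and probably depends on the machine) and checking for a non-zero return is one way to drive this on an SMP system.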