Date: Mon, 1 Feb 2021 10:58:12 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart,
 Andy Lutomirski, linux-kernel@vger.kernel.org,
 linux-edac@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID:
<20210201185812.GA54867@agluck-desk2.amr.corp.intel.com>
References: <20210115205103.GA5920@agluck-desk2.amr.corp.intel.com>
 <20210115232346.GA7967@agluck-desk2.amr.corp.intel.com>
 <20210119105632.GF27433@zn.tnic>
 <20210119235759.GA9970@agluck-desk2.amr.corp.intel.com>
 <20210120121812.GF825@zn.tnic>
 <20210121210959.GA10304@agluck-desk2.amr.corp.intel.com>
 <20210125225509.GA7149@agluck-desk2.amr.corp.intel.com>
 <20210126110314.GC6514@zn.tnic>
 <20210126223605.GA14355@agluck-desk2.amr.corp.intel.com>
 <20210128175735.GB2120@zn.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210128175735.GB2120@zn.tnic>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 28, 2021 at 06:57:35PM +0100, Borislav Petkov wrote:
> Crazy idea: if you still can reproduce on -rc3, you could bisect: i.e.,
> if you apply the patch on -rc3 and it explodes and if you apply the same
> patch on -rc5 and it works, then that could be a start... Yeah, don't
> have a better idea here. :-\

I tried reproducing (applied the original patch I posted back to -rc3)
and the same issue stubbornly refused to show up again.

But I did hit something with the same signature (overflow bit set in
bank 1) while running my futex test (which has two processes mapping
the poison page). This time I *do* understand what happened. The test
failed when the two processes were running on the two hyperthreads of
the same core. Seeing overflow in this case is understandable because
bank 1 MSRs on my test machine are shared between the HT threads.

When I run the test again using taskset(1) to allow it to run only on
thread 0 of each core, it keeps going for hundreds of iterations.

I'm not sure I can stitch together how this overflow also happened for
my single process test. Maybe a migration from one HT thread to the
other at an awkward moment?

-Tony
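
For anyone wanting to repeat the taskset(1) run, here is a rough sketch of one way to build the CPU list. It is not the script used for the test above. It assumes "thread 0 of each core" means the lowest-numbered CPU in each core's sysfs thread_siblings_list, and the helper names and the ./futex-test program name are hypothetical.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-1,8-9" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def first_threads(sibling_lists):
    """Return the lowest-numbered CPU of each core.

    sibling_lists holds one thread_siblings_list string per online CPU,
    so every core appears once per hyperthread; the set removes repeats.
    Assumption: the lowest-numbered sibling is "thread 0" of the core.
    """
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists if s.strip()})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every CPU that exposes topology."""
    lists = []
    for name in sorted(os.listdir(root)):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        path = os.path.join(root, name, "topology", "thread_siblings_list")
        if os.path.exists(path):  # offline CPUs have no topology directory
            with open(path) as f:
                lists.append(f.read())
    return lists


def taskset_command(cpus, program):
    """Format a taskset(1) invocation restricted to the given CPUs."""
    return "taskset -c " + ",".join(str(c) for c in cpus) + " " + program
```

On a Linux box, taskset_command(first_threads(read_sibling_lists()), "./futex-test") gives a command line that keeps the test off the second hyperthread of every core, so two test processes can never share a bank that is common to the HT siblings.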