Date: Tue, 19 Jan 2021 15:57:59 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart,
    Andy Lutomirski, linux-kernel@vger.kernel.org,
    linux-edac@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID: <20210119235759.GA9970@agluck-desk2.amr.corp.intel.com>
References: <20210111214452.1826-1-tony.luck@intel.com>
    <20210115003817.23657-1-tony.luck@intel.com>
    <20210115152754.GC9138@zn.tnic>
    <20210115193435.GA4663@agluck-desk2.amr.corp.intel.com>
    <20210115205103.GA5920@agluck-desk2.amr.corp.intel.com>
    <20210115232346.GA7967@agluck-desk2.amr.corp.intel.com>
    <20210119105632.GF27433@zn.tnic>
In-Reply-To: <20210119105632.GF27433@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 19, 2021 at 11:56:32AM +0100, Borislav Petkov wrote:
> On Fri, Jan 15, 2021 at 03:23:46PM -0800, Luck, Tony wrote:
> > On Fri, Jan 15, 2021 at 12:51:03PM -0800, Luck, Tony wrote:
> > > static void kill_me_now(struct callback_head *ch)
> > > {
> > > +	p->mce_count = 0;
> > > 	force_sig(SIGBUS);
> > > }
> >
> > Brown paper bag time ... I just pasted that line from kill_me_maybe()
> > and I thought I did a re-compile ... but obviously not since it gives
> >
> > error: ‘p’ undeclared (first use in this function)
> >
> > Option a) (just like kill_me_maybe)
> >
> > struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
> >
> > Option b) (simpler ... not sure why PeterZ did the container_of thing
> >
> > current->mce_count = 0;
>
> Right, he says it is the canonical way to get it out of callback_head.
> I don't think current will change while the #MC handler runs but we can
> adhere to the design pattern here and do container_of() ...

Ok ... I'll use the canonical way.

But now I've run into a weird issue. I'd run some basic tests with a
dozen machine checks in each of:

1) user access
2) kernel copyin
3) futex (multiple accesses from kernel before task_work())

and it passed my tests before I posted.

But the real validation folks took my patch and found that it has
destabilized cases 1 & 2 (and case 3 also chokes if you repeat a few
more times). System either hangs or panics. Generally before 100
injection/consumption cycles.

Their tests are still just doing one at a time (i.e. complete recovery
of one machine check before injecting the next error). So there aren't
any complicated race conditions.

So if you see anything obviously broken, let me know. Otherwise I'll
be poking around at the patch to figure out what is wrong.

-Tony
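
For readers following the container_of() discussion in the quoted text, here is a minimal userspace sketch of the pattern: a callback that is handed only the embedded callback_head recovers the structure that contains it. This is an illustration under stated assumptions, not the kernel code. struct task_sketch, kill_me_now_sketch() and run_sketch() are hypothetical names, the container_of() macro is a simplified stand-in for the kernel's, and the force_sig(SIGBUS) call is left out.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* offsetof, NULL */

/* Simplified stand-in for the kernel's container_of(): given a pointer to a
 * member, step back by the member's offset to reach the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, cut-down versions of the kernel types. */
struct callback_head {
	void (*func)(struct callback_head *cb);
};

struct task_sketch {
	int mce_count;                    /* machine checks seen so far */
	struct callback_head mce_kill_me; /* embedded callback handle */
};

/* The callback receives only a pointer to the embedded member ... */
static void kill_me_now_sketch(struct callback_head *cb)
{
	struct task_sketch *p;

	assert(cb != NULL);
	/* ... and recovers the owning task from it ("Option a"). */
	p = container_of(cb, struct task_sketch, mce_kill_me);
	p->mce_count = 0;
	/* The real handler would go on to call force_sig(SIGBUS). */
}

/* Set up a task, run its callback once, and report the resulting count. */
static int run_sketch(int initial_count)
{
	struct task_sketch t = { .mce_count = initial_count };

	t.mce_kill_me.func = kill_me_now_sketch;
	t.mce_kill_me.func(&t.mce_kill_me);
	return t.mce_count;
}
```

Calling run_sketch(5) runs the callback against a task whose count starts at 5 and returns the count after the reset. The point of the pattern is that the callback depends only on the pointer it was given, not on any global notion of the current task.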