Date: Tue, 6 Jul 2021 09:44:51 -0700
From: "Luck, Tony"
To: Ding Hui
Cc: bp@alien8.de, bp@suse.de, naoya.horiguchi@nec.com, osalvador@suse.de,
    peterz@infradead.org, linux-edac@vger.kernel.org,
    linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com,
    x86@kernel.org, hpa@zytor.com, youquan.song@intel.com,
    huangcun@sangfor.com.cn, stable@vger.kernel.org
Subject: Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC
Message-ID: <20210706164451.GA1289248@agluck-desk2.amr.corp.intel.com>
References: <20210706121606.15864-1-dinghui@sangfor.com.cn>
In-Reply-To: <20210706121606.15864-1-dinghui@sangfor.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
> Recently we encounter multi #MC on the same task when it's
> task_work_run() has not been called, current->mce_kill_me was
> added to task_works list more than once, that make a circular
> linked task_works, so task_work_run() will do a endless loop.

I saw the same and posted a similar fix a while back:

https://www.spinics.net/lists/linux-mm/msg251006.html

It didn't get merged because some validation tests began failing
around the same time. I'm now pretty sure I understand what happened
with those other tests.

I'll post my updated version (second patch in a three part series)
later today.

> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c

> +	if (!cmpxchg(&current->mce_kill_me.func, NULL, ch.func)) {
> +		current->mce_addr = m->addr;
> +		current->mce_kflags = m->kflags;
> +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> +		current->mce_whole_page = whole_page(m);

You don't need an atomic cmpxchg here (nor the WRITE_ONCE() to clear it).
The task is operating on its own task_struct. Nobody else should touch
the mce_kill_me field.

-Tony
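
For readers following the thread, here is a minimal userspace sketch of the failure mode described above. It is not the kernel code. It assumes a simplified stand-in for struct callback_head and a plain LIFO push in place of task_work_add(). The names cb_head, push(), queue_once() and the two *_terminates() helpers are hypothetical. Pushing the same node twice makes it point at itself, so a walk never reaches NULL. A plain (non-atomic) check of the func field is enough to prevent that when only the owning task touches the node.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct callback_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

/* Simplified LIFO push; models how a work item is linked onto a list. */
static void push(struct cb_head **list, struct cb_head *work)
{
	work->next = *list;
	*list = work;
}

/* Walk at most 'limit' nodes; return 1 if the walk reached NULL. */
static int list_terminates(const struct cb_head *head, int limit)
{
	while (head && limit-- > 0)
		head = head->next;
	return head == NULL;
}

static void kill_me(struct cb_head *cb) { (void)cb; }

/*
 * Queue 'work' only if it is not already queued. A plain load and store
 * are used on the assumption that only the owning task touches the node.
 */
static int queue_once(struct cb_head **list, struct cb_head *work,
		      void (*fn)(struct cb_head *))
{
	if (work->func)
		return 0;	/* already queued, do not link it again */
	work->func = fn;
	push(list, work);
	return 1;
}

/* Same node pushed twice: work.next == &work, so the walk never ends. */
static int unguarded_terminates(void)
{
	struct cb_head *list = NULL;
	struct cb_head work = { NULL, kill_me };

	push(&list, &work);
	push(&list, &work);
	return list_terminates(list, 1000);
}

/* Second request is ignored, so the list stays a single node. */
static int guarded_terminates(void)
{
	struct cb_head *list = NULL;
	struct cb_head work = { NULL, NULL };

	queue_once(&list, &work, kill_me);
	queue_once(&list, &work, kill_me);
	return list_terminates(list, 1000);
}
```

In this model the callback would have to reset func to NULL when it runs so the node can be queued again later. That detail is an assumption of the sketch, not a statement about the merged fix.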