From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Mahesh Salgaonkar, Balbir Singh, Michael Ellerman
Subject: [PATCH 4.16 081/113] powerpc/mce: Fix a bug where mce loops on memory UE.
Date: Mon, 30 Apr 2018 12:24:52 -0700
Message-Id: <20180430184018.603869231@linuxfoundation.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430184015.043892819@linuxfoundation.org>
References: <20180430184015.043892819@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mahesh Salgaonkar

commit 75ecfb49516c53da00c57b9efe48fa3f5504a791 upstream.

The current code extracts the physical address for UE errors and then
hooks it up into memory failure infrastructure. On successful
extraction of physical address it wrongly sets "handled = 1" which
means this UE error has been recovered. Since MCE handler gets return
value as handled = 1, it assumes that error has been recovered and
goes back to same NIP. This causes MCE interrupt again and again in a
loop leading to hard lockup.

Also, initialize phys_addr to ULONG_MAX so that we don't end up
queuing undesired page to hwpoison.

Without this patch we see:

  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  ...
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  ...
  Watchdog CPU:38 Hard LOCKUP

After this patch we see:

  Severe Machine check interrupt [Not recovered]
    NIP: [00007fffaae585f4] PID: 7168 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffaafe28ac
      Physical address:  00002017c0bd0000
  find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
  Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors")
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mahesh Salgaonkar
Signed-off-by: Balbir Singh
Reviewed-by: Balbir Singh
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman

---
 arch/powerpc/kernel/mce_power.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -441,7 +441,6 @@ static int mce_handle_ierror(struct pt_r
 					if (pfn != ULONG_MAX) {
 						*phys_addr =
 							(pfn << PAGE_SHIFT);
-						handled = 1;
 					}
 				}
 			}
@@ -532,9 +531,7 @@ static int mce_handle_derror(struct pt_r
 			 * kernel/exception-64s.h
 			 */
 			if (get_paca()->in_mce < MAX_MCE_DEPTH)
-				if (!mce_find_instr_ea_and_pfn(regs, addr,
-								phys_addr))
-					handled = 1;
+				mce_find_instr_ea_and_pfn(regs, addr, phys_addr);
 		}
 		found = 1;
 	}
@@ -572,7 +569,7 @@ static long mce_handle_error(struct pt_r
 		const struct mce_ierror_table itable[])
 {
 	struct mce_error_info mce_err = { 0 };
-	uint64_t addr, phys_addr;
+	uint64_t addr, phys_addr = ULONG_MAX;
 	uint64_t srr1 = regs->msr;
 	long handled;
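
For readers who want the failure mode from the changelog in a runnable
form, here is a rough user-space sketch. It is not kernel code and it
is not part of the patch: handle_ue(), run_faulting_task() and
MAX_RETRIES are hypothetical names made up for this illustration, and
MAX_RETRIES merely stands in for the point where the watchdog would
report a hard lockup. The model assumes only what the changelog says:
a handler that reports "handled" sends the task back to the same NIP,
which faults again.

```c
/* User-space sketch only; none of these names exist in the kernel. */

#define MAX_RETRIES 5	/* hypothetical stand-in for the lockup watchdog */

/*
 * Hypothetical model of the handler's return value for a memory UE.
 * buggy != 0: report "handled" whenever the physical address was found.
 * buggy == 0: leave the UE unhandled even when the address is known.
 */
int handle_ue(int buggy, int phys_addr_found)
{
	int handled = 0;

	if (phys_addr_found && buggy)
		handled = 1;	/* models the assignment the patch removes */
	return handled;
}

/* Count how many MCEs the same faulting instruction raises. */
int run_faulting_task(int buggy)
{
	int mce_count = 0;

	while (mce_count < MAX_RETRIES) {
		mce_count++;	/* the load hits the bad page: one MCE */
		if (!handle_ue(buggy, 1))
			break;	/* not recovered: the task is signalled */
		/* "recovered": resume at the same NIP and fault again */
	}
	return mce_count;
}
```

In this model run_faulting_task(1) only stops at the MAX_RETRIES cap,
while run_faulting_task(0) stops after a single MCE, which corresponds
to the "[Recovered]" loop and the single "[Not recovered]" report in
the logs quoted in the changelog.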