Subject: Re: [PATCH] mm, memory-failure: clarify error message
To: Naoya Horiguchi, Anshuman Khandual
Cc: "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "linux-nvdimm@lists.01.org"
References: <1558066095-9495-1-git-send-email-jane.chu@oracle.com> <512532de-4c09-626d-380f-58cef519166b@arm.com> <20190520102106.GA12721@hori.linux.bs1.fc.nec.co.jp>
From: Jane Chu
Organization: Oracle Corporation
Message-ID: <49fd8918-5762-9b92-d383-8fdd96cf1c38@oracle.com>
Date: Mon, 20 May 2019 18:50:02 -0700
In-Reply-To: <20190520102106.GA12721@hori.linux.bs1.fc.nec.co.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks Vishal and Naoya!

-jane

On 5/20/2019 3:21 AM, Naoya Horiguchi wrote:
> On Fri, May 17, 2019 at 10:18:02AM +0530, Anshuman Khandual wrote:
>>
>> On 05/17/2019 09:38 AM, Jane Chu wrote:
>>> Some user who install SIGBUS handler that does longjmp out
>> What the longjmp about ? Are you referring to the mechanism of catching the
>> signal which was registered ?
> AFAIK, longjmp() might be useful for signal-based retrying, so highly
> optimized applications like Oracle DB might want to utilize it to handle
> memory errors in application level, I guess.
>
>>> therefore keeping the process alive is confused by the error
>>> message
>>> "[188988.765862] Memory failure: 0x1840200: Killing
>>> cellsrv:33395 due to hardware memory corruption"
>> Its a valid point because those are two distinct actions.
>>
>>> Slightly modify the error message to improve clarity.
>>>
>>> Signed-off-by: Jane Chu
>>> ---
>>>  mm/memory-failure.c | 7 ++++---
>>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index fc8b517..14de5e2 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -216,10 +216,9 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>>>  	short addr_lsb = tk->size_shift;
>>>  	int ret;
>>>
>>> -	pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory corruption\n",
>>> -		pfn, t->comm, t->pid);
>>> -
>>>  	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
>>> +		pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory "
>>> +			"corruption\n", pfn, t->comm, t->pid);
>>>  		ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr,
>>>  				       addr_lsb, current);
>>>  	} else {
>>> @@ -229,6 +228,8 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
>>>  		 * This could cause a loop when the user sets SIGBUS
>>>  		 * to SIG_IGN, but hopefully no one will do that?
>>>  		 */
>>> +		pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware "
>>> +			"memory corruption\n", pfn, t->comm, t->pid);
>>>  		ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
>>>  				      addr_lsb, t);  /* synchronous? */
>> As both the pr_err() messages are very similar, could not we just switch between "Killing"
>> and "Sending SIGBUS to" based on a variable e.g action_[kill|sigbus] evaluated previously
>> with ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm).
> That might need additional if sentence, which I'm not sure worth doing.
> I think that the simplest fix for the reported problem (a confusing message)
> is like below:
>
> -	pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory corruption\n",
> +	pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
>  		pfn, t->comm, t->pid);
>
> Or, if we have a good reason to separate the message for MF_ACTION_REQUIRED and
> MF_ACTION_OPTIONAL, that might be OK.
>
> Thanks,
> Naoya Horiguchi
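
For readers wondering what "a SIGBUS handler that does longjmp out" can look like in practice, here is a minimal user-space sketch. It is not taken from cellsrv or any Oracle code; the names run_with_sigbus_recovery(), simulate_memory_error() and no_memory_error() are hypothetical, and raise(SIGBUS) merely stands in for a real memory-failure notification. It only illustrates why "Killing" can mislead: the process receives SIGBUS, jumps back to a recovery point, and keeps running.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Recovery point the handler jumps back to (single-threaded sketch). */
static sigjmp_buf recovery_point;

/*
 * SIGBUS handler. A real handler would likely inspect info->si_code
 * (BUS_MCEERR_AR vs. BUS_MCEERR_AO on Linux) and info->si_addr before
 * deciding what to do; this sketch unconditionally jumps out.
 */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)info;
	(void)ucontext;
	siglongjmp(recovery_point, 1);
}

/* Hypothetical stand-in for touching a poisoned page. */
void simulate_memory_error(void)
{
	raise(SIGBUS);
}

/* Hypothetical workload that completes without any error. */
void no_memory_error(void)
{
}

/*
 * Run work() with a SIGBUS handler installed.
 * Returns 1 if SIGBUS was caught and the process recovered,
 * 0 if work() finished normally, -1 if the handler could not be set up.
 */
int run_with_sigbus_recovery(void (*work)(void))
{
	struct sigaction sa, old;
	int recovered;

	assert(work != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return -1;

	/* savemask = 1 so the signal mask is restored after the jump. */
	if (sigsetjmp(recovery_point, 1) == 0) {
		work();
		recovered = 0;
	} else {
		recovered = 1;	/* we arrived here via siglongjmp() */
	}

	sigaction(SIGBUS, &old, NULL);	/* restore the previous disposition */
	return recovered;
}
```

Calling run_with_sigbus_recovery(simulate_memory_error) returns 1 with the process still alive, which is the situation in which a "Killing ..." log line reads as wrong. Jumping out of a signal handler has well-known caveats (only safe if the interrupted code was not in the middle of a non-reentrant operation), so treat this as an illustration rather than a recommended pattern.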
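
Anshuman's suggestion of switching the verb on a previously evaluated condition can be sketched in user space as below. This is not kernel code and not what was merged; format_mf_message() and its "forced" parameter are hypothetical, snprintf() stands in for pr_err(), and "forced" represents the result of ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the memory-failure log line, choosing the verb from one flag.
 * forced == true  corresponds to the force_sig_mceerr() branch,
 * forced == false corresponds to the send_sig_mceerr() branch.
 * Returns what snprintf() returns (length that would have been written).
 */
int format_mf_message(char *buf, size_t len, unsigned long pfn,
		      const char *comm, int pid, bool forced)
{
	const char *action = forced ? "Killing" : "Sending SIGBUS to";

	assert(buf != NULL && comm != NULL && len > 0);

	return snprintf(buf, len,
			"Memory failure: %#lx: %s %s:%d due to hardware memory corruption",
			pfn, action, comm, pid);
}
```

With pfn 0x1840200, comm "cellsrv" and pid 33395 (the values from the log line quoted in the patch description), the two settings of "forced" give the "Killing cellsrv:33395" and "Sending SIGBUS to cellsrv:33395" variants. The trade-off Naoya points out still applies: this needs one extra variable or conditional, whereas a single unconditional "Sending SIGBUS to" message needs none.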