Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp2481403iog; Sun, 19 Jun 2022 19:35:09 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vKEC7J6lTKMlC7WmXL1JuU/+zamt5uUkvel/u6/mp9cGULae++vkJS4zdt6jkcZ2CnWWNh X-Received: by 2002:a17:902:f652:b0:156:701b:9a2a with SMTP id m18-20020a170902f65200b00156701b9a2amr21499406plg.14.1655692509622; Sun, 19 Jun 2022 19:35:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655692509; cv=none; d=google.com; s=arc-20160816; b=SqX/xrykEU5kji3SA3aSFlHobg6c33gAdO8jZzlmUdiNhSGPvs3iJ0Z41nx1/sbXfA 1LiUA3GEq5Hhdnstoqx596XmEiljpHgiE96QWAhAZyU8c6L9m4xotEi7Y4/wsm/1BgD1 bMvaedMh2GLogBHY4/2n1F4+wNF/we8fmYfrEJhoVfKaT4yaPOl+YKgUXVnkHlyOB97B gUIi3ui7544lqdWYOPrXoJ85bcjq71HVHwioE/EOFg/1wZAZhqcERaXp0HHSxO7CO/uS k4h3NBHeeUVdiaJ9E2eBR1Q7s7Zd1LIxuYBbe6MYZbKPVxxnmz+NEfXY7IwBLCe40iUl mpwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=prOq3udDUBRxaxJVT8va6P/2GlTJJNRYqOPB4W5ZLwU=; b=fPBXz8oMSQIYsJRNA3VM3/MSerWQ6TuYvVcEFxphO2k8VsmpE/7N8U/zDzkSERXNY1 gzj9wXEzVPjOtnBpMtnbXd+tdQpTaKctKViuL7mkhgkdNCTuaaInVE9IiS8bFaVaRY3w PpOaefdqIXsFqQaif+M8TnHVnf3zLw51TH7Iuz3enWsyQyhj/JLoWTpUIlxadfbllPHL PGZStDsJGNFz9Jwt/KoNz2Buv8r28wInED/PaOmHIgJkDhwpOfmiw/xFTMN8miYj+BDJ NfwtJh43UVAi/i8UYfe29FmwbF5pVTC27OqT22hYSTahxkZLH3So2vwtQwvhLufmNoqH EnUA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n127-20020a632785000000b003f60a405a01si14273435pgn.620.2022.06.19.19.34.58; Sun, 19 Jun 2022 19:35:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236874AbiFTBxm (ORCPT + 99 others); Sun, 19 Jun 2022 21:53:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237603AbiFTBxe (ORCPT ); Sun, 19 Jun 2022 21:53:34 -0400 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7068CB495 for ; Sun, 19 Jun 2022 18:53:31 -0700 (PDT) Received: from dggemv703-chm.china.huawei.com (unknown [172.30.72.57]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4LRCLb5QCdzkWMc; Mon, 20 Jun 2022 09:51:51 +0800 (CST) Received: from kwepemm600017.china.huawei.com (7.193.23.234) by dggemv703-chm.china.huawei.com (10.3.19.46) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Mon, 20 Jun 2022 09:53:29 +0800 Received: from [10.174.179.234] (10.174.179.234) by kwepemm600017.china.huawei.com (7.193.23.234) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Mon, 20 Jun 2022 09:53:27 +0800 Message-ID: <95ae5d1a-fcfd-9106-4b13-9978de1a3d23@huawei.com> Date: Mon, 20 Jun 2022 09:53:26 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH -next v5 6/8] arm64: add support for machine check error safe To: Mark Rutland CC: James Morse , Andrew Morton , Thomas Gleixner , "Ingo Molnar" , Borislav Petkov , Robin Murphy , Dave Hansen , "Catalin Marinas" , Will Deacon , "Alexander Viro" , Michael Ellerman , Benjamin Herrenschmidt , Paul Mackerras , , "H . Peter Anvin" , , , , , Kefeng Wang , Xie XiuQi , Guohanjun References: <20220528065056.1034168-1-tongtiangen@huawei.com> <20220528065056.1034168-7-tongtiangen@huawei.com> <4aa8b109-c79b-8da0-db89-85ca128f1049@huawei.com> From: Tong Tiangen In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.179.234] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemm600017.china.huawei.com (7.193.23.234) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2022/6/18 20:52, Mark Rutland 写道: > On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote: >> 在 2022/6/17 16:55, Mark Rutland 写道: >>> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote: >>>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr, >>>> + struct pt_regs *regs, int sig, int code) >>>> +{ >>>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC)) >>>> + return false; >>>> + >>>> + if (user_mode(regs) || !current->mm) >>>> + return false; >>> >>> What's the `!current->mm` check for? >> >> At first, I considered that only user processes have the opportunity to >> recover when they trigger memory error. >> >> But it seems that this restriction is unreasonable. When the kernel thread >> triggers memory error, it can also be recovered. for instance: >> >> https://lore.kernel.org/linux-mm/20220527190731.322722-1-jiaqiyan@google.com/ >> >> And i think if(!current->mm) shoud be added below: >> >> if(!current->mm) { >> set_thread_esr(0, esr); >> arm64_force_sig_fault(...); >> } >> return true; > > Why does 'current->mm' have anything to do with this, though? Sorry, typo, my original logic was: if(current->mm) { [...] } > > There can be kernel threads with `current->mm` set in unusual circumstances > (and there's a lot of kernel code out there which handles that wrong), so if > you want to treat user tasks differently, we should be doing something like > checking PF_KTHREAD, or adding something like an is_user_task() helper. > OK, i do want to treat user tasks differently here and didn't take into account what you said. will be fixed next version according to your suggestiong. As follows: if (!(current->flags & PF_KTHREAD)) { set_thread_esr(0, esr); arm64_force_sig_fault(...); } return true; > [...] > >>>> + >>>> + if (apei_claim_sea(regs) < 0) >>>> + return false; >>>> + >>>> + if (!fixup_exception_mc(regs)) >>>> + return false; >>> >>> I thought we still wanted to signal the task in this case? Or do you expect to >>> add that into `fixup_exception_mc()` ? >> >> Yeah, here return false and will signal to task in do_sea() -> >> arm64_notify_die(). > > I mean when we do the fixup. > > I thought the idea was to apply the fixup (to stop the kernel from crashing), > but still to deliver a fatal signal to the user task since we can't do what the > user task asked us to. > Yes, that's what i mean. :) >>>> + >>>> + set_thread_esr(0, esr); >>> >>> Why are we not setting the address? Is that deliberate, or an oversight? >> >> Here set fault_address to 0, i refer to the logic of arm64_notify_die(). >> >> void arm64_notify_die(...) >> { >> if (user_mode(regs)) { >> WARN_ON(regs != current_pt_regs()); >> current->thread.fault_address = 0; >> current->thread.fault_code = err; >> >> arm64_force_sig_fault(signo, sicode, far, str); >> } else { >> die(str, regs, err); >> } >> } >> >> I don't know exactly why and do you know why arm64_notify_die() did this? :) > > To be honest, I don't know, and that looks equally suspicious to me. > > Looking at the git history, that was added in commit: > > 9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers") > > ... so maybe Catalin recalls why. > > Perhaps the assumption is just that this will be fatal and so unimportant? ... > but in that case the same logic would apply to the ESR value, so it's not clear > to me. OK, let's proceed as set to 0, if there is any change later, the two positions shall be changed together. Thanks, Tong. > > Mark. > > .