Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp278610pxb; Fri, 15 Jan 2021 12:53:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJybW33FScVOT63Fwg3zzI9SnuyQtAPnM66Ej2cYtFbWK/rlULGHv8UGB+cEzoAwD2PemSPv X-Received: by 2002:a05:6402:4d2:: with SMTP id n18mr11011569edw.309.1610743986795; Fri, 15 Jan 2021 12:53:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1610743986; cv=none; d=google.com; s=arc-20160816; b=FbJWFdnan+hbXkr7u6QVNZgiEvT86GSSkuIj+kNYXx5hFgQhSblXK8kYtDt7/Uwb6C i7StdvE92Y/YfmWFMIoUuRowK5FhwkC6/eavjKlR47+tCr9hvNG7tFX8g5/IWxFRWED6 32APkMia2wTyncMeHoIEyVtL7RrHLaeb7h96SYNt5tndSVaqHMwwBavUWDcUTCiCerwX wfH4vCvoEfWJvlBUnrSD0JUe/Y/6UlvGXHIDPYDyZpqIRYMtcUt2tOXo8jZXmwAMwn+G LqAlqqkrHPUKaSFi8CutKjZPL6yQbisumSXhpccT5fTel0ylgb0BnbZlJNbxLrMauwKE nEGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:ironport-sdr :ironport-sdr; bh=uOmWvMQ3hi7kjp0EdfZm25WYxrJWzOnv3FPqCU5ruow=; b=eYMt9QusWckxSS0gd+AGDEZEvIJfH5mXx3GqrGF7dUq8tmsSkFlxukl9G9wayE73Wr z1KMJ06Q70JoD7cqVf8uUPDMHDBcRH2of4mLKEhwRZh+nUWm+k7MHcwLZxH/H8rV9FvX csn5E/4swr/t2n1P8usd0DLj0x1pam7d0U7/3U3ERG0ZdqLj6atkVtRtRjkJnK1rL2CJ E1J2KQxEoiEQhWbeePc7Ro3f81C7rPv+cRZnLHu/t0hcAomChznEb7i2vN2VXWCMzNRb y0ySjXKnd4+CiW0S2ZidPavNw6gz1ptI1/Y9V2osTL/hF2Z9rHfk6RTSB7LG3ZyXQoP/ WpHw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id d13si3224376edv.602.2021.01.15.12.52.41; Fri, 15 Jan 2021 12:53:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729094AbhAOUvt (ORCPT + 99 others); Fri, 15 Jan 2021 15:51:49 -0500 Received: from mga06.intel.com ([134.134.136.31]:1033 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726434AbhAOUvr (ORCPT ); Fri, 15 Jan 2021 15:51:47 -0500 IronPort-SDR: 30qAGI3IYT/LV5E1t1v6WQXJ40Ah/eMKCLoi4PAmDtlirTnwaFkGxd+HH//FQr/wL89Q9EtiJR F0ga0z1/F9LA== X-IronPort-AV: E=McAfee;i="6000,8403,9865"; a="240148690" X-IronPort-AV: E=Sophos;i="5.79,350,1602572400"; d="scan'208";a="240148690" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2021 12:51:06 -0800 IronPort-SDR: uzNFJ4xaYcddHJLZeG2GHbtIt1lwBJJrdxqFfb2wOSSl8Nd6Ob3tkn/NsLFQdJtmNFs3iGw6dk sh7RLI6GPk3A== X-IronPort-AV: E=Sophos;i="5.79,350,1602572400"; d="scan'208";a="352918036" Received: from agluck-desk2.sc.intel.com (HELO agluck-desk2.amr.corp.intel.com) ([10.3.52.68]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jan 2021 12:51:06 -0800 Date: Fri, 15 Jan 2021 12:51:03 -0800 From: "Luck, Tony" To: Borislav Petkov Cc: x86@kernel.org, Andrew Morton , Peter Zijlstra , Darren Hart , Andy Lutomirski , linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v4] x86/mce: Avoid infinite loop for copy from user recovery Message-ID: <20210115205103.GA5920@agluck-desk2.amr.corp.intel.com> References: <20210111214452.1826-1-tony.luck@intel.com> <20210115003817.23657-1-tony.luck@intel.com> <20210115152754.GC9138@zn.tnic> <20210115193435.GA4663@agluck-desk2.amr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210115193435.GA4663@agluck-desk2.amr.corp.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Recovery action when get_user() triggers a machine check uses the fixup path to make get_user() return -EFAULT. Also queue_task_work() sets up so that kill_me_maybe() will be called on return to user mode to send a SIGBUS to the current process. But there are places in the kernel where the code assumes that this EFAULT return was simply because of a page fault. The code takes some action to fix that, and then retries the access. This results in a second machine check. While processing this second machine check queue_task_work() is called again. But since this uses the same callback_head structure that was used in the first call, the net result is an entry on the current->task_works list that points to itself. When task_work_run() is called it loops forever in this code: do { next = work->next; work->func(work); work = next; cond_resched(); } while (work); Add a counter (current->mce_count) to keep track of repeated machine checks before task_work() is called. First machine check saves the address information and calls task_work_add(). Subsequent machine checks before that task_work call back is executed check that the address is in the same page as the first machine check (since the callback will offline exactly one page). Expected worst case is two machine checks before moving on (e.g. one user access with page faults disabled, then a repeat to the same addrsss with page faults enabled). Just in case there is some code that loops forever enforce a limit of 10. Signed-off-by: Tony Luck --- V4: Fix bug with "||" where I meant "&&" Update stale commit comment referring to mce_busy field in task structure (now called mce_count). Add some comments to queue_task_work() arch/x86/kernel/cpu/mce/core.c | 31 ++++++++++++++++++++++++------- include/linux/sched.h | 1 + 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 13d3f1cbda17..5460c146edb5 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1238,6 +1238,7 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin static void kill_me_now(struct callback_head *ch) { + p->mce_count = 0; force_sig(SIGBUS); } @@ -1246,6 +1247,7 @@ static void kill_me_maybe(struct callback_head *cb) struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me); int flags = MF_ACTION_REQUIRED; + p->mce_count = 0; pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr); if (!p->mce_ripv) @@ -1266,12 +1268,27 @@ static void kill_me_maybe(struct callback_head *cb) } } -static void queue_task_work(struct mce *m, int kill_current_task) +static void queue_task_work(struct mce *m, char *msg, int kill_current_task) { - current->mce_addr = m->addr; - current->mce_kflags = m->kflags; - current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV); - current->mce_whole_page = whole_page(m); + /* First call, save all the details */ + if (current->mce_count++ == 0) { + current->mce_addr = m->addr; + current->mce_kflags = m->kflags; + current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV); + current->mce_whole_page = whole_page(m); + } + + /* Ten is likley overkill. Don't expect more than two faults before task_work() */ + if (current->mce_count > 10) + mce_panic("Too many machine checks while accessing user data", m, msg); + + /* Second or later call, make sure page address matches the one from first call */ + if (current->mce_count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT)) + mce_panic("Machine checks to different user pages", m, msg); + + /* Do not call task_work_add() more than once */ + if (current->mce_count > 1) + return; if (kill_current_task) current->mce_kill_me.func = kill_me_now; @@ -1414,7 +1431,7 @@ noinstr void do_machine_check(struct pt_regs *regs) /* If this triggers there is no way to recover. Die hard. */ BUG_ON(!on_thread_stack() || !user_mode(regs)); - queue_task_work(&m, kill_current_task); + queue_task_work(&m, msg, kill_current_task); } else { /* @@ -1432,7 +1449,7 @@ noinstr void do_machine_check(struct pt_regs *regs) } if (m.kflags & MCE_IN_KERNEL_COPYIN) - queue_task_work(&m, kill_current_task); + queue_task_work(&m, msg, kill_current_task); } out: mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); diff --git a/include/linux/sched.h b/include/linux/sched.h index 6e3a5eeec509..386366c9c757 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1362,6 +1362,7 @@ struct task_struct { mce_whole_page : 1, __mce_reserved : 62; struct callback_head mce_kill_me; + int mce_count; #endif #ifdef CONFIG_KRETPROBES -- 2.21.1