From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Tony Luck, Borislav Petkov
Subject: [PATCH 5.10 122/122] x86/mce: Avoid infinite loop for copy from user recovery
Date: Mon, 20 Sep 2021 18:44:54 +0200
Message-Id: <20210920163919.810398108@linuxfoundation.org>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20210920163915.757887582@linuxfoundation.org>
References: <20210920163915.757887582@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tony Luck

commit 81065b35e2486c024c7aa86caed452e1f01a59d4 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback results in a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry for ever.

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.
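The self-looping list hazard described above can be sketched in plain userspace C. This is only an illustration, not kernel code: `work_add()` is a hypothetical stand-in for the LIFO push done by task_work_add(), and `noop` is a dummy callback.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's callback_head node (sketch only). */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static void noop(struct callback_head *ch) { (void)ch; }

/* Hypothetical model of task_work_add()'s list push: new work is
 * linked in at the head of the pending-work list. */
static void work_add(struct callback_head **head, struct callback_head *work)
{
	work->next = *head;
	*head = work;
}

/* Adding the SAME node twice makes it point at itself:
 *
 *   work_add(&head, &work);   // work.next = NULL, head = &work
 *   work_add(&head, &work);   // work.next = head = &work  -> self-loop
 *
 * A walker that follows ->next until NULL then never terminates,
 * which is the infinite loop on return to user described above. */
```

The second push corrupts nothing visibly; the cycle only bites later, when the return-to-user path walks the list.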
Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the address
is in the same page as the first machine check (since the callback will
offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever enforce a limit of 10.

[ bp: Massage commit message, drop noinstr, fix typo, extend panic messages. ]

Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck
Signed-off-by: Borislav Petkov
Cc:
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kernel/cpu/mce/core.c |   45 ++++++++++++++++++++++++++++++-----------
 include/linux/sched.h          |    1 
 2 files changed, 34 insertions(+), 12 deletions(-)

--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1241,6 +1241,9 @@ static void __mc_scan_banks(struct mce *
 
 static void kill_me_now(struct callback_head *ch)
 {
+	struct task_struct *p = container_of(ch, struct task_struct, mce_kill_me);
+
+	p->mce_count = 0;
 	force_sig(SIGBUS);
 }
 
@@ -1249,6 +1252,7 @@ static void kill_me_maybe(struct callbac
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
 
+	p->mce_count = 0;
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
@@ -1269,17 +1273,34 @@ static void kill_me_maybe(struct callbac
 	}
 }
 
-static void queue_task_work(struct mce *m, int kill_it)
+static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
 {
-	current->mce_addr = m->addr;
-	current->mce_kflags = m->kflags;
-	current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
-	current->mce_whole_page = whole_page(m);
-
-	if (kill_it)
-		current->mce_kill_me.func = kill_me_now;
-	else
-		current->mce_kill_me.func = kill_me_maybe;
+	int count = ++current->mce_count;
+
+	/* First call, save all the details */
+	if (count == 1) {
+		current->mce_addr = m->addr;
+		current->mce_kflags = m->kflags;
+		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
+		current->mce_whole_page = whole_page(m);
+
+		if (kill_current_task)
+			current->mce_kill_me.func = kill_me_now;
+		else
+			current->mce_kill_me.func = kill_me_maybe;
+	}
+
+	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
+	if (count > 10)
+		mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
+
+	/* Second or later call, make sure page address matches the one from first call */
+	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
+		mce_panic("Consecutive machine checks to different user pages", m, msg);
+
+	/* Do not call task_work_add() more than once */
+	if (count > 1)
+		return;
 
 	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
@@ -1427,7 +1448,7 @@ noinstr void do_machine_check(struct pt_
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_it);
+		queue_task_work(&m, msg, kill_it);
 
 	} else {
 		/*
@@ -1445,7 +1466,7 @@ noinstr void do_machine_check(struct pt_
 		}
 
 		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, kill_it);
+			queue_task_work(&m, msg, kill_it);
 	}
 out:
 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1354,6 +1354,7 @@ struct task_struct {
 			mce_whole_page : 1,
 			__mce_reserved : 62;
 	struct callback_head mce_kill_me;
+	int mce_count;
 #endif
 /*
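The counter guard added by the patch can be modeled as a self-contained userspace sketch. Everything here is a hypothetical stand-in: `struct task` mimics only the two relevant task_struct fields, `queue_task_work_sketch()` mirrors the ordering of checks in the patched queue_task_work(), and mce_panic() is modeled as a return code instead of a panic.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages, as on x86 */

/* Hypothetical stand-in for the relevant task_struct fields. */
struct task {
	uint64_t mce_addr;
	int mce_count;
};

enum action {
	QUEUE_WORK,		/* first #MC: call task_work_add() */
	ALREADY_QUEUED,		/* repeat #MC on the same page: do nothing */
	PANIC_TOO_MANY,		/* models mce_panic("Too many consecutive ...") */
	PANIC_PAGE_MISMATCH,	/* models mce_panic("... different user pages") */
};

/* Mirrors the guard ordering in the patched queue_task_work():
 * save details on the first call, cap repeats at 10, require that
 * every repeat hits the same page, and queue task work only once. */
static enum action queue_task_work_sketch(struct task *t, uint64_t mce_addr)
{
	int count = ++t->mce_count;

	/* First call, save all the details */
	if (count == 1)
		t->mce_addr = mce_addr;

	if (count > 10)
		return PANIC_TOO_MANY;

	/* Second or later call: page address must match the first call */
	if (count > 1 && (t->mce_addr >> PAGE_SHIFT) != (mce_addr >> PAGE_SHIFT))
		return PANIC_PAGE_MISMATCH;

	/* Do not call task_work_add() more than once */
	if (count > 1)
		return ALREADY_QUEUED;

	return QUEUE_WORK;
}
```

Note the page comparison, not an exact address comparison: the byte-at-a-time copy tail legitimately retries at nearby offsets within the same poisoned page, and the callback offlines exactly one page.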