Date: Thu, 22 Jul 2021 17:14:36 -0700
From: "Luck, Tony"
To: Jue Wang
Cc: Borislav Petkov, dinghui@sangfor.com.cn, huangcun@sangfor.com.cn, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, HORIGUCHI NAOYA(堀口 直也), Oscar Salvador, x86, "Song, Youquan"
Subject: Re: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID: <20210723001436.GA1460637@agluck-desk2.amr.corp.intel.com>
References: <20210722151930.GA1453521@agluck-desk2.amr.corp.intel.com>

On Thu, Jul 22, 2021 at 04:30:44PM -0700, Jue Wang wrote:
> I think the challenge being the uncorrectable errors are essentially
> random. It's just a matter of time for >1 UC errors to show up in
> sequential kernel accesses.
>
> It's easy to create such cases with artificial error injections.
>
> I suspect we want to design this part of the kernel to be able to
> handle generic cases?

Remember that:
1) These errors are all in application memory
2) We reset the count every time we get into the task_work function
   that will return to user

So the multiple error scenario here is one where we hit errors on
different user pages on a single trip into the kernel.

Hitting the same page is easy. The kernel has places where it can
hit poison with page faults disabled; it then enables page faults,
retries the same access, and hits poison again.

I'm not aware of, nor expecting to find, places where the kernel
tries to access user address A and hits poison, and then tries to
access user address B (without returning to user between access A
and access B).

-Tony
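
For readers following the thread, the counting described above can be modelled in user space roughly as below. This is only a sketch under stated assumptions, not the patch itself: struct task_model, on_user_copy_poison(), run_task_work(), the enum values and the 4 KiB page size are all hypothetical names and choices made for illustration. Treating a hit on a different page as fatal is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical outcomes of one poison hit during a copy from user. */
enum mc_action {
	MC_QUEUE_WORK,           /* first hit on this trip: queue the task work */
	MC_ALREADY_QUEUED,       /* same page hit again: work is already queued */
	MC_FATAL_DIFFERENT_PAGE  /* second hit on another page: unexpected */
};

/* Hypothetical per-task state: hits seen since the last return to user. */
struct task_model {
	unsigned int mce_count;
	uint64_t mce_addr;       /* address of the first hit on this trip */
};

/* Called for each poison hit while the kernel is accessing user memory. */
static enum mc_action on_user_copy_poison(struct task_model *t, uint64_t addr)
{
	assert(t != NULL);

	if (++t->mce_count == 1) {
		t->mce_addr = addr;
		return MC_QUEUE_WORK;
	}

	/* A retry of the same access lands on the same page. */
	if ((t->mce_addr >> PAGE_SHIFT) == (addr >> PAGE_SHIFT))
		return MC_ALREADY_QUEUED;

	return MC_FATAL_DIFFERENT_PAGE;
}

/* Models the task_work function: the count is reset before return to user. */
static void run_task_work(struct task_model *t)
{
	assert(t != NULL);
	t->mce_count = 0;
}
```

In this model, a retry that hits the same page after page faults are re-enabled returns MC_ALREADY_QUEUED. Only a hit on a second page within the same trip into the kernel reaches the fatal case, and once run_task_work() has run, the next hit counts as a first hit again.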