Date: Sat, 22 May 2021 15:09:00 -0700
From: Andrew Morton
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Tony Luck, Aili Yao, Oscar Salvador, David Hildenbrand, Borislav Petkov, Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races
Message-Id: <20210522150900.39d6832a03c5f772911c5b6d@linux-foundation.org>
In-Reply-To: <20210521030156.2612074-2-nao.horiguchi@gmail.com>
References: <20210521030156.2612074-1-nao.horiguchi@gmail.com> <20210521030156.2612074-2-nao.horiguchi@gmail.com>

On Fri, 21 May 2021 12:01:54 +0900 Naoya Horiguchi wrote:

> There can be races when multiple CPUs consume poison from the same
> page. The first into memory_failure() atomically sets the HWPoison
> page flag and begins hunting for tasks that map this page. Eventually
> it invalidates those mappings and may send a SIGBUS to the affected
> tasks.
>
> But while all that work is going on, other CPUs see a "success"
> return code from memory_failure() and so they believe the error
> has been handled and continue executing.
>
> Fix by wrapping most of the internal parts of memory_failure() in
> a mutex.

We can reduce the scope of that mutex, which helps readability at least.

--- a/mm/memory-failure.c~mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races-fix
+++ a/mm/memory-failure.c
@@ -1397,8 +1397,6 @@ out:
 	return rc;
 }
 
-static DEFINE_MUTEX(mf_mutex);
-
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1425,6 +1423,7 @@ int memory_failure(unsigned long pfn, in
 	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
+	static DEFINE_MUTEX(mf_mutex);
 
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
_
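
For anyone not used to function-scope static locks, here is a rough userspace sketch of the same idea. It assumes pthreads as a stand-in for the kernel mutex API, and handle_event(), ev_mutex, worker() and run_demo() are made-up names for illustration, not anything in mm/memory-failure.c. The point is only that a static local has a single instance shared by every caller, so it serializes them exactly as a file-scope lock would, while the name is visible only inside the one function that uses it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_THREADS 4
#define NR_CALLS   10000

/* Shared state, only ever modified under ev_mutex in handle_event(). */
static long handled;

/* Hypothetical handler: stands in for a function that must not race. */
static int handle_event(void)
{
	/*
	 * Function-scope static: initialized once at build time and shared
	 * by all callers, but not nameable outside this function.
	 */
	static pthread_mutex_t ev_mutex = PTHREAD_MUTEX_INITIALIZER;

	pthread_mutex_lock(&ev_mutex);
	handled++;			/* critical section */
	pthread_mutex_unlock(&ev_mutex);
	return 0;
}

/* Each worker calls the handler NR_CALLS times. */
static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NR_CALLS; i++)
		handle_event();
	return NULL;
}

/* Start NR_THREADS concurrent workers and return the final count. */
long run_demo(void)
{
	pthread_t tid[NR_THREADS];
	int err;

	handled = 0;
	for (int i = 0; i < NR_THREADS; i++) {
		err = pthread_create(&tid[i], NULL, worker, NULL);
		assert(err == 0);
	}
	for (int i = 0; i < NR_THREADS; i++) {
		err = pthread_join(tid[i], NULL);
		assert(err == 0);
	}
	(void)err;	/* quiet unused-variable warnings under NDEBUG */
	return handled;
}
```

Call run_demo() from a main() and build with something like "cc -pthread". Because every increment happens under the one mutex, the returned count should equal NR_THREADS * NR_CALLS.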