From: Punit Agrawal
To: Naoya Horiguchi
Cc: "linux-mm\@kvack.org", Andrew Morton, Michal Hocko, Mike Kravetz,
 "Aneesh Kumar K.V", Anshuman Khandual, "linux-kernel\@vger.kernel.org"
Subject: Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage
References: <20180130013919.GA19959@hori1.linux.bs1.fc.nec.co.jp>
 <1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com>
 <87inbbjx2w.fsf@e105922-lin.cambridge.arm.com>
 <20180207011455.GA15214@hori1.linux.bs1.fc.nec.co.jp>
Date: Thu, 08 Feb 2018 12:30:45 +0000
In-Reply-To: <20180207011455.GA15214@hori1.linux.bs1.fc.nec.co.jp>
 (Naoya Horiguchi's message of "Wed, 7 Feb 2018 01:14:57 +0000")
Message-ID: <87fu6bfytm.fsf@e105922-lin.cambridge.arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
 Emacs/25.2 (gnu/linux)
X-Mailing-List: linux-kernel@vger.kernel.org

Horiguchi-san,

Naoya Horiguchi writes:

> Hi Punit,
>
> On Mon, Feb 05, 2018 at 03:05:43PM +0000, Punit Agrawal wrote:
>> Naoya Horiguchi writes:
>>
>> [...]
>> >
>> > You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on
>> > a 1GB hugepage. This happens because get_user_pages_fast() is not aware
>> > of a migration entry on pud that was created in the 1st madvise() event.
>>
>> Maybe I'm doing something wrong but I wasn't able to reproduce the issue
>> using the test at the end. I get -
>>
>> $ sudo ./hugepage
>>
>> Poisoning page...once
>> [ 121.295771] Injecting memory failure for pfn 0x8300000 at process virtual address 0x400000000000
>> [ 121.386450] Memory failure: 0x8300000: recovery action for huge page: Recovered
>>
>> Poisoning page...once again
>> madvise: Bad address
>>
>> What am I missing?
>
> The test program below is exactly what I intended, so you did right
> testing.

Thanks for the confirmation, and for the flow outline below.

> I try to guess what could happen. The related code is like below:
>
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
>                          int write, struct page **pages, int *nr)
> {
>         ...
>         do {
>                 pud_t pud = READ_ONCE(*pudp);
>
>                 next = pud_addr_end(addr, end);
>                 if (pud_none(pud))
>                         return 0;
>                 if (unlikely(pud_huge(pud))) {
>                         if (!gup_huge_pud(pud, pudp, addr, next, write,
>                                           pages, nr))
>                                 return 0;
>
> pud_none() always returns false for hwpoison entry in any arch.
> I guess that pud_huge() could behave in undefined manner for hwpoison entry
> because pud_huge() assumes that a given pud has the present bit set, which
> is not true for hwpoison entry.

This is where the arm64 helpers behave differently (though more by
chance than design).

A poisoned pud passes pud_huge() because the arm64 version does not
explicitly check for the present bit.

    int pud_huge(pud_t pud)
    {
            return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
    }

This doesn't lead to a crash, as the first thing gup_huge_pud() does is
check pud_access_permitted(), which does check for the present bit.

I was able to crash the kernel by changing pud_huge() to check for the
present bit.

> As a result, pud_huge() checks an irrelevant bit used for other
> purpose depending on non-present page table format of each arch. If
> pud_huge() returns false for hwpoison entry, we try to go to the lower
> level and the kernel highly likely to crash. So I guess your kernel
> fell back the slow path and somehow ended up with returning EFAULT.

Makes sense. Due to the difference above on arm64, it ends up falling
back to the slow path, which eventually returns -EFAULT (via
follow_hugetlb_page) for poisoned pages.

>
> So I don't think that the above test result means that errors are properly
> handled, and the proposed patch should help for arm64.

Although the deviation in pud_huge() avoids a kernel crash, the code
would be easier to maintain and reason about if the arm64 helpers were
consistent with what core code expects.

I'll look to update the arm64 helpers once this patch gets merged. But
it would be helpful to have a clear statement of the semantics of
pud_huge() for the various cases. Is there any version that can be used
as a reference?

Also, do you know what the plans are for re-enabling the hugepage
poisoning that is disabled here?

Thanks,
Punit

[...]
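
For anyone following the pud_huge() discussion, here is a rough userspace
model of the two behaviours described in this mail. It is only a sketch, not
kernel code: the model_* names, the MODEL_* bit definitions (bit 0 as the
present/valid bit, bit 1 as the table bit) and the sample entry values are
assumptions made for illustration, and a real hwpoison entry is encoded
differently. It shows that a non-zero, non-present entry with the table bit
clear passes the check quoted above, while a hypothetical variant that also
requires the present bit rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; all names and values are illustrative assumptions. */
typedef uint64_t model_pud_t;

#define MODEL_PUD_VALID     (UINT64_C(1) << 0) /* assumed present/valid bit */
#define MODEL_PUD_TABLE_BIT (UINT64_C(1) << 1) /* assumed table (non-block) bit */

/* Mirrors the arm64 pud_huge() quoted above: the valid bit is not checked. */
static bool model_pud_huge(model_pud_t pud)
{
	return pud != 0 && !(pud & MODEL_PUD_TABLE_BIT);
}

/* Hypothetical stricter variant that also requires the entry to be present. */
static bool model_pud_huge_strict(model_pud_t pud)
{
	return (pud & MODEL_PUD_VALID) != 0 && !(pud & MODEL_PUD_TABLE_BIT);
}

/* Stand-in for the present check that gup_huge_pud() performs first. */
static bool model_pud_present(model_pud_t pud)
{
	return (pud & MODEL_PUD_VALID) != 0;
}

/* Exercise the three kinds of entry discussed in the thread. */
void model_selfcheck(void)
{
	/* Made-up non-present entry: non-zero, bits 0 and 1 both clear. */
	model_pud_t non_present = UINT64_C(0x1234) << 2;
	/* Made-up present block (huge) entry and present table entry. */
	model_pud_t block = MODEL_PUD_VALID | (UINT64_C(1) << 30);
	model_pud_t table = MODEL_PUD_VALID | MODEL_PUD_TABLE_BIT;

	/* Loose check treats the non-present entry as huge ... */
	assert(model_pud_huge(non_present));
	/* ... but the later present check would refuse to use it. */
	assert(!model_pud_present(non_present));
	/* Strict check rejects it, so a walker would descend further. */
	assert(!model_pud_huge_strict(non_present));

	assert(model_pud_huge(block) && model_pud_huge_strict(block));
	assert(!model_pud_huge(table) && !model_pud_huge_strict(table));
}
```

In this model the loose check sends the non-present entry into the
huge-page path, where the present check stops it. That matches the
fall-back to the slow path described above. With the strict variant the
walker would instead treat the entry as a table pointer, which is the case
where I saw the crash.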