From: Punit Agrawal
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Andrew Morton, Michal Hocko, Mike Kravetz, "Aneesh Kumar K.V", Anshuman Khandual, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage
Date: Mon, 05 Feb 2018 15:05:43 +0000
Message-ID: <87inbbjx2w.fsf@e105922-lin.cambridge.arm.com>
In-Reply-To: <1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com> (Naoya Horiguchi's message of "Tue, 30 Jan 2018 12:54:04 +0900")
References: <20180130013919.GA19959@hori1.linux.bs1.fc.nec.co.jp> <1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Naoya Horiguchi writes:

> Recently the following BUG was reported:
>
>     Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000
>     Memory failure: 0x3c0000: recovery action for huge page: Recovered
>     BUG: unable to handle kernel paging request at ffff8dfcc0003000
>     IP: gup_pgd_range+0x1f0/0xc20
>     PGD 17ae72067 P4D 17ae72067 PUD 0
>     Oops: 0000 [#1] SMP PTI
>     ...
>     CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
>
> You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on
> a 1GB hugepage. This happens because get_user_pages_fast() is not aware
> of a migration entry on pud that was created in the 1st madvise() event.

Maybe I'm doing something wrong but I wasn't able to reproduce the issue
using the test at the end. I get -

    $ sudo ./hugepage
    Poisoning page...once
    [ 121.295771] Injecting memory failure for pfn 0x8300000 at process virtual address 0x400000000000
    [ 121.386450] Memory failure: 0x8300000: recovery action for huge page: Recovered
    Poisoning page...once again
    madvise: Bad address

What am I missing?

--------- >8 ---------
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	int flags = MAP_HUGETLB | MAP_ANONYMOUS | MAP_PRIVATE;
	int prot = PROT_READ | PROT_WRITE;
	size_t hugepage_sz;
	void *hugepage;
	int ret;

	hugepage_sz = 1024 * 1024 * 1024; /* 1GB */
	hugepage = mmap(NULL, hugepage_sz, prot, flags, -1, 0);
	if (hugepage == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Touch the whole mapping so it is faulted in before poisoning. */
	memset(hugepage, 'b', hugepage_sz);
	getchar();

	printf("Poisoning page...once\n");
	ret = madvise(hugepage, hugepage_sz, MADV_HWPOISON);
	if (ret) {
		perror("madvise");
		return 1;
	}
	getchar();

	/* The second MADV_HWPOISON on the same range is the reported trigger. */
	printf("Poisoning page...once again\n");
	ret = madvise(hugepage, hugepage_sz, MADV_HWPOISON);
	if (ret) {
		perror("madvise");
		return 1;
	}
	getchar();

	memset(hugepage, 'c', hugepage_sz);

	ret = munmap(hugepage, hugepage_sz);
	if (ret) {
		perror("munmap");
		return 1;
	}

	return 0;
}
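
A side note on the test above: the quoted report talks about a 1GB hugepage, while the test passes MAP_HUGETLB without a size encoding, which per mmap(2) selects the system's default hugetlb page size. Whether that difference explains the behaviour seen here is not established in this thread. The following is only a minimal sketch of how the mapping flags could request the 1GB size explicitly. The helper names hugepage_1gb_flags() and map_1gb_hugepage() are hypothetical, and the fallback defines assume the Linux UAPI values (MAP_HUGE_SHIFT of 26, size encoded as log2 of the page size).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Fallbacks for libc headers that lack these; values assumed from the Linux UAPI. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)	/* log2(1GB) == 30 */
#endif

/* Hypothetical helper: mmap flags that ask for 1GB hugetlb pages explicitly. */
static int hugepage_1gb_flags(void)
{
	return MAP_HUGETLB | MAP_HUGE_1GB | MAP_ANONYMOUS | MAP_PRIVATE;
}

/* Hypothetical helper: map len bytes backed by 1GB pages; returns MAP_FAILED on error. */
static void *map_1gb_hugepage(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE, hugepage_1gb_flags(), -1, 0);
}
```

Using map_1gb_hugepage(hugepage_sz) in place of the mmap() call in the test would need 1GB pages to be reserved beforehand (for example with hugepagesz=1G hugepages=N on the kernel command line); without them the mmap() call is expected to fail.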