2022-08-09 22:03:18

by Kallol Biswas

Subject: mm/memory-failure: __get_any_page: unknown zero refcount

While running a memory RAS test on a new platform I encountered the
following on the 5.10.117 kernel.

__get_any_page: 0x77df: unknown zero refcount page type 7fffe000000000

The physical address 0x77df000 (pfn 0x77df) is in a System RAM area:
00100000-5c9c0017 : System RAM

The page is not a huge page, is not on the buddy free list, and is not in use (its refcount is zero).

__get_any_page()
..................
	if (!get_hwpoison_page(p)) {
		if (PageHuge(p)) {
			pr_info("%s: %#lx free huge page\n", __func__, pfn);
			ret = 0;
		} else if (is_free_buddy_page(p)) {
			pr_info("%s: %#lx free buddy page\n", __func__, pfn);
			ret = 0;
		} else if (page_count(p)) {
			/* raced with allocation */
			ret = -EBUSY;
		} else {
			pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
				__func__, pfn, p->flags);


The SPARSEMEM config options are set:
cat /boot/config-5.10.117-2.el7.nutanix.20220304.1002776.x86_64 | grep -i sparse
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y

Can someone help me understand why such a page exists in the system,
and what its purpose is?

Thank you,
Kallol


2022-08-10 20:29:40

by Kallol Biswas

Subject: Re: mm/memory-failure: __get_any_page: unknown zero refcount

We are probably hitting a race condition; the upstream code has since changed.
