2017-04-25 14:28:06

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 0/2] BUG raised when onlining HWPoisoned page

When a page is HWPoisoned and later offlined and onlined back, a BUG
warning is raised in the kernel:

BUG: Bad page state in process mem-on-off-test pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G B 4.11.0-rc7-hwp
#1
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

This has been seen on x86 kvm guest, PowerPC bare metal system and KVM
guest.

The issue is that the onlined page has already the mem_cgroup field
set.

It seems that the mem_cgroup field should be cleared when the page is
poisoned, which is done in the first patch of this series.

Then when the page is onlined back, the BUG warning is no more
triggered, but the page is now available for use, and once a process
is using it, it got killed because of the memory error.
It seems that the page should be ignored when onlined, as it is when
it is offlined (introduced by commit b023f46813cd "memory-hotplug:
skip HWPoisoned page when offlining pages"). The second patch of this
series is skipping HWPoisoned page when the memory block is onlined
back.

Changes against V1:
- The first patch commit's description mentioned the BUG explicitly
- In the patch as requested by Naoya Horiguchi, clear the page's
reserved flag when the skipping the poisoned page.

Laurent Dufour (2):
mm: Uncharge poisoned pages
mm: skip HWPoisoned pages when onlining pages

mm/memory-failure.c | 3 +++
mm/memory_hotplug.c | 4 ++++
2 files changed, 7 insertions(+)

--
2.7.4


2017-04-25 14:28:18

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
offlining pages") skip the HWPoisoned pages when offlining pages, but
this should be skipped when onlining the pages too.

Signed-off-by: Laurent Dufour <[email protected]>
---
mm/memory_hotplug.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6fa7208bcd56..741ddb50e7d2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
if (PageReserved(pfn_to_page(start_pfn)))
for (i = 0; i < nr_pages; i++) {
page = pfn_to_page(start_pfn + i);
+ if (PageHWPoison(page)) {
+ ClearPageReserved(page);
+ continue;
+ }
(*online_page_callback)(page);
onlined_pages++;
}
--
2.7.4

2017-04-25 14:28:29

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH v2 1/2] mm: Uncharge poisoned pages

When page are poisoned, they should be uncharged from the root memory
cgroup.

This is required to avoid a BUG raised when the page is onlined back:
BUG: Bad page state in process mem-on-off-test pfn:7ae3b
page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
index:0x1
flags: 0x3ffff800200000(hwpoison)
raw: 003ffff800200000 0000000000000000 0000000000000001
00000000ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
c0000007fe055800
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000007fe055800
Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
virtio_ring virtio
CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G B 4.11.0-rc7-hwp
Call Trace:
[c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
(unreliable)
[c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
[c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
[c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
[c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
[c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
[c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
[c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
[c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
[c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
[c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
[c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
[c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
[c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
[c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
[c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
[c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
[c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc

Signed-off-by: Laurent Dufour <[email protected]>
---
mm/memory-failure.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27f7210e7fab..22bd22eb25cb 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
*/
static int delete_from_lru_cache(struct page *p)
{
+ if (memcg_kmem_enabled())
+ memcg_kmem_uncharge(p, 0);
+
if (!isolate_lru_page(p)) {
/*
* Clear sensible page flags, so that the buddy system won't
--
2.7.4

2017-04-25 23:49:49

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Tue, Apr 25, 2017 at 04:27:51PM +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
>
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
>
> Signed-off-by: Laurent Dufour <[email protected]>

Acked-by: Naoya Horiguchi <[email protected]>

> ---
> mm/memory-failure.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
> */
> static int delete_from_lru_cache(struct page *p)
> {
> + if (memcg_kmem_enabled())
> + memcg_kmem_uncharge(p, 0);
> +
> if (!isolate_lru_page(p)) {
> /*
> * Clear sensible page flags, so that the buddy system won't
> --
> 2.7.4
>
>

2017-04-26 01:55:14

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
>
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)
> raw: 003ffff800200000 0000000000000000 0000000000000001
> 00000000ffffffff
> raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> c0000007fe055800
> page dumped because: page still charged to cgroup
> page->mem_cgroup:c0000007fe055800
> Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> virtio_ring virtio
> CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G B 4.11.0-rc7-hwp
> Call Trace:
> [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> (unreliable)
> [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> mm/memory-failure.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 27f7210e7fab..22bd22eb25cb 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
> */
> static int delete_from_lru_cache(struct page *p)
> {
> + if (memcg_kmem_enabled())
> + memcg_kmem_uncharge(p, 0);
> +

The changelog is not quite clear, so we are uncharging a page using
memcg_kmem_uncharge for a page in swap cache/page cache?

Balbir Singh.

2017-04-26 02:10:31

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> offlining pages") skip the HWPoisoned pages when offlining pages, but
> this should be skipped when onlining the pages too.
>
> Signed-off-by: Laurent Dufour <[email protected]>
> ---
> mm/memory_hotplug.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6fa7208bcd56..741ddb50e7d2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> if (PageReserved(pfn_to_page(start_pfn)))
> for (i = 0; i < nr_pages; i++) {
> page = pfn_to_page(start_pfn + i);
> + if (PageHWPoison(page)) {
> + ClearPageReserved(page);

Why do we clear page reserved? Also if the page is marked PageHWPoison, it
was never offlined to begin with? Or do you expect this to be set on newly
hotplugged memory? Also don't we need to skip the entire pageblock?

Balbir Singh.

2017-04-26 03:09:47

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Wed, Apr 26, 2017 at 11:54:58AM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > When page are poisoned, they should be uncharged from the root memory
> > cgroup.
> >
> > This is required to avoid a BUG raised when the page is onlined back:
> > BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> > page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> > index:0x1
> > flags: 0x3ffff800200000(hwpoison)
> > raw: 003ffff800200000 0000000000000000 0000000000000001
> > 00000000ffffffff
> > raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000
> > c0000007fe055800
> > page dumped because: page still charged to cgroup
> > page->mem_cgroup:c0000007fe055800
> > Modules linked in: pseries_rng rng_core vmx_crypto virtio_balloon
> > ip_tables x_tables autofs4 virtio_blk virtio_net virtio_pci
> > virtio_ring virtio
> > CPU: 34 PID: 5946 Comm: mem-on-off-test Tainted: G B 4.11.0-rc7-hwp
> > Call Trace:
> > [c0000007e4a737f0] [c000000000958e8c] dump_stack+0xb0/0xf0
> > (unreliable)
> > [c0000007e4a73830] [c00000000021588c] bad_page+0x11c/0x190
> > [c0000007e4a738c0] [c00000000021757c] free_pcppages_bulk+0x46c/0x600
> > [c0000007e4a73990] [c00000000021924c] free_hot_cold_page+0x2ec/0x320
> > [c0000007e4a739e0] [c0000000002a6440] generic_online_page+0x50/0x70
> > [c0000007e4a73a10] [c0000000002a6184] online_pages_range+0x94/0xe0
> > [c0000007e4a73a70] [c00000000005a2b0] walk_system_ram_range+0xe0/0x120
> > [c0000007e4a73ac0] [c0000000002cce44] online_pages+0x2b4/0x6b0
> > [c0000007e4a73b60] [c000000000600558] memory_subsys_online+0x218/0x270
> > [c0000007e4a73bf0] [c0000000005dec84] device_online+0xb4/0x110
> > [c0000007e4a73c30] [c000000000600f00] store_mem_state+0xc0/0x190
> > [c0000007e4a73c70] [c0000000005da1d4] dev_attr_store+0x34/0x60
> > [c0000007e4a73c90] [c000000000377c70] sysfs_kf_write+0x60/0xa0
> > [c0000007e4a73cb0] [c0000000003769fc] kernfs_fop_write+0x16c/0x240
> > [c0000007e4a73d00] [c0000000002d1b0c] __vfs_write+0x3c/0x1b0
> > [c0000007e4a73d90] [c0000000002d34dc] vfs_write+0xcc/0x230
> > [c0000007e4a73de0] [c0000000002d50e0] SyS_write+0x60/0x110
> > [c0000007e4a73e30] [c00000000000b760] system_call+0x38/0xfc
> >
> > Signed-off-by: Laurent Dufour <[email protected]>
> > ---
> > mm/memory-failure.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index 27f7210e7fab..22bd22eb25cb 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -529,6 +529,9 @@ static const char * const action_page_types[] = {
> > */
> > static int delete_from_lru_cache(struct page *p)
> > {
> > + if (memcg_kmem_enabled())
> > + memcg_kmem_uncharge(p, 0);
> > +
>
> The changelog is not quite clear, so we are uncharging a page using
> memcg_kmem_uncharge for a page in swap cache/page cache?

Hi Balbir,

Yes, in the normal page lifecycle, uncharge is done in page free time.
But in memory error handling case, in-use pages (i.e. swap cache and page
cache) are removed from normal path and they don't pass page freeing code.
So I think that this change is to keep the consistent charging for such a case.

- Naoya Horiguchi

2017-04-26 03:19:51

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > this should be skipped when onlining the pages too.
> >
> > Signed-off-by: Laurent Dufour <[email protected]>
> > ---
> > mm/memory_hotplug.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 6fa7208bcd56..741ddb50e7d2 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > if (PageReserved(pfn_to_page(start_pfn)))
> > for (i = 0; i < nr_pages; i++) {
> > page = pfn_to_page(start_pfn + i);
> > + if (PageHWPoison(page)) {
> > + ClearPageReserved(page);
>
> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> was never offlined to begin with? Or do you expect this to be set on newly
> hotplugged memory? Also don't we need to skip the entire pageblock?

If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
that we skip the page status check for hwpoisoned pages *not* to prevent
memory offlining for memblocks with hwpoisoned pages. That means that
hwpoisoned pages can be offlined.

And another reason to clear PageReserved is that we could reuse the
hwpoisoned page after onlining back with replacing the broken DIMM.
In this usecase, we first do unpoisoning to clear PageHWPoison,
but it doesn't work if PageReserved is set. My simple testing shows
the BUG below in unpoisoning (without the ClearPageReserved):

Unpoison: Software-unpoisoned page 0x18000
BUG: Bad page state in process page-types pfn:18000
page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x800(reserved)

Thanks,
Naoya Horiguchi

2017-04-26 03:45:17

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

> > > static int delete_from_lru_cache(struct page *p)
> > > {
> > > + if (memcg_kmem_enabled())
> > > + memcg_kmem_uncharge(p, 0);
> > > +
> >
> > The changelog is not quite clear, so we are uncharging a page using
> > memcg_kmem_uncharge for a page in swap cache/page cache?
>
> Hi Balbir,
>
> Yes, in the normal page lifecycle, uncharge is done in page free time.
> But in memory error handling case, in-use pages (i.e. swap cache and page
> cache) are removed from normal path and they don't pass page freeing code.
> So I think that this change is to keep the consistent charging for such a case.

I agree we should uncharge, but looking at the API name, it seems to
be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
something?

Balbir Singh.

2017-04-26 04:47:30

by Naoya Horiguchi

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > > static int delete_from_lru_cache(struct page *p)
> > > > {
> > > > + if (memcg_kmem_enabled())
> > > > + memcg_kmem_uncharge(p, 0);
> > > > +
> > >
> > > The changelog is not quite clear, so we are uncharging a page using
> > > memcg_kmem_uncharge for a page in swap cache/page cache?
> >
> > Hi Balbir,
> >
> > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > But in memory error handling case, in-use pages (i.e. swap cache and page
> > cache) are removed from normal path and they don't pass page freeing code.
> > So I think that this change is to keep the consistent charging for such a case.
>
> I agree we should uncharge, but looking at the API name, it seems to
> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> something?

Thank you for pointing out.
Actually I had the same question and this surely looks strange.
But simply calling mem_cgroup_uncharge() here doesn't work because it
assumes that page_refcount(p) == 0, which is not true in hwpoison context.
We need some other clearer way or at least some justifying comment about
why this is ok.

- Naoya

2017-04-26 09:00:49

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
> > > > > static int delete_from_lru_cache(struct page *p)
> > > > > {
> > > > > + if (memcg_kmem_enabled())
> > > > > + memcg_kmem_uncharge(p, 0);
> > > > > +
> > > >
> > > > The changelog is not quite clear, so we are uncharging a page using
> > > > memcg_kmem_uncharge for a page in swap cache/page cache?
> > >
> > > Hi Balbir,
> > >
> > > Yes, in the normal page lifecycle, uncharge is done in page free time.
> > > But in memory error handling case, in-use pages (i.e. swap cache and page
> > > cache) are removed from normal path and they don't pass page freeing code.
> > > So I think that this change is to keep the consistent charging for such a case.
> >
> > I agree we should uncharge, but looking at the API name, it seems to
> > be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
> > something?
>
> Thank you for pointing out.
> Actually I had the same question and this surely looks strange.
> But simply calling mem_cgroup_uncharge() here doesn't work because it
> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
> We need some other clearer way or at least some justifying comment about
> why this is ok.
>

We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().
We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
!MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

Balbir Singh.

2017-04-27 14:37:30

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> When page are poisoned, they should be uncharged from the root memory
> cgroup.
>
> This is required to avoid a BUG raised when the page is onlined back:
> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> index:0x1
> flags: 0x3ffff800200000(hwpoison)

My knowledge of memory poisoning is very rudimentary but aren't those
pages supposed to leak and never come back? In other words isn't the
hoplug code broken because it should leave them alone?
--
Michal Hocko
SUSE Labs

2017-04-27 20:51:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

Michal Hocko <[email protected]> writes:

> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>> When page are poisoned, they should be uncharged from the root memory
>> cgroup.
>>
>> This is required to avoid a BUG raised when the page is onlined back:
>> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
>> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
>> index:0x1
>> flags: 0x3ffff800200000(hwpoison)
>
> My knowledge of memory poisoning is very rudimentary but aren't those
> pages supposed to leak and never come back? In other words isn't the
> hoplug code broken because it should leave them alone?

Yes that would be the right interpretation. If it was really offlined
due to a hardware error the memory will be poisoned and any access
could cause a machine check.

hwpoison has an own "unpoison" option (only used for debugging), which
I think handles this.

-Andi

2017-04-28 02:51:52

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Wed, 2017-04-26 at 03:13 +0000, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > >
> > > Signed-off-by: Laurent Dufour <[email protected]>
> > > ---
> > > mm/memory_hotplug.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > if (PageReserved(pfn_to_page(start_pfn)))
> > > for (i = 0; i < nr_pages; i++) {
> > > page = pfn_to_page(start_pfn + i);
> > > + if (PageHWPoison(page)) {
> > > + ClearPageReserved(page);
> >
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
>
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.
>
> And another reason to clear PageReserved is that we could reuse the
> hwpoisoned page after onlining back with replacing the broken DIMM.
> In this usecase, we first do unpoisoning to clear PageHWPoison,
> but it doesn't work if PageReserved is set. My simple testing shows
> the BUG below in unpoisoning (without the ClearPageReserved):
>

Fair enough, thanks for the explanation

Balbir Singh.

2017-04-28 06:08:10

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> Michal Hocko <[email protected]> writes:
>
> > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >> When page are poisoned, they should be uncharged from the root memory
> >> cgroup.
> >>
> >> This is required to avoid a BUG raised when the page is onlined back:
> >> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> >> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> >> index:0x1
> >> flags: 0x3ffff800200000(hwpoison)
> >
> > My knowledge of memory poisoning is very rudimentary but aren't those
> > pages supposed to leak and never come back? In other words isn't the
> > hoplug code broken because it should leave them alone?
>
> Yes that would be the right interpretation. If it was really offlined
> due to a hardware error the memory will be poisoned and any access
> could cause a machine check.

OK, thanks for the clarification. Then I am not sure the patch is
correct. Why do we need to uncharge that page at all? It is not freed.
The correct thing to do is to not online it in the first place which is
done in patch2 [1]. Even if we need to uncharge the page the reason is
not to silent the BUG, that is merely papering a issue than a real fix.
Laurent can you elaborate please.

[1] http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs

2017-04-28 06:30:57

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > this should be skipped when onlining the pages too.
> > >
> > > Signed-off-by: Laurent Dufour <[email protected]>
> > > ---
> > > mm/memory_hotplug.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > if (PageReserved(pfn_to_page(start_pfn)))
> > > for (i = 0; i < nr_pages; i++) {
> > > page = pfn_to_page(start_pfn + i);
> > > + if (PageHWPoison(page)) {
> > > + ClearPageReserved(page);
> >
> > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > was never offlined to begin with? Or do you expect this to be set on newly
> > hotplugged memory? Also don't we need to skip the entire pageblock?
>
> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> that we skip the page status check for hwpoisoned pages *not* to prevent
> memory offlining for memblocks with hwpoisoned pages. That means that
> hwpoisoned pages can be offlined.

Is this patch actually correct? I am trying to wrap my head around it
but it smells like it tries to avoid the problem rather than fix it
properly. I might be wrong here of course but to me it sounds like
poisoned page should simply be offlined and keep its poison state all
the time. If the memory is hot-removed and added again we have lost the
struct page along with the state which is the expected behavior. If it
is still broken we will re-poison it.

Anyway a patch to skip over poisoned pages during online makes perfect
sense to me. The PageReserved fiddling around much less so.

Or am I missing something. Let's CC Wen Congyang for the clarification
here.
--
Michal Hocko
SUSE Labs

2017-04-28 06:51:04

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

[Drop Wen Congyang because his address bounces - we will have to find
out ourselves...]

On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <[email protected]>
> > > > ---
> > > > mm/memory_hotplug.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > if (PageReserved(pfn_to_page(start_pfn)))
> > > > for (i = 0; i < nr_pages; i++) {
> > > > page = pfn_to_page(start_pfn + i);
> > > > + if (PageHWPoison(page)) {
> > > > + ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> >
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
>
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
>
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
>
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

--
Michal Hocko
SUSE Labs

2017-04-28 06:51:44

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Fri 28-04-17 08:50:50, Michal Hocko wrote:
> [Drop Wen Congyang because his address bounces - we will have to find
> out ourselves...]

Ble, for real this time. Sorry about spamming

> On Fri 28-04-17 08:30:48, Michal Hocko wrote:
> > On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > > this should be skipped when onlining the pages too.
> > > > >
> > > > > Signed-off-by: Laurent Dufour <[email protected]>
> > > > > ---
> > > > > mm/memory_hotplug.c | 4 ++++
> > > > > 1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > > --- a/mm/memory_hotplug.c
> > > > > +++ b/mm/memory_hotplug.c
> > > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > > if (PageReserved(pfn_to_page(start_pfn)))
> > > > > for (i = 0; i < nr_pages; i++) {
> > > > > page = pfn_to_page(start_pfn + i);
> > > > > + if (PageHWPoison(page)) {
> > > > > + ClearPageReserved(page);
> > > >
> > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > > was never offlined to begin with? Or do you expect this to be set on newly
> > > > hotplugged memory? Also don't we need to skip the entire pageblock?
> > >
> > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > > that we skip the page status check for hwpoisoned pages *not* to prevent
> > > memory offlining for memblocks with hwpoisoned pages. That means that
> > > hwpoisoned pages can be offlined.
> >
> > Is this patch actually correct? I am trying to wrap my head around it
> > but it smells like it tries to avoid the problem rather than fix it
> > properly. I might be wrong here of course but to me it sounds like
> > poisoned page should simply be offlined and keep its poison state all
> > the time. If the memory is hot-removed and added again we have lost the
> > struct page along with the state which is the expected behavior. If it
> > is still broken we will re-poison it.
> >
> > Anyway a patch to skip over poisoned pages during online makes perfect
> > sense to me. The PageReserved fiddling around much less so.
> >
> > Or am I missing something. Let's CC Wen Congyang for the clarification
> > here.
>
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs

2017-04-28 07:31:50

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

[CC Johannes and Vladimir - the patch is
http://lkml.kernel.org/r/[email protected]]

On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> > Michal Hocko <[email protected]> writes:
> >
> > > On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> > >> When page are poisoned, they should be uncharged from the root memory
> > >> cgroup.
> > >>
> > >> This is required to avoid a BUG raised when the page is onlined back:
> > >> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> > >> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> > >> index:0x1
> > >> flags: 0x3ffff800200000(hwpoison)
> > >
> > > My knowledge of memory poisoning is very rudimentary but aren't those
> > > pages supposed to leak and never come back? In other words isn't the
> > > hoplug code broken because it should leave them alone?
> >
> > Yes that would be the right interpretation. If it was really offlined
> > due to a hardware error the memory will be poisoned and any access
> > could cause a machine check.
>
> OK, thanks for the clarification. Then I am not sure the patch is
> correct. Why do we need to uncharge that page at all?

Now, I have realized that we actually want to uncharge that page because
it will pin the memcg and we do not want to have that memcg and its
whole hierarchy pinned as well. This used to work before the charge
rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
because we used to uncharge on page cache removal.

I do not think the patch is correct, though. memcg_kmem_enabled() will
check whether kmem accounting is enabled and we are talking about page
cache pages here. You should be using mem_cgroup_uncharge instead.
--
Michal Hocko
SUSE Labs

2017-04-28 09:18:32

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On 28/04/2017 09:31, Michal Hocko wrote:
> [CC Johannes and Vladimir - the patch is
> http://lkml.kernel.org/r/[email protected]]
>
> On Fri 28-04-17 08:07:55, Michal Hocko wrote:
>> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
>>> Michal Hocko <[email protected]> writes:
>>>
>>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
>>>>> When page are poisoned, they should be uncharged from the root memory
>>>>> cgroup.
>>>>>
>>>>> This is required to avoid a BUG raised when the page is onlined back:
>>>>> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
>>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
>>>>> index:0x1
>>>>> flags: 0x3ffff800200000(hwpoison)
>>>>
>>>> My knowledge of memory poisoning is very rudimentary but aren't those
>>>> pages supposed to leak and never come back? In other words isn't the
>>>> hoplug code broken because it should leave them alone?
>>>
>>> Yes that would be the right interpretation. If it was really offlined
>>> due to a hardware error the memory will be poisoned and any access
>>> could cause a machine check.
>>
>> OK, thanks for the clarification. Then I am not sure the patch is
>> correct. Why do we need to uncharge that page at all?
>
> Now, I have realized that we actually want to uncharge that page because
> it will pin the memcg and we do not want to have that memcg and its
> whole hierarchy pinned as well. This used to work before the charge
> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> because we used to uncharge on page cache removal.
>
> I do not think the patch is correct, though. memcg_kmem_enabled() will
> check whether kmem accounting is enabled and we are talking about page
> cache pages here. You should be using mem_cgroup_uncharge instead.

Thanks for the review Michal.

I was not comfortable either with this patch.

I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
succeeds only, so not calling it if isolate_lru_page() failed.

This seems to work as well, so if everyone agree on that, I'll send a
new version soon.

Cheers,
Laurent.

2017-04-28 09:32:55

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On 26/04/2017 10:59, Balbir Singh wrote:
> On Wed, 2017-04-26 at 04:46 +0000, Naoya Horiguchi wrote:
>> On Wed, Apr 26, 2017 at 01:45:00PM +1000, Balbir Singh wrote:
>>>>>> static int delete_from_lru_cache(struct page *p)
>>>>>> {
>>>>>> + if (memcg_kmem_enabled())
>>>>>> + memcg_kmem_uncharge(p, 0);
>>>>>> +
>>>>>
>>>>> The changelog is not quite clear, so we are uncharging a page using
>>>>> memcg_kmem_uncharge for a page in swap cache/page cache?
>>>>
>>>> Hi Balbir,
>>>>
>>>> Yes, in the normal page lifecycle, uncharge is done in page free time.
>>>> But in memory error handling case, in-use pages (i.e. swap cache and page
>>>> cache) are removed from normal path and they don't pass page freeing code.
>>>> So I think that this change is to keep the consistent charging for such a case.
>>>
>>> I agree we should uncharge, but looking at the API name, it seems to
>>> be for kmem pages, why are we not using mem_cgroup_uncharge()? Am I missing
>>> something?
>>
>> Thank you for pointing out.
>> Actually I had the same question and this surely looks strange.
>> But simply calling mem_cgroup_uncharge() here doesn't work because it
>> assumes that page_refcount(p) == 0, which is not true in hwpoison context.
>> We need some other clearer way or at least some justifying comment about
>> why this is ok.
>>
>
> We should call mem_cgroup_uncharge() after isolate_lru_page()/put_page().

Thanks for the review Naoya and Balbir,

I changed the patch to call mem_cgroup_uncharge() once
isolate_lru_page() succeeded, but before calling put_page().
It seems to work fine.

> We could check if page_count() is 0 or force if required (!MF_RECOVERED &&
> !MF_DELAYED). We could even skip the VM_BUG_ON if the page is poisoned.

This doesn't seem to be needed. Am I still missing something here ?

Cheers,
Laurent.

2017-04-28 13:48:44

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] mm: Uncharge poisoned pages

On Fri 28-04-17 11:17:34, Laurent Dufour wrote:
> On 28/04/2017 09:31, Michal Hocko wrote:
> > [CC Johannes and Vladimir - the patch is
> > http://lkml.kernel.org/r/[email protected]]
> >
> > On Fri 28-04-17 08:07:55, Michal Hocko wrote:
> >> On Thu 27-04-17 13:51:23, Andi Kleen wrote:
> >>> Michal Hocko <[email protected]> writes:
> >>>
> >>>> On Tue 25-04-17 16:27:51, Laurent Dufour wrote:
> >>>>> When page are poisoned, they should be uncharged from the root memory
> >>>>> cgroup.
> >>>>>
> >>>>> This is required to avoid a BUG raised when the page is onlined back:
> >>>>> BUG: Bad page state in process mem-on-off-test pfn:7ae3b
> >>>>> page:f000000001eb8ec0 count:0 mapcount:0 mapping: (null)
> >>>>> index:0x1
> >>>>> flags: 0x3ffff800200000(hwpoison)
> >>>>
> >>>> My knowledge of memory poisoning is very rudimentary but aren't those
> >>>> pages supposed to leak and never come back? In other words isn't the
> >>>> hoplug code broken because it should leave them alone?
> >>>
> >>> Yes that would be the right interpretation. If it was really offlined
> >>> due to a hardware error the memory will be poisoned and any access
> >>> could cause a machine check.
> >>
> >> OK, thanks for the clarification. Then I am not sure the patch is
> >> correct. Why do we need to uncharge that page at all?
> >
> > Now, I have realized that we actually want to uncharge that page because
> > it will pin the memcg and we do not want to have that memcg and its
> > whole hierarchy pinned as well. This used to work before the charge
> > rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess
> > because we used to uncharge on page cache removal.
> >
> > I do not think the patch is correct, though. memcg_kmem_enabled() will
> > check whether kmem accounting is enabled and we are talking about page
> > cache pages here. You should be using mem_cgroup_uncharge instead.
>
> Thanks for the review Michal.
>
> I was not comfortable either with this patch.
>
> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page()
> succeeds only, so not calling it if isolate_lru_page() failed.

Wait a moment. This cannot possibly work. isolate_lru_page asumes page
count > 0 and increments the counter so the resulting page count is > 1
I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page)
in uncharge_list().

This is getting quite hairy. What is the expected page count of the
hwpoison page? I guess we would need to update the VM_BUG_ON in the
memcg uncharge code to ignore the page count of hwpoison pages if it can
be arbitrary.

Before we go any further, is there any documentation about the expected
behavior and the state of the hwpoison pages? I have a very bad feeling
that the current behavior is quite arbitrary and "testing driven"
holes plugging will make it only more messy. So let's start with the
clear description of what should happen with the hwpoison pages.
--
Michal Hocko
SUSE Labs

2018-01-17 23:04:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <[email protected]> wrote:

> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skip the HWPoisoned pages when offlining pages, but
> > > > this should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour <[email protected]>
> > > > ---
> > > > mm/memory_hotplug.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > > if (PageReserved(pfn_to_page(start_pfn)))
> > > > for (i = 0; i < nr_pages; i++) {
> > > > page = pfn_to_page(start_pfn + i);
> > > > + if (PageHWPoison(page)) {
> > > > + ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> >
> > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
>
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time. If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
>
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
>
> Or am I missing something. Let's CC Wen Congyang for the clarification
> here.

Wen Congyang appears to have disappeared and this fix isn't yet
finalized. Can we all please revisit it and have a think about
Michal's questions?

Thanks.


From: Laurent Dufour <[email protected]>
Subject: mm: skip HWPoisoned pages when onlining pages

b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
skipped the HWPoisoned pages when offlining pages, but this should be
skipped when onlining the pages too.

[email protected] said:

: If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
: means that we skip the page status check for hwpoisoned pages *not* to
: prevent memory offlining for memblocks with hwpoisoned pages. That
: means that hwpoisoned pages can be offlined.
:
: And another reason to clear PageReserved is that we could reuse the
: hwpoisoned page after onlining back with replacing the broken DIMM. In
: this usecase, we first do unpoisoning to clear PageHWPoison, but it
: doesn't work if PageReserved is set. My simple testing shows the BUG
: below in unpoisoning (without the ClearPageReserved):
:
: Unpoison: Software-unpoisoned page 0x18000
: BUG: Bad page state in process page-types pfn:18000
: page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
: flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
: raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
: page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
: bad because of flags: 0x800(reserved)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laurent Dufour <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Andrey Vagin <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/memory_hotplug.c | 4 ++++
1 file changed, 4 insertions(+)

diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
+++ a/mm/memory_hotplug.c
@@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
if (PageReserved(pfn_to_page(start_pfn)))
for (i = 0; i < nr_pages; i++) {
page = pfn_to_page(start_pfn + i);
+ if (PageHWPoison(page)) {
+ ClearPageReserved(page);
+ continue;
+ }
(*online_page_callback)(page);
onlined_pages++;
}
_


2018-01-23 18:16:55

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

Hi Andrew,

On 18/01/2018 00:03, Andrew Morton wrote:
> On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <[email protected]> wrote:
>
>> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
>>> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
>>>> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
>>>>> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
>>>>> offlining pages") skip the HWPoisoned pages when offlining pages, but
>>>>> this should be skipped when onlining the pages too.
>>>>>
>>>>> Signed-off-by: Laurent Dufour <[email protected]>
>>>>> ---
>>>>> mm/memory_hotplug.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 6fa7208bcd56..741ddb50e7d2 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>>> if (PageReserved(pfn_to_page(start_pfn)))
>>>>> for (i = 0; i < nr_pages; i++) {
>>>>> page = pfn_to_page(start_pfn + i);
>>>>> + if (PageHWPoison(page)) {
>>>>> + ClearPageReserved(page);
>>>>
>>>> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
>>>> was never offlined to begin with? Or do you expect this to be set on newly
>>>> hotplugged memory? Also don't we need to skip the entire pageblock?
>>>
>>> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
>>> that we skip the page status check for hwpoisoned pages *not* to prevent
>>> memory offlining for memblocks with hwpoisoned pages. That means that
>>> hwpoisoned pages can be offlined.
>>
>> Is this patch actually correct? I am trying to wrap my head around it
>> but it smells like it tries to avoid the problem rather than fix it
>> properly. I might be wrong here of course but to me it sounds like
>> poisoned page should simply be offlined and keep its poison state all
>> the time. If the memory is hot-removed and added again we have lost the
>> struct page along with the state which is the expected behavior. If it
>> is still broken we will re-poison it.
>>
>> Anyway a patch to skip over poisoned pages during online makes perfect
>> sense to me. The PageReserved fiddling around much less so.
>>
>> Or am I missing something. Let's CC Wen Congyang for the clarification
>> here.
>
> Wen Congyang appears to have disappeared and this fix isn't yet
> finalized. Can we all please revisit it and have a think about
> Michal's questions?

I tried to recreate the original issue, but there were a lot of changes
done in this area since the last April.

I was not able to offline a poisoned page because isolate_movable_page() is
failing. I'll investigate that further...

Cheers,
Laurent.


> Thanks.
>
>
> From: Laurent Dufour <[email protected]>
> Subject: mm: skip HWPoisoned pages when onlining pages
>
> b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> skipped the HWPoisoned pages when offlining pages, but this should be
> skipped when onlining the pages too.
>
> [email protected] said:
>
> : If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
> : means that we skip the page status check for hwpoisoned pages *not* to
> : prevent memory offlining for memblocks with hwpoisoned pages. That
> : means that hwpoisoned pages can be offlined.
> :
> : And another reason to clear PageReserved is that we could reuse the
> : hwpoisoned page after onlining back with replacing the broken DIMM. In
> : this usecase, we first do unpoisoning to clear PageHWPoison, but it
> : doesn't work if PageReserved is set. My simple testing shows the BUG
> : below in unpoisoning (without the ClearPageReserved):
> :
> : Unpoison: Software-unpoisoned page 0x18000
> : BUG: Bad page state in process page-types pfn:18000
> : page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
> : flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
> : raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
> : raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> : page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> : bad because of flags: 0x800(reserved)
>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Laurent Dufour <[email protected]>
> Cc: Naoya Horiguchi <[email protected]>
> Cc: Andrey Vagin <[email protected]>
> Cc: Glauber Costa <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> mm/memory_hotplug.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
> +++ a/mm/memory_hotplug.c
> @@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
> if (PageReserved(pfn_to_page(start_pfn)))
> for (i = 0; i < nr_pages; i++) {
> page = pfn_to_page(start_pfn + i);
> + if (PageHWPoison(page)) {
> + ClearPageReserved(page);
> + continue;
> + }
> (*online_page_callback)(page);
> onlined_pages++;
> }
> _
>