2020-11-19 11:00:51

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 0/7] HWPoison: Refactor get page interface

Hi,

following up on previous fix-ups an refactors, this patchset simplifies
the get page interface and removes the MF_COUNT_INCREASED trick we have
for soft offline.

Please, note that this patchset is on top of [1] and [2].

This patchset does three things:

1) Drops MF_COUNT_INCREASED trick
2) Refactors get page interface
3) Places a common entry for grabbin a page from both hard offline
and soft offline guarded by zone_pcp_{disable/enable}, so we do not
have to drain pcplists by ourself and retry again.

Note that the MF_COUNT_INCREASED trick was left because if get_hwpoison_page
races with put_page (e.g:)

CPU0 CPU1
put_page (refcount decremented to 0)
__put_single_page
free_unref_page
free_unref_page_prepare
free_pcp_prepare
free_pages_prepare soft_offline_page
:page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP get_any_page
get_hwpoison_page
free_unref_page_commit
free_one_page
__free_one_page (place it in buddy)

get_hwpoison_page sees that page has a refcount of 0, but since it was not placed
in buddy yet we cannot really handle it.
We now have a sort of maximum passes in get_any_page, so in case we race
with either an allocation or a put_page, we retry again.

After an off-list discussion with Naoya, he agreed to proceed.

[1] https://patchwork.kernel.org/project/linux-mm/list/?series=364009
[2] https://patchwork.kernel.org/project/linux-mm/list/?series=381903

Naoya Horiguchi (3):
mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
mm,hwpoison: remove MF_COUNT_INCREASED
mm,hwpoison: remove flag argument from soft offline functions

Oscar Salvador (4):
mm,hwpoison: Refactor get_any_page
mm,hwpoison: Drop pfn parameter
mm,hwpoison: Disable pcplists before grabbing a refcount
mm,hwpoison: Remove drain_all_pages from shake_page

drivers/base/memory.c | 2 +-
include/linux/mm.h | 9 +--
mm/madvise.c | 19 +++--
mm/memory-failure.c | 168 +++++++++++++++++-------------------------
4 files changed, 85 insertions(+), 113 deletions(-)

--
2.26.2


2020-11-19 11:01:17

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

From: Naoya Horiguchi <[email protected]>

The call to get_user_pages_fast is only to get the pointer to a struct
page of a given address, pinning it is memory-poisoning handler's job,
so drop the refcount grabbed by get_user_pages_fast().

Note that the target page is still pinned after this put_page() because
the current process should have refcount from mapping.

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
---
mm/madvise.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c6b5524add58..7a0f64b93635 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -900,20 +900,23 @@ static int madvise_inject_error(int behavior,
*/
size = page_size(compound_head(page));

+ /*
+ * The get_user_pages_fast() is just to get the pfn of the
+ * given address, and the refcount has nothing to do with
+ * what we try to test, so it should be released immediately.
+ * This is racy but it's intended because the real hardware
+ * errors could happen at any moment and memory error handlers
+ * must properly handle the race.
+ */
+ put_page(page);
+
if (behavior == MADV_SOFT_OFFLINE) {
pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
pfn, start);
- ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
+ ret = soft_offline_page(pfn, 0);
} else {
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
- /*
- * Drop the page reference taken by get_user_pages_fast(). In
- * the absence of MF_COUNT_INCREASED the memory_failure()
- * routine is responsible for pinning the page to prevent it
- * from being released back to the page allocator.
- */
- put_page(page);
ret = memory_failure(pfn, 0);
}

--
2.26.2

2020-11-19 11:01:37

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 5/7] mm,hwpoison: remove flag argument from soft offline functions

From: Naoya Horiguchi <[email protected]>

The argument @flag no longer affects the behavior of soft_offline_page()
and its variants, so let's remove them.

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
---
drivers/base/memory.c | 2 +-
include/linux/mm.h | 2 +-
mm/madvise.c | 2 +-
mm/memory-failure.c | 9 ++++-----
4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index eef4ffb6122c..00a2f7191357 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -464,7 +464,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
if (kstrtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
- ret = soft_offline_page(pfn, 0);
+ ret = soft_offline_page(pfn);
return ret == 0 ? count : ret;
}

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e2341d445ecb..48ba7deec46b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3037,7 +3037,7 @@ extern int sysctl_memory_failure_early_kill;
extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
extern atomic_long_t num_poisoned_pages __read_mostly;
-extern int soft_offline_page(unsigned long pfn, int flags);
+extern int soft_offline_page(unsigned long pfn);


/*
diff --git a/mm/madvise.c b/mm/madvise.c
index 7a0f64b93635..2bae426b01d5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -913,7 +913,7 @@ static int madvise_inject_error(int behavior,
if (behavior == MADV_SOFT_OFFLINE) {
pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
pfn, start);
- ret = soft_offline_page(pfn, 0);
+ ret = soft_offline_page(pfn);
} else {
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 37f4bb5a49d5..4abf5d5ffc96 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1572,7 +1572,7 @@ static void memory_failure_work_func(struct work_struct *work)
if (!gotten)
break;
if (entry.flags & MF_SOFT_OFFLINE)
- soft_offline_page(entry.pfn, entry.flags);
+ soft_offline_page(entry.pfn);
else
memory_failure(entry.pfn, entry.flags);
}
@@ -1712,7 +1712,7 @@ EXPORT_SYMBOL(unpoison_memory);
* We only incremented refcount in case the page was already in-use and it is
* a known type we can handle.
*/
-static int get_any_page(struct page *p, int flags)
+static int get_any_page(struct page *p)
{
int ret = 0, pass = 0;

@@ -1882,7 +1882,6 @@ static int soft_offline_free_page(struct page *page)
/**
* soft_offline_page - Soft offline a page.
* @pfn: pfn to soft-offline
- * @flags: flags. Same as memory_failure().
*
* Returns 0 on success, otherwise negated errno.
*
@@ -1901,7 +1900,7 @@ static int soft_offline_free_page(struct page *page)
* This is not a 100% solution for all memory, but tries to be
* ``good enough'' for the majority of memory.
*/
-int soft_offline_page(unsigned long pfn, int flags)
+int soft_offline_page(unsigned long pfn)
{
int ret;
struct page *page;
@@ -1921,7 +1920,7 @@ int soft_offline_page(unsigned long pfn, int flags)

retry:
get_online_mems();
- ret = get_any_page(page, flags);
+ ret = get_any_page(page);
put_online_mems();

if (ret > 0) {
--
2.26.2

2020-11-19 11:02:25

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 4/7] mm,hwpoison: remove MF_COUNT_INCREASED

From: Naoya Horiguchi <[email protected]>

Now there's no user of MF_COUNT_INCREASED, so we can safely remove
it from all calling points.

Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
---
include/linux/mm.h | 7 +++----
mm/memory-failure.c | 9 ++-------
2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3fb4e..e2341d445ecb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3025,10 +3025,9 @@ void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
unsigned long nr_pages);

enum mf_flags {
- MF_COUNT_INCREASED = 1 << 0,
- MF_ACTION_REQUIRED = 1 << 1,
- MF_MUST_KILL = 1 << 2,
- MF_SOFT_OFFLINE = 1 << 3,
+ MF_ACTION_REQUIRED = 1 << 0,
+ MF_MUST_KILL = 1 << 1,
+ MF_SOFT_OFFLINE = 1 << 2,
};
extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 04237fc04a00..37f4bb5a49d5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1185,7 +1185,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)

num_poisoned_pages_inc();

- if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+ if (!get_hwpoison_page(p)) {
/*
* Check "filter hit" and "race with other subpage."
*/
@@ -1387,7 +1387,7 @@ int memory_failure(unsigned long pfn, int flags)
* In fact it's dangerous to directly bump up page count from 0,
* that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
*/
- if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
+ if (!get_hwpoison_page(p)) {
if (is_free_buddy_page(p)) {
if (take_page_off_buddy(p)) {
page_ref_inc(p);
@@ -1716,9 +1716,6 @@ static int get_any_page(struct page *p, int flags)
{
int ret = 0, pass = 0;

- if (flags & MF_COUNT_INCREASED)
- return 1;
-
try_again:
if (!get_hwpoison_page(p)) {
if (page_count(p)) {
@@ -1919,8 +1916,6 @@ int soft_offline_page(unsigned long pfn, int flags)

if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
- if (flags & MF_COUNT_INCREASED)
- put_page(page);
return 0;
}

--
2.26.2

2020-11-19 11:02:39

by Oscar Salvador

[permalink] [raw]
Subject: [PATCH 2/7] mm,hwpoison: Drop pfn parameter

pfn parameter is no longer needed, drop it.

Signed-off-by: Oscar Salvador <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0d2323ba4b8e..04237fc04a00 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1712,7 +1712,7 @@ EXPORT_SYMBOL(unpoison_memory);
* We only incremented refcount in case the page was already in-use and it is
* a known type we can handle.
*/
-static int get_any_page(struct page *p, unsigned long pfn, int flags)
+static int get_any_page(struct page *p, int flags)
{
int ret = 0, pass = 0;

@@ -1926,7 +1926,7 @@ int soft_offline_page(unsigned long pfn, int flags)

retry:
get_online_mems();
- ret = get_any_page(page, pfn, flags);
+ ret = get_any_page(page, flags);
put_online_mems();

if (ret > 0) {
--
2.26.2

Subject: Re: [PATCH 2/7] mm,hwpoison: Drop pfn parameter

On Thu, Nov 19, 2020 at 11:57:11AM +0100, Oscar Salvador wrote:
> pfn parameter is no longer needed, drop it.
>
> Signed-off-by: Oscar Salvador <[email protected]>

Acked-by: Naoya Horiguchi <[email protected]>

2020-11-25 17:01:30

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 2/7] mm,hwpoison: Drop pfn parameter

On 11/19/20 11:57 AM, Oscar Salvador wrote:
> pfn parameter is no longer needed, drop it.

Could have been also part of previous patch.

> Signed-off-by: Oscar Salvador <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

> ---
> mm/memory-failure.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0d2323ba4b8e..04237fc04a00 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1712,7 +1712,7 @@ EXPORT_SYMBOL(unpoison_memory);
> * We only incremented refcount in case the page was already in-use and it is
> * a known type we can handle.
> */
> -static int get_any_page(struct page *p, unsigned long pfn, int flags)
> +static int get_any_page(struct page *p, int flags)
> {
> int ret = 0, pass = 0;
>
> @@ -1926,7 +1926,7 @@ int soft_offline_page(unsigned long pfn, int flags)
>
> retry:
> get_online_mems();
> - ret = get_any_page(page, pfn, flags);
> + ret = get_any_page(page, flags);
> put_online_mems();
>
> if (ret > 0) {
>

2020-11-25 18:26:37

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On 11/19/20 11:57 AM, Oscar Salvador wrote:
> From: Naoya Horiguchi <[email protected]>
>
> The call to get_user_pages_fast is only to get the pointer to a struct
> page of a given address, pinning it is memory-poisoning handler's job,
> so drop the refcount grabbed by get_user_pages_fast().
>
> Note that the target page is still pinned after this put_page() because
> the current process should have refcount from mapping.

Well, but can't it go away due to reclaim, migration or whatever?

> Signed-off-by: Naoya Horiguchi <[email protected]>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> mm/madvise.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c6b5524add58..7a0f64b93635 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -900,20 +900,23 @@ static int madvise_inject_error(int behavior,
> */
> size = page_size(compound_head(page));
>
> + /*
> + * The get_user_pages_fast() is just to get the pfn of the
> + * given address, and the refcount has nothing to do with
> + * what we try to test, so it should be released immediately.
> + * This is racy but it's intended because the real hardware
> + * errors could happen at any moment and memory error handlers
> + * must properly handle the race.

Sure they have to. We might just be unexpectedly messing with other process'
memory. Or does anything else prevent that?

> + */
> + put_page(page);
> +
> if (behavior == MADV_SOFT_OFFLINE) {
> pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
> pfn, start);
> - ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
> + ret = soft_offline_page(pfn, 0);
> } else {
> pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
> pfn, start);
> - /*
> - * Drop the page reference taken by get_user_pages_fast(). In
> - * the absence of MF_COUNT_INCREASED the memory_failure()
> - * routine is responsible for pinning the page to prevent it
> - * from being released back to the page allocator.
> - */
> - put_page(page);
> ret = memory_failure(pfn, 0);
> }
>
>

2020-12-01 11:40:48

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On Wed, Nov 25, 2020 at 07:20:33PM +0100, Vlastimil Babka wrote:
> On 11/19/20 11:57 AM, Oscar Salvador wrote:
> > From: Naoya Horiguchi <[email protected]>
> >
> > The call to get_user_pages_fast is only to get the pointer to a struct
> > page of a given address, pinning it is memory-poisoning handler's job,
> > so drop the refcount grabbed by get_user_pages_fast().
> >
> > Note that the target page is still pinned after this put_page() because
> > the current process should have refcount from mapping.
>
> Well, but can't it go away due to reclaim, migration or whatever?

Yes, it can.

> > @@ -900,20 +900,23 @@ static int madvise_inject_error(int behavior,
> > */
> > size = page_size(compound_head(page));
> > + /*
> > + * The get_user_pages_fast() is just to get the pfn of the
> > + * given address, and the refcount has nothing to do with
> > + * what we try to test, so it should be released immediately.
> > + * This is racy but it's intended because the real hardware
> > + * errors could happen at any moment and memory error handlers
> > + * must properly handle the race.
>
> Sure they have to. We might just be unexpectedly messing with other process'
> memory. Or does anything else prevent that?

No, nothing does, and I have to confess that I managed to confuse myself here.
If we release such page and that page ends up in buddy, nothing prevents someone
else to get that page, and then we would be messing with other process memory.

I guess the right thing to do is just to make sure we got that page and that
that page remains pinned as long as the memory failure handling goes.

I will remove those patches from the patchset and re-submit with only the
refactoring and pcp-disabling.

Thanks Vlastimil

--
Oscar Salvador
SUSE L3

2020-12-02 13:41:00

by Qian Cai

[permalink] [raw]
Subject: Re: [PATCH 0/7] HWPoison: Refactor get page interface

On Thu, 2020-11-19 at 11:57 +0100, Oscar Salvador wrote:
> Hi,
>
> following up on previous fix-ups an refactors, this patchset simplifies
> the get page interface and removes the MF_COUNT_INCREASED trick we have
> for soft offline.

Well, the madvise() EIO is back. I don't understand why we can't test it on a
NUMA system before posting this over and over again.

# git clone https://e.coding.net/cailca/linux/mm
# cd mm; make
# ./ranbug 1
- start: migrate_huge_offline
- use NUMA nodes 0,3.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 3
madvise: Input/output error

[ 1270.054919][ T7497] Soft offlining pfn 0x1958e00 at process virtual address 0x7f7d9ca00000
[ 1270.067318][ T7497] Soft offlining pfn 0x18d0600 at process virtual address 0x7f7d9c800000
[ 1270.078856][ T7497] Soft offlining pfn 0x1ac800 at process virtual address 0x7f7d9ca00000
[ 1270.091268][ T7497] Soft offlining pfn 0x1e10a00 at process virtual address 0x7f7d9c800000
[ 1270.101946][ T7497] Soft offlining pfn 0x18c800 at process virtual address 0x7f7d9ca00000
[ 1270.111678][ T7497] soft offline: 0x18c800: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.126133][ T7497] Soft offlining pfn 0x18b5400 at process virtual address 0x7f7d9c800000
[ 1270.136581][ T7497] Soft offlining pfn 0x211c00 at process virtual address 0x7f7d9ca00000
[ 1270.146214][ T7497] soft offline: 0x211c00: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.160624][ T7497] Soft offlining pfn 0x19bee00 at process virtual address 0x7f7d9c800000
[ 1270.170896][ T7497] Soft offlining pfn 0x1e21a00 at process virtual address 0x7f7d9ca00000
[ 1270.185011][ T7497] Soft offlining pfn 0x1fd1200 at process virtual address 0x7f7d9c800000
[ 1270.195341][ T7497] Soft offlining pfn 0x1882400 at process virtual address 0x7f7d9ca00000
[ 1270.480593][ T7497] Soft offlining pfn 0x18bc000 at process virtual address 0x7f7d9c800000
[ 1270.491961][ T7497] soft offline: 0x18bc000: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.506018][ T7497] Soft offlining pfn 0x1e76a00 at process virtual address 0x7f7d9c800000
[ 1270.590266][ T7497] Soft offlining pfn 0x1b3c00 at process virtual address 0x7f7d9ca00000
[ 1270.600207][ T7497] soft offline: 0x1b3c00: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.614316][ T7497] Soft offlining pfn 0x1882600 at process virtual address 0x7f7d9c800000
[ 1270.662427][ T7497] Soft offlining pfn 0x1b3c00 at process virtual address 0x7f7d9ca00000
[ 1270.744249][ T7497] Soft offlining pfn 0x18bc000 at process virtual address 0x7f7d9c800000
[ 1270.754314][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1270.765204][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.816653][ T7497] Soft offlining pfn 0x18d0400 at process virtual address 0x7f7d9c800000
[ 1270.827049][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1270.837997][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.852156][ T7497] Soft offlining pfn 0x186ca00 at process virtual address 0x7f7d9c800000
[ 1270.862350][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1270.872922][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.887133][ T7497] Soft offlining pfn 0x18ac200 at process virtual address 0x7f7d9c800000
[ 1270.897450][ T7497] Soft offlining pfn 0x211c00 at process virtual address 0x7f7d9ca00000
[ 1270.907416][ T7497] soft offline: 0x211c00: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.921365][ T7497] Soft offlining pfn 0x1e1cc00 at process virtual address 0x7f7d9c800000
[ 1270.931700][ T7497] Soft offlining pfn 0x18c800 at process virtual address 0x7f7d9ca00000
[ 1270.941580][ T7497] soft offline: 0x18c800: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.955649][ T7497] Soft offlining pfn 0x1e6ae00 at process virtual address 0x7f7d9c800000
[ 1270.966063][ T7497] Soft offlining pfn 0x211c00 at process virtual address 0x7f7d9ca00000
[ 1270.975965][ T7497] soft offline: 0x211c00: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1270.990059][ T7497] Soft offlining pfn 0x1e72e00 at process virtual address 0x7f7d9c800000
[ 1271.000323][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1271.011006][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.025152][ T7497] Soft offlining pfn 0x1e22200 at process virtual address 0x7f7d9c800000
[ 1271.035395][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1271.045916][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.060159][ T7497] Soft offlining pfn 0x1e6fe00 at process virtual address 0x7f7d9c800000
[ 1271.070695][ T7497] Soft offlining pfn 0x18c800 at process virtual address 0x7f7d9ca00000
[ 1271.080596][ T7497] soft offline: 0x18c800: hugepage isolation failed: 0, page count 2, type bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.094725][ T7497] Soft offlining pfn 0x1968200 at process virtual address 0x7f7d9c800000
[ 1271.105006][ T7497] Soft offlining pfn 0x18d1200 at process virtual address 0x7f7d9ca00000
[ 1271.115567][ T7497] soft offline: 0x18d1200: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.129775][ T7497] Soft offlining pfn 0x1e1ae00 at process virtual address 0x7f7d9c800000
[ 1271.140285][ T7497] Soft offlining pfn 0x18c800 at process virtual address 0x7f7d9ca00000
[ 1271.150185][ T7497] soft offline: 0x18c800: hugepage isolation failed: 0, page count 2, type bfffc[ 1271.468115][ T7497] Soft offlining pfn 0x1de4600 at process virtual address 0x7f7d9c800000
[ 1271.479348][ T7497] Soft offlining pfn 0x145e00 at process virtual address 0x7f7d9ca00000
[ 1271.489928][ T7497] soft offline: 0x145e00: hugepage isolation 1271.538433][ T7497] Soft offlining pfn 0x1fae00 at process virtual address 0x7f7d9c800000
[ 1271.548880][ T7497] Soft offlining pfn 0x1995e00 at process virtual address 0x7f7d9ca00000
[ 1271.558877][ T7497] soft offline: 0x1995e00: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.573055][ T7497] Soft offlining pfn 0x221e00 at process virtual address 0x7f7d9c800000
[ 1271.583453][ T7497] Soft offlining pfn 0x1901800 at process virtual address 0x7f7d9ca00000
[ 1271.593440][ T7497] soft offline: 0x1901800: hugepage isolation failed: 0, page count 2, type 3bfffc00001000e (referenced|uptodate|dirty|head)
[ 1271.610005][ T7497] Soft offlining pfn 0x232400 at process virtual address 0x7f7d9c800000
[ 1271.620439][ T7497] Soft offlinin[ 1272.005890][ T7497] Soft offlining pfn 0x230e00 at process virtual address 0x7f7d9c800000
[ 1272.017226][ T7497] Soft offlining pfn 0x185fe00 at process virtual address 0x7f7d9ca00000
[ 1272.029194][ T7497] Soft offlining pfn 0x1f1400 at process virtual address 0x7f7d9c800000
[ 1272.040088][ T7497] Soft offlining pfn 0x1f9e00 at process virtual address 0x7f7d9ca00000
[ 1272.052415][ T7497] Soft offlining pfn 0x1885a00 at process virtual address 0x7f7d9c800000
[ 1272.062510][ T7497] Soft offlining pfn 0x18b6000 at process virtual address 0x7f7d9ca00000
[ 1272.071931][ T7497] soft_offline_page: 0x18b6000: unknown page type: 3bfffc000000000 ((%pG?))

>
> Please, note that this patchset is on top of [1] and [2].
>
> This patchset does three things:
>
> 1) Drops MF_COUNT_INCREASED trick
> 2) Refactors get page interface
> 3) Places a common entry for grabbin a page from both hard offline
> and soft offline guarded by zone_pcp_{disable/enable}, so we do not
> have to drain pcplists by ourself and retry again.
>
> Note that the MF_COUNT_INCREASED trick was left because if get_hwpoison_page
> races with put_page (e.g:)
>
> CPU0 CPU1
> put_page (refcount decremented to 0)
> __put_single_page
> free_unref_page
> free_unref_page_prepare
> free_pcp_prepare
> free_pages_prepare soft_offline_page
> :page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP get_any_page
> get_hwpoison_page
> free_unref_page_commit
> free_one_page
> __free_one_page (place it in buddy)
>
> get_hwpoison_page sees that page has a refcount of 0, but since it was not
> placed
> in buddy yet we cannot really handle it.
> We now have a sort of maximum passes in get_any_page, so in case we race
> with either an allocation or a put_page, we retry again.
>
> After an off-list discussion with Naoya, he agreed to proceed.
>
> [1] https://patchwork.kernel.org/project/linux-mm/list/?series=364009
> [2] https://patchwork.kernel.org/project/linux-mm/list/?series=381903
>
> Naoya Horiguchi (3):
> mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
> mm,hwpoison: remove MF_COUNT_INCREASED
> mm,hwpoison: remove flag argument from soft offline functions
>
> Oscar Salvador (4):
> mm,hwpoison: Refactor get_any_page
> mm,hwpoison: Drop pfn parameter
> mm,hwpoison: Disable pcplists before grabbing a refcount
> mm,hwpoison: Remove drain_all_pages from shake_page
>
> drivers/base/memory.c | 2 +-
> include/linux/mm.h | 9 +--
> mm/madvise.c | 19 +++--
> mm/memory-failure.c | 168 +++++++++++++++++-------------------------
> 4 files changed, 85 insertions(+), 113 deletions(-)
>

2020-12-02 13:45:14

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 0/7] HWPoison: Refactor get page interface

On Wed, Dec 02, 2020 at 08:34:57AM -0500, Qian Cai wrote:
> On Thu, 2020-11-19 at 11:57 +0100, Oscar Salvador wrote:
> > Hi,
> >
> > following up on previous fix-ups an refactors, this patchset simplifies
> > the get page interface and removes the MF_COUNT_INCREASED trick we have
> > for soft offline.
>
> Well, the madvise() EIO is back. I don't understand why we can't test it on a
> NUMA system before posting this over and over again.
>
> # git clone https://e.coding.net/cailca/linux/mm
> # cd mm; make
> # ./ranbug 1
> - start: migrate_huge_offline
> - use NUMA nodes 0,3.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 3
> madvise: Input/output error

I tried it out myself enlarging the window race artificially but I was not able
to get -EIO anymore.
But as Vlastimil pointed out in the respective patch, it is better to keep the
page pinned for madvise.

I am planning to re-post leaving out the patches that remove the pinning.

Anyway, thanks for the report.

--
Oscar Salvador
SUSE L3

2020-12-04 17:31:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On 12/1/20 12:35 PM, Oscar Salvador wrote:
> On Wed, Nov 25, 2020 at 07:20:33PM +0100, Vlastimil Babka wrote:
>> On 11/19/20 11:57 AM, Oscar Salvador wrote:
>> > From: Naoya Horiguchi <[email protected]>
>> >
>> > The call to get_user_pages_fast is only to get the pointer to a struct
>> > page of a given address, pinning it is memory-poisoning handler's job,
>> > so drop the refcount grabbed by get_user_pages_fast().
>> >
>> > Note that the target page is still pinned after this put_page() because
>> > the current process should have refcount from mapping.
>>
>> Well, but can't it go away due to reclaim, migration or whatever?
>
> Yes, it can.
>
>> > @@ -900,20 +900,23 @@ static int madvise_inject_error(int behavior,
>> > */
>> > size = page_size(compound_head(page));
>> > + /*
>> > + * The get_user_pages_fast() is just to get the pfn of the
>> > + * given address, and the refcount has nothing to do with
>> > + * what we try to test, so it should be released immediately.
>> > + * This is racy but it's intended because the real hardware
>> > + * errors could happen at any moment and memory error handlers
>> > + * must properly handle the race.
>>
>> Sure they have to. We might just be unexpectedly messing with other process'
>> memory. Or does anything else prevent that?
>
> No, nothing does, and I have to confess that I managed to confuse myself here.
> If we release such page and that page ends up in buddy, nothing prevents someone
> else to get that page, and then we would be messing with other process memory.
>
> I guess the right thing to do is just to make sure we got that page and that
> that page remains pinned as long as the memory failure handling goes.

OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
madvise_inject_error: Let memory_failure() optionally take a page reference") no?

> I will remove those patches from the patchset and re-submit with only the
> refactoring and pcp-disabling.
>
> Thanks Vlastimil
>

2020-12-05 18:09:28

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
> already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
> madvise_inject_error: Let memory_failure() optionally take a page reference") no?

What about the following?
CCing Dan as well.

From: Oscar Salvador <[email protected]>
Date: Sat, 5 Dec 2020 16:14:40 +0100
Subject: [PATCH] mm,memory_failure: Always pin the page in
madvise_inject_error

madvise_inject_error() uses get_user_pages_fast to get the page
from the addr we specified.
After [1], we drop such extra reference for memory_failure() path.
That commit says that memory_failure wanted to keep the pin in order
to take the page out of circulation.

The truth is that we need to keep the page pinned, otherwise the
page might be re-used after the put_page(), and we can end up messing
with someone else's memory.
E.g:

CPU0
process X CPU1
madvise_inject_error
get_user_pages
put_page
page gets reclaimed
process Y allocates the page
memory_failure
// We mess with process Y memory

madvise() is meant to operate on a self address space, so messing with
pages that do not belong to us seems the wrong thing to do.
To avoid that, let us keep the page pinned for memory_failure as well.

Pages for DAX mappings will release this extra refcount in
memory_failure_dev_pagemap.

[1] ("23e7b5c2e271: mm, madvise_inject_error:
Let memory_failure() optionally take a page reference")

Signed-off-by: Oscar Salvador <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
---
mm/madvise.c | 9 +--------
mm/memory-failure.c | 6 ++++++
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index c6b5524add58..19edddba196d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -907,14 +907,7 @@ static int madvise_inject_error(int behavior,
} else {
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
- /*
- * Drop the page reference taken by get_user_pages_fast(). In
- * the absence of MF_COUNT_INCREASED the memory_failure()
- * routine is responsible for pinning the page to prevent it
- * from being released back to the page allocator.
- */
- put_page(page);
- ret = memory_failure(pfn, 0);
+ ret = memory_failure(pfn, MF_COUNT_INCREASED);
}

if (ret)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 869ece2a1de2..ba861169c9ae 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
if (!cookie)
goto out;

+ if (flags & MF_COUNT_INCREASED)
+ /*
+ * Drop the extra refcount in case we come from madvise().
+ */
+ put_page(page);
+
if (hwpoison_filter(page)) {
rc = 0;
goto unlock;


--
Oscar Salvador
SUSE L3

Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On Sat, Dec 05, 2020 at 04:34:23PM +0100, Oscar Salvador wrote:
> On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> > OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
> > already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
> > madvise_inject_error: Let memory_failure() optionally take a page reference") no?
>
> What about the following?
> CCing Dan as well.

Hi Oscar, Vlastimil,

Thanks for mentioning this. I agree with that direction.

>
> From: Oscar Salvador <[email protected]>
> Date: Sat, 5 Dec 2020 16:14:40 +0100
> Subject: [PATCH] mm,memory_failure: Always pin the page in
> madvise_inject_error
>
> madvise_inject_error() uses get_user_pages_fast to get the page
> from the addr we specified.
> After [1], we drop such extra reference for memory_failure() path.
> That commit says that memory_failure wanted to keep the pin in order
> to take the page out of circulation.
>
> The truth is that we need to keep the page pinned, otherwise the
> page might be re-used after the put_page(), and we can end up messing
> with someone else's memory.
> E.g:
>
> CPU0
> process X CPU1
> madvise_inject_error
> get_user_pages
> put_page
> page gets reclaimed
> process Y allocates the page
> memory_failure
> // We mess with process Y memory
>
> madvise() is meant to operate on a self address space, so messing with
> pages that do not belong to us seems the wrong thing to do.
> To avoid that, let us keep the page pinned for memory_failure as well.
>
> Pages for DAX mappings will release this extra refcount in
> memory_failure_dev_pagemap.
>
> [1] ("23e7b5c2e271: mm, madvise_inject_error:
> Let memory_failure() optionally take a page reference")
>
> Signed-off-by: Oscar Salvador <[email protected]>
> Suggested-by: Vlastimil Babka <[email protected]>
> Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
> ---
> mm/madvise.c | 9 +--------
> mm/memory-failure.c | 6 ++++++
> 2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c6b5524add58..19edddba196d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -907,14 +907,7 @@ static int madvise_inject_error(int behavior,
> } else {
> pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
> pfn, start);
> - /*
> - * Drop the page reference taken by get_user_pages_fast(). In
> - * the absence of MF_COUNT_INCREASED the memory_failure()
> - * routine is responsible for pinning the page to prevent it
> - * from being released back to the page allocator.
> - */
> - put_page(page);
> - ret = memory_failure(pfn, 0);
> + ret = memory_failure(pfn, MF_COUNT_INCREASED);
> }
>
> if (ret)
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 869ece2a1de2..ba861169c9ae 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> if (!cookie)
> goto out;
>
> + if (flags & MF_COUNT_INCREASED)
> + /*
> + * Drop the extra refcount in case we come from madvise().
> + */
> + put_page(page);
> +

Should this if-block come before dax_lock_page() block?
It seems that if dax_lock_page returns NULL, memory_failure_dev_pagemap()
returns without releasing the refcount.
memory_failure() on dev_pagemap doesn't use page refcount (unlike other
type of memory), so we can release it unconditionally.

Thanks,
Naoya Horiguchi

2020-12-07 07:28:20

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

On 2020-12-07 03:34, HORIGUCHI NAOYA wrote:
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 869ece2a1de2..ba861169c9ae 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned
>> long pfn, int flags,
>> if (!cookie)
>> goto out;
>>
>> + if (flags & MF_COUNT_INCREASED)
>> + /*
>> + * Drop the extra refcount in case we come from madvise().
>> + */
>> + put_page(page);
>> +
>
> Should this if-block come before dax_lock_page() block?

Yeah, it should go first thing since as you noticed we kept the refcount
if we fail.
Saturday brain... I will fix it.

Thanks Naoya

--
Oscar Salvador
SUSE L3