This fixes a bug in madvise() where, if you try to soft offline a
hugepage via madvise(), the walk over the address range ends up using
the wrong page offset. The loop increment reads the compound order of
a page that was compound but no longer is, because soft offlining
dissolves the huge page (since c3114a8).
Signed-off-by: Alexandru Moise <[email protected]>
---
mm/madvise.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 21261ff0466f..25bade36e9ca 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -625,18 +625,26 @@ static int madvise_inject_error(int behavior,
 {
 	struct page *page;
 	struct zone *zone;
+	unsigned int order;

 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;

-	for (; start < end; start += PAGE_SIZE <<
-				compound_order(compound_head(page))) {
+
+	for (; start < end; start += PAGE_SIZE << order) {
 		int ret;

 		ret = get_user_pages_fast(start, 1, 0, &page);
 		if (ret != 1)
 			return ret;

+		/*
+		 * When soft offlining hugepages, after migrating the page
+		 * we dissolve it, therefore in the second loop "page" will
+		 * no longer be a compound page, and order will be 0.
+		 */
+		order = compound_order(compound_head(page));
+
 		if (PageHWPoison(page)) {
 			put_page(page);
 			continue;
--
2.14.1
On Tue, 12 Sep 2017 22:43:06 +0200 Alexandru Moise <[email protected]> wrote:
> This fixes a bug in madvise() where if you'd try to soft offline a
> hugepage via madvise(), while walking the address range you'd end up,
> using the wrong page offset due to attempting to get the compound
> order of a former but presently not compound page, due to dissolving
> the huge page (since c3114a8).
What are the user visible effects of the bug? The wrong page is
offlined? No offlining occurs?
On Tue, 12 Sep 2017 13:54:48 -0700 Andrew Morton <[email protected]> wrote:
> On Tue, 12 Sep 2017 22:43:06 +0200 Alexandru Moise <[email protected]> wrote:
>
> > This fixes a bug in madvise() where if you'd try to soft offline a
> > hugepage via madvise(), while walking the address range you'd end up,
> > using the wrong page offset due to attempting to get the compound
> > order of a former but presently not compound page, due to dissolving
> > the huge page (since c3114a8).
>
> What are the user visible effects of the bug? The wrong page is
> offlined? No offlining occurs?
This also affects MADV_HWPOISON?
On Tue, Sep 12, 2017 at 01:54:48PM -0700, Andrew Morton wrote:
> On Tue, 12 Sep 2017 22:43:06 +0200 Alexandru Moise <[email protected]> wrote:
>
> > This fixes a bug in madvise() where if you'd try to soft offline a
> > hugepage via madvise(), while walking the address range you'd end up,
> > using the wrong page offset due to attempting to get the compound
> > order of a former but presently not compound page, due to dissolving
> > the huge page (since c3114a8).
>
> What are the user visible effects of the bug? The wrong page is
> offlined? No offlining occurs?
I end up with all my free pages getting offlined. Except 1.
../Alex
On Tue, Sep 12, 2017 at 01:58:35PM -0700, Andrew Morton wrote:
> On Tue, 12 Sep 2017 13:54:48 -0700 Andrew Morton <[email protected]> wrote:
>
> > On Tue, 12 Sep 2017 22:43:06 +0200 Alexandru Moise <[email protected]> wrote:
> >
> > > This fixes a bug in madvise() where if you'd try to soft offline a
> > > hugepage via madvise(), while walking the address range you'd end up,
> > > using the wrong page offset due to attempting to get the compound
> > > order of a former but presently not compound page, due to dissolving
> > > the huge page (since c3114a8).
> >
> > What are the user visible effects of the bug? The wrong page is
> > offlined? No offlining occurs?
>
> This also affects MADV_HWPOISON?
No, MADV_HWPOISON is ok because it doesn't dissolve the hugepage, so the page
remains a compound page the 2nd loop around.
../Alex
Hi Alexandru,
On Tue, Sep 12, 2017 at 10:43:06PM +0200, Alexandru Moise wrote:
> This fixes a bug in madvise() where if you'd try to soft offline a
> hugepage via madvise(), while walking the address range you'd end up,
> using the wrong page offset due to attempting to get the compound
> order of a former but presently not compound page, due to dissolving
> the huge page (since c3114a8).
>
> Signed-off-by: Alexandru Moise <[email protected]>
There was a similar discussion in https://marc.info/?l=linux-kernel&m=150354919510631&w=2
about thp. As I stated there, if we pass a multi-page range [start, end) as the
parameters, we expect memory errors to be injected into every single page
within the range.
So I'm starting to feel that we should revert the following patch, which introduced
the multi-page stepping.
commit 20cb6cab52a21b46e3c0dc7bd23f004f810fb421
Author: Wanpeng Li <[email protected]>
Date: Mon Sep 30 13:45:21 2013 -0700
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
To suppress the printk flood, we could use the ratelimit mechanism, or
simply s/pr_info/pr_debug/ might be OK.
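
As a rough illustration of the ratelimit idea, here is a toy userspace model of
an interval/burst limiter. The kernel has ratelimited printk variants such as
pr_info_ratelimited(); the struct and function below are hypothetical names made
up for this sketch and are not the kernel's implementation. Time is an abstract
tick counter supplied by the caller.

```c
#include <stdbool.h>

/*
 * Hypothetical toy limiter: allow at most `burst` messages per window
 * of `interval` ticks, and count everything that was suppressed.
 */
struct toy_ratelimit {
	unsigned long interval;		/* window length in ticks */
	unsigned int burst;		/* messages allowed per window */
	unsigned long window_start;	/* tick at which the window opened */
	unsigned int printed;		/* messages let through this window */
	unsigned int missed;		/* messages suppressed so far */
};

/* Returns true if a message may be printed at tick `now`. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rl, unsigned long now)
{
	/* Open a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

With interval = 5 and burst = 10, 512 messages arriving in the same tick would
let 10 through and suppress the other 502, so the individual PFNs of the
suppressed messages would not show up in the log.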
Thanks,
Naoya Horiguchi
On Wed, Sep 13, 2017 at 12:13:09AM +0000, Naoya Horiguchi wrote:
> Hi Alexandru,
>
> On Tue, Sep 12, 2017 at 10:43:06PM +0200, Alexandru Moise wrote:
> > This fixes a bug in madvise() where if you'd try to soft offline a
> > hugepage via madvise(), while walking the address range you'd end up,
> > using the wrong page offset due to attempting to get the compound
> > order of a former but presently not compound page, due to dissolving
> > the huge page (since c3114a8).
> >
> > Signed-off-by: Alexandru Moise <[email protected]>
>
> There was a similar discussion in https://marc.info/?l=linux-kernel&m=150354919510631&w=2
> over thp. As I stated there, if we give multi-page range into the parameters
> [start, end), we expect that memory errors are injected to every single page
> within the range.
At the moment we end up offlining the i'th subpage of the newly migrated page with
each iteration. That's why I end up without free pages in hugetlbfs.
With this patch we migrate the hugepage, offline 1 subpage and dissolve the rest,
which is closer to how mcelog should behave. mcelog will usually try to offline random
spots within a hugepage rather than a whole hugepage at once, which wouldn't make
sense since you usually just get 1-2 stuck bits on your DIMM. The whole point of soft
offlining is to act as a preventive measure against a large number of correctable
memory errors on a particular page.
I agree that if we give a range we should expect all the subpages to be offlined,
although I don't know what value that would add.
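
To make the stride problem concrete, here is a toy userspace model of the loop
(not kernel code). The names toy_page, toy_soft_offline and toy_walk are made up
for illustration, and it assumes 4 KiB base pages and order-9 (2 MiB) hugepages.
It only counts how many times soft offlining would be invoked over a range.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE	4096UL
#define TOY_HPAGE_ORDER	9U	/* assumed: 2 MiB hugepage = 512 base pages */

/* Hypothetical stand-in for struct page: tracks only the compound order. */
struct toy_page {
	unsigned int order;
};

/* Models migrate-then-dissolve: the old page stops being compound. */
static void toy_soft_offline(struct toy_page *page)
{
	page->order = 0;
}

/*
 * Walk [start, end) and return the number of soft offline calls.
 * With read_order_first the order is sampled before the page is
 * dissolved (patched behaviour); otherwise it is read afterwards,
 * as the old loop increment did.
 */
static unsigned long toy_walk(unsigned long start, unsigned long end,
			      bool read_order_first)
{
	unsigned long calls = 0;

	assert(start <= end);
	while (start < end) {
		/* The data was migrated, so each lookup finds a hugepage again. */
		struct toy_page page = { .order = TOY_HPAGE_ORDER };
		unsigned int order = page.order;

		toy_soft_offline(&page);
		calls++;

		if (!read_order_first)
			order = page.order;	/* stale read: now 0 */

		start += TOY_PAGE_SIZE << order;
	}
	return calls;
}
```

For a single 2 MiB range, toy_walk(0, TOY_PAGE_SIZE << TOY_HPAGE_ORDER, false)
makes 512 calls, one per subpage, while the same call with true makes 1.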
>
> So I start to feel that we should revert the following patch which introduced
> the multi-page stepping.
>
> commit 20cb6cab52a21b46e3c0dc7bd23f004f810fb421
> Author: Wanpeng Li <[email protected]>
> Date: Mon Sep 30 13:45:21 2013 -0700
>
> mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
>
> In order to suppress the printk flood, we can use ratelimit mechanism, or
> just s/pr_info/pr_debug/ might be ok.
I'd rather keep the printouts. It's not really that much of a hot path. If
they went on forever, sure, but if you manually offline 512 pages you should expect
512 printouts. It's also nice to see exactly which PFNs get offlined.
../Alex
>
> Thanks,
> Naoya Horiguchi
>