get_hwpoison_page() must recheck the relation between head and tail pages.
Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
mm/memory-failure.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 78f5f2641b91..ca5acee53b7a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -888,7 +888,15 @@ int get_hwpoison_page(struct page *page)
 		}
 	}

-	return get_page_unless_zero(head);
+	if (get_page_unless_zero(head)) {
+		if (head == compound_head(page))
+			return 1;
+
+		pr_info("MCE: %#lx cannot catch tail\n", page_to_pfn(page));
+		put_page(head);
+	}
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(get_hwpoison_page);
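
For readers following the thread, the pattern in the patch (pin the head
first, then confirm that the page still belongs to it) can be sketched in
userspace C. This is only an illustration under stated assumptions:
struct toy_page, toy_get_unless_zero(), toy_put() and the split_in_between
flag are hypothetical stand-ins, not kernel APIs, and the split is simulated
by hand instead of by a real concurrent thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: a refcount plus a pointer to the head. */
struct toy_page {
	atomic_int refcount;
	struct toy_page *head;	/* points to itself for a head or standalone page */
};

/* Take a reference unless the count is zero (models get_page_unless_zero()). */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference (models put_page(), without any freeing). */
static void toy_put(struct toy_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/*
 * Pin the head that 'page' belonged to when we looked, then recheck.
 * 'split_in_between' is a hypothetical test hook: it simulates the
 * compound page being split between the lookup and the pin.
 */
int toy_get_hwpoison_page(struct toy_page *page, bool split_in_between)
{
	struct toy_page *head = page->head;

	if (split_in_between)
		page->head = page;	/* the tail became an independent page */

	if (toy_get_unless_zero(head)) {
		if (head == page->head)
			return 1;	/* still the right page: keep the pin */

		toy_put(head);		/* pinned an unrelated page: undo it */
	}

	return 0;
}
```

With a head whose refcount is 1 and a tail pointing at it,
toy_get_hwpoison_page(&tail, false) returns 1 and leaves the head pinned,
while toy_get_hwpoison_page(&tail, true) returns 0 and leaves the head's
refcount unchanged.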
# CCed Andrew,
On Mon, Apr 18, 2016 at 02:43:45PM +0300, Konstantin Khlebnikov wrote:
> get_hwpoison_page() must recheck the relation between head and tail pages.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
Looks good to me. Without this recheck, the race causes the kernel to pin
an irrelevant page, and eventually makes the kernel crash due to a refcount mismatch...
Acked-by: Naoya Horiguchi <[email protected]>
> ---
> mm/memory-failure.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 78f5f2641b91..ca5acee53b7a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -888,7 +888,15 @@ int get_hwpoison_page(struct page *page)
> }
> }
>
> - return get_page_unless_zero(head);
> + if (get_page_unless_zero(head)) {
> + if (head == compound_head(page))
> + return 1;
> +
> + pr_info("MCE: %#lx cannot catch tail\n", page_to_pfn(page));
Recently Chen Yucong replaced the label "MCE:" with "Memory failure:",
but the resolution is trivial, I think.
Thanks,
Naoya Horiguchi
> + put_page(head);
> + }
> +
> + return 0;
> }
> EXPORT_SYMBOL_GPL(get_hwpoison_page);
>
>
On Tue, Apr 19, 2016 at 2:15 AM, Naoya Horiguchi
<[email protected]> wrote:
> # CCed Andrew,
>
> On Mon, Apr 18, 2016 at 02:43:45PM +0300, Konstantin Khlebnikov wrote:
>> get_hwpoison_page() must recheck the relation between head and tail pages.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>
> Looks good to me. Without this recheck, the race causes the kernel to pin
> an irrelevant page, and eventually makes the kernel crash due to a refcount mismatch...
Yep. I've seen that a lot. Unfortunately that was on the 3.18 branch and
it'll take several months to verify this fix.
This code and page reference counting overall have changed
significantly since then, so there are probably more bugs here.
For example, I'm not sure about races with atomic set for page
reference counting.
I've found and removed a couple in the Mellanox driver, but there are more in
mm and net.
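
To make the concern concrete, here is a hand-replayed interleaving in
userspace C showing one way an atomic set on a reference count could lose a
concurrent reference. It is a sketch under assumptions, not a reproduction
of the driver bugs mentioned above: struct toy_ref, the toy_ref_*() helpers
and toy_replay_set_race() are hypothetical names, and the "concurrent"
steps are run sequentially in a fixed order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reference counter standing in for a page's refcount. */
struct toy_ref {
	atomic_int count;
};

/* Take a reference unless the count is zero. */
static bool toy_ref_get_unless_zero(struct toy_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the count hit zero (object "freed"). */
static bool toy_ref_put(struct toy_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}

/*
 * Replays one interleaving by hand. The owner holds one reference, a
 * speculative getter takes a second one, and the owner then optionally
 * resets the count with a plain atomic store. Returns true if the
 * getter's final put would free an object the owner still uses.
 */
bool toy_replay_set_race(bool owner_uses_set)
{
	struct toy_ref r;
	bool pinned;

	atomic_init(&r.count, 1);		/* owner's reference */

	pinned = toy_ref_get_unless_zero(&r);	/* getter: 1 -> 2 */
	assert(pinned);

	if (owner_uses_set)
		atomic_store(&r.count, 1);	/* getter's reference is lost */

	return toy_ref_put(&r);			/* getter drops its reference */
}
```

Here toy_replay_set_race(true) reports a premature free, while
toy_replay_set_race(false) does not, which is why a store that overwrites
the count is only safe when no other path can take a reference at the same
time.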
>
> Acked-by: Naoya Horiguchi <[email protected]>
>
>> ---
>> mm/memory-failure.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 78f5f2641b91..ca5acee53b7a 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -888,7 +888,15 @@ int get_hwpoison_page(struct page *page)
>> }
>> }
>>
>> - return get_page_unless_zero(head);
>> + if (get_page_unless_zero(head)) {
>> + if (head == compound_head(page))
>> + return 1;
>> +
>> + pr_info("MCE: %#lx cannot catch tail\n", page_to_pfn(page));
>
> Recently Chen Yucong replaced the label "MCE:" with "Memory failure:",
> but the resolution is trivial, I think.
>
> Thanks,
> Naoya Horiguchi
>
>> + put_page(head);
>> + }
>> +
>> + return 0;
>> }
>> EXPORT_SYMBOL_GPL(get_hwpoison_page);
>>
>>
On Mon, 18 Apr 2016 23:15:52 +0000 Naoya Horiguchi <[email protected]> wrote:
> # CCed Andrew,
Thanks.
> On Mon, Apr 18, 2016 at 02:43:45PM +0300, Konstantin Khlebnikov wrote:
> > get_hwpoison_page() must recheck the relation between head and tail pages.
> >
> > Signed-off-by: Konstantin Khlebnikov <[email protected]>
>
> Looks good to me. Without this recheck, the race causes the kernel to pin
> an irrelevant page, and eventually makes the kernel crash due to a refcount mismatch...
Thanks. I'll add the above (important!) info to the changelog and
cc:stable.
> > - return get_page_unless_zero(head);
> > + if (get_page_unless_zero(head)) {
> > + if (head == compound_head(page))
> > + return 1;
> > +
> > + pr_info("MCE: %#lx cannot catch tail\n", page_to_pfn(page));
>
> Recently Chen Yucong replaced the label "MCE:" with "Memory failure:",
> but the resolution is trivial, I think.
Yup, that patch is in my (large) backlog. Away at conferences for
seven days, receiving 100 actionable emails per day. Give me a few
days ;)