Hi,

I found a small problem in the memory_failure() function in
mm/memory-failure.c. Please check it.

memory_failure: remove redundant check for the PG_HWPoison flag of
`hpage'.

Since we have already checked the PG_HWPoison flag with `PageHWPoison',
the later check by `TestSetPageHWPoison' must return true, so there
is no need to check again.

Signed-off-by: Wang Xiaoqiang <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1cf7f29..7794fd8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1115,7 +1115,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		lock_page(hpage);
 		if (PageHWPoison(hpage)) {
 			if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
-			    || (p != hpage && TestSetPageHWPoison(hpage))) {
+			    || p != hpage) {
 				atomic_long_sub(nr_pages, &num_poisoned_pages);
 				unlock_page(hpage);
 				return 0;
--
1.7.10.4
--
thx!
Wang Xiaoqiang
# CC:ed linux-mm
Hi Xiaoqiang,
On Wed, Jul 29, 2015 at 03:52:46PM +0800, Wang Xiaoqiang wrote:
> Hi,
>
> I found a small problem in the memory_failure() function in
> mm/memory-failure.c. Please check it.
>
> memory_failure: remove redundant check for the PG_HWPoison flag of
> `hpage'.
>
> Since we have already checked the PG_HWPoison flag with `PageHWPoison',
> the later check by `TestSetPageHWPoison' must return true, so there
> is no need to check again.
I'm afraid that this TestSetPageHWPoison is not redundant, because this code
serializes concurrent memory error events on the same hugetlb page
(where 'p' is the 4kB error page and 'hpage' is the head page).

When an error hits a hugetlb page, set_page_hwpoison_huge_page() sets
the PageHWPoison flag on all subpages of the hugetlb page in ascending
order of pfn. So without this TestSet, the memory error handler can
run more than once on concurrent errors, for example when the 1st memory
error hits the 100th subpage and the 2nd memory error hits the 50th subpage.
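
For readers following along, the two primitives discussed here can be
modelled in userspace. This is only a sketch, not kernel code: `struct
toy_page`, the `toy_*` helpers and `NR_SUBPAGES` (512, assuming a 2MB
huge page built from 4kB subpages) are hypothetical stand-ins, and the
real flag operations are atomic bit operations, which this model does
not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 2MB huge page made of 4kB subpages. */
#define NR_SUBPAGES 512

/* Hypothetical stand-in for struct page: only the poison flag is modelled. */
struct toy_page {
	bool hwpoison;
};

/* Models SetPageHWPoison(): set the flag unconditionally. */
static void toy_set_hwpoison(struct toy_page *page)
{
	page->hwpoison = true;
}

/* Models TestSetPageHWPoison(): set the flag and return its previous value. */
static bool toy_test_set_hwpoison(struct toy_page *page)
{
	bool old = page->hwpoison;

	page->hwpoison = true;
	return old;
}

/*
 * Models the behaviour described for set_page_hwpoison_huge_page():
 * flag every subpage, walking from the head page in ascending order.
 */
static void toy_set_hwpoison_huge(struct toy_page *hpage)
{
	assert(hpage != NULL);
	for (int i = 0; i < NR_SUBPAGES; i++)
		toy_set_hwpoison(hpage + i);
}
```

In this model the head page (index 0) is the first one flagged, and a
test-and-set on an already flagged page returns true without changing
anything.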
Thanks,
Naoya Horiguchi
On Wed, 29 Jul 2015 09:17:32 +0000
Naoya Horiguchi <[email protected]> wrote:
> # CC:ed linux-mm
>
> Hi Xiaoqiang,
>
> On Wed, Jul 29, 2015 at 03:52:46PM +0800, Wang Xiaoqiang wrote:
> > Hi,
> >
> > I found a small problem in the memory_failure() function in
> > mm/memory-failure.c. Please check it.
> >
> > memory_failure: remove redundant check for the PG_HWPoison flag of
> > `hpage'.
> >
> > Since we have already checked the PG_HWPoison flag with `PageHWPoison',
> > the later check by `TestSetPageHWPoison' must return true, so there
> > is no need to check again.
>
> I'm afraid that this TestSetPageHWPoison is not redundant, because
> this code serializes concurrent memory error events on the same
> hugetlb page (where 'p' is the 4kB error page and 'hpage' is the
> head page).
>
> When an error hits a hugetlb page, set_page_hwpoison_huge_page() sets
> the PageHWPoison flag on all subpages of the hugetlb page in
> ascending order of pfn. So without this TestSet, the memory error
> handler can run more than once on concurrent errors, for example when
> the 1st memory error hits the 100th subpage and the 2nd memory error
> hits the 50th subpage.
In your example, the error on the 100th subpage enters the memory
error handler first, and the handler uses
set_page_hwpoison_huge_page() to set the PG_HWPoison flag on all
subpages. Meanwhile the handler for the 50th subpage is waiting
in lock_page(hpage).

When the handler for the 100th subpage unlocks 'hpage',
the handler for the 50th subpage grabs the lock, and by now
'hpage' has PG_HWPoison set. So the PageHWPoison macro
returns true, and the following code is
executed:
	if (PageHWPoison(hpage)) {
		if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
		    || (p != hpage && TestSetPageHWPoison(hpage))) {
			atomic_long_sub(nr_pages, &num_poisoned_pages);
			unlock_page(hpage);
			return 0;
		}
	}
Now 'p' is the 50th subpage, which obviously is not equal to
'hpage', so even without TestSetPageHWPoison
here, the 50th error is still ignored.
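
The argument can be checked with a small userspace model that
enumerates the cases. This is only a sketch: `struct toy_page`, the
`toy_*` helpers and the `bail_out_*` functions are hypothetical names,
`filter_hit` stands in for a non-zero return from hwpoison_filter(p),
and the model assumes nothing clears the flag on 'hpage' between the
PageHWPoison() check and the inner condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the poison flag is modelled. */
struct toy_page {
	bool hwpoison;
};

/* Models TestSetPageHWPoison(): set the flag, return the old value. */
static bool toy_test_set_hwpoison(struct toy_page *page)
{
	bool old = page->hwpoison;

	page->hwpoison = true;
	return old;
}

/* Models TestClearPageHWPoison(): clear the flag, return the old value. */
static bool toy_test_clear_hwpoison(struct toy_page *page)
{
	bool old = page->hwpoison;

	page->hwpoison = false;
	return old;
}

/* Inner condition as it stands before the patch. */
static bool bail_out_original(bool filter_hit, struct toy_page *p,
			      struct toy_page *hpage)
{
	return (filter_hit && toy_test_clear_hwpoison(p))
	    || (p != hpage && toy_test_set_hwpoison(hpage));
}

/* Inner condition with the TestSet dropped, as in the patch. */
static bool bail_out_patched(bool filter_hit, struct toy_page *p,
			     struct toy_page *hpage)
{
	return (filter_hit && toy_test_clear_hwpoison(p)) || p != hpage;
}

/*
 * With the head page already poisoned, compare both variants over every
 * combination of filter result, p == hpage or not, and p's own flag.
 * Returns true if the result and the final flag states always match.
 */
static bool variants_agree_when_head_poisoned(void)
{
	for (int filter_hit = 0; filter_hit <= 1; filter_hit++) {
		for (int p_is_head = 0; p_is_head <= 1; p_is_head++) {
			for (int p_poisoned = 0; p_poisoned <= 1; p_poisoned++) {
				/* Index 0 is the head page, index 1 a tail subpage. */
				struct toy_page a[2] = { { true }, { p_poisoned != 0 } };
				struct toy_page b[2] = { { true }, { p_poisoned != 0 } };
				int idx = p_is_head ? 0 : 1;
				bool ra = bail_out_original(filter_hit != 0, &a[idx], &a[0]);
				bool rb = bail_out_patched(filter_hit != 0, &b[idx], &b[0]);

				if (ra != rb || a[0].hwpoison != b[0].hwpoison ||
				    a[1].hwpoison != b[1].hwpoison)
					return false;
			}
		}
	}
	return true;
}
```

The two variants only diverge in the model when the head page is not
yet poisoned, which is excluded by the enclosing PageHWPoison(hpage)
check.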
Why would the memory error handler run more than once?
Hope to hear from you!
thx,
Wang Xiaoqiang
On Thu, Jul 30, 2015 at 10:52:46AM +0800, Wang Xiaoqiang wrote:
...
> In your example, the error on the 100th subpage enters the memory
> error handler first, and the handler uses
> set_page_hwpoison_huge_page() to set the PG_HWPoison flag on all
> subpages. Meanwhile the handler for the 50th subpage is waiting
> in lock_page(hpage).
>
> When the handler for the 100th subpage unlocks 'hpage',
> the handler for the 50th subpage grabs the lock, and by now
> 'hpage' has PG_HWPoison set. So the PageHWPoison macro
> returns true, and the following code is
> executed:
>
> 	if (PageHWPoison(hpage)) {
> 		if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
> 		    || (p != hpage && TestSetPageHWPoison(hpage))) {
> 			atomic_long_sub(nr_pages, &num_poisoned_pages);
> 			unlock_page(hpage);
> 			return 0;
> 		}
> 	}
>
> Now 'p' is the 50th subpage, which obviously is not equal to
> 'hpage', so even without TestSetPageHWPoison
> here, the 50th error is still ignored.
Ah, you're right, thanks for the explanation, Xiaoqiang!
Acked-by: Naoya Horiguchi <[email protected]>