Hi,
This small patch series fixes problems on hugepage coredump,
where we cannot include any data on hugepages into coredump.
See individual patches for more details.
Thanks,
Naoya Horiguchi
With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().
The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit layout
between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
which is called in get_dump_page() returns a wrong page from a given address.
We need to filter out only hwpoison hugepages to have data on healthy
hugepages in coredump. So this patch makes follow_hugetlb_page() avoid
trying to get page when a pmd is in swap entry like format.
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* first, for the page indexing below to work.
*/
pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
- absent = !pte || huge_pte_none(huge_ptep_get(pte));
+ absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+ is_swap_pte(huge_ptep_get(pte));
/*
* When coredumping, it suits get_dump_page if we just return
--
1.7.11.7
Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me a serious regression,
so let's fix it.
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap_pgoff unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
vma->vm_ops = &hugetlb_vm_ops;
if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
--
1.7.11.7
On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Cc: Konstantin Khlebnikov <[email protected]>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.
</formletter>
On Thu, Mar 28, 2013 at 08:51:09AM -0700, Greg KH wrote:
> On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> > Currently we fail to include any data on hugepages into coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me a serious regression,
> > so let's fix it.
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > Cc: Konstantin Khlebnikov <[email protected]>
> > ---
> > fs/hugetlbfs/inode.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree. Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
>
> </formletter>
I guess you mean this patch violates one/both of these rules:
- It must fix a problem that causes a build error (but not for things
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
security issue, or some "oh, that's not good" issue. In short, something
critical.
- It or an equivalent fix must already exist in Linus' tree (upstream).
I'm not sure if the problem "we can't get any hugepage in coredump"
is considered as 'some "oh, that's not good" issue'.
But yes, it's not a critical one.
If you mean I violated the second rule, sorry, I'll get it into upstream first.
Thanks,
Naoya
Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
That was introduced in my patch? Really?
Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>
> Signed-off-by: Naoya Horiguchi<[email protected]>
> Cc: Konstantin Khlebnikov<[email protected]>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
> vma->vm_ops =&hugetlb_vm_ops;
>
> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))
On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
> >Currently we fail to include any data on hugepages into coredump,
> >because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> >introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> >mm->reserved_vm counter". This looks to me a serious regression,
> >so let's fix it.
>
> That was introduced in my patch? Really?
> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
vma_dump_size() does like this (the diff is the one in 314e51b98):
static unsigned long vma_dump_size(struct vm_area_struct *vma,
unsigned long mm_flags)
{
#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
/* always dump the vdso and vsyscall sections */
if (always_dump_vma(vma))
goto whole;
if (vma->vm_flags & VM_DONTDUMP)
return 0;
/* Hugetlb memory check */
if (vma->vm_flags & VM_HUGETLB) {
if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
goto whole;
}
/* Do not dump I/O mapped devices or special mappings */
- if (vma->vm_flags & (VM_IO | VM_RESERVED))
+ if (vma->vm_flags & VM_IO)
return 0;
We have hugetlb memory check after VM_DONTDUMP check, so the following
changed the behavior.
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap_pgoff unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
vma->vm_ops = &hugetlb_vm_ops;
if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
Thanks,
Naoya
On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
[...]
> I guess you mean this patch violates one/both of these rules:
>
> - It must fix a problem that causes a build error (but not for things
> marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> security issue, or some "oh, that's not good" issue. In short, something
> critical.
> - It or an equivalent fix must already exist in Linus' tree (upstream).
>
> I'm not sure if the problem "we can't get any hugepage in coredump"
> is considered as 'some "oh, that's not good" issue'.
> But yes, it's not a critical one.
> If you mean I violated the second rule, sorry, I'll get it into upstream first.
The second rule is the clear one. If you are submitting a patch to
a subsystem maintainer and you want it to go into stable branches as
well, you must put 'Cc: [email protected]' in the commit message,
not just the mail header.
Ben.
--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus
On Thu, Mar 28, 2013 at 07:39:01PM +0000, Ben Hutchings wrote:
> On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
> [...]
> > I guess you mean this patch violates one/both of these rules:
> >
> > - It must fix a problem that causes a build error (but not for things
> > marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> > security issue, or some "oh, that's not good" issue. In short, something
> > critical.
> > - It or an equivalent fix must already exist in Linus' tree (upstream).
> >
> > I'm not sure if the problem "we can't get any hugepage in coredump"
> > is considered as 'some "oh, that's not good" issue'.
> > But yes, it's not a critical one.
> > If you mean I violated the second rule, sorry, I'll get it into upstream first.
>
> The second rule is the clear one. If you are submitting a patch to
> a subsystem maintainer and you want it to go into stable branches as
> well, you must put 'Cc: [email protected]' in the commit message,
> not just the mail header.
Got it. Thank you.
Naoya
Naoya Horiguchi wrote:
> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>> Naoya Horiguchi wrote:
>>> Currently we fail to include any data on hugepages into coredump,
>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>> mm->reserved_vm counter". This looks to me a serious regression,
>>> so let's fix it.
>>
>> That was introduced in my patch? Really?
>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>
> vma_dump_size() does like this (the diff is the one in 314e51b98):
>
> static unsigned long vma_dump_size(struct vm_area_struct *vma,
> unsigned long mm_flags)
> {
> #define FILTER(type) (mm_flags& (1UL<< MMF_DUMP_##type))
>
> /* always dump the vdso and vsyscall sections */
> if (always_dump_vma(vma))
> goto whole;
>
> if (vma->vm_flags& VM_DONTDUMP)
> return 0;
>
> /* Hugetlb memory check */
> if (vma->vm_flags& VM_HUGETLB) {
> if ((vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_SHARED))
> goto whole;
> if (!(vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_PRIVATE))
> goto whole;
> }
>
> /* Do not dump I/O mapped devices or special mappings */
> - if (vma->vm_flags& (VM_IO | VM_RESERVED))
> + if (vma->vm_flags& VM_IO)
> return 0;
>
> We have hugetlb memory check after VM_DONTDUMP check, so the following
> changed the behavior.
Ok, I missed this in my patch.
>
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> vma->vm_ops =&hugetlb_vm_ops;
>
> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))
>
> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
Acked-by: Konstantin Khlebnikov <[email protected]>
>
> Thanks,
> Naoya
Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
>> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>>> Naoya Horiguchi wrote:
>>>> Currently we fail to include any data on hugepages into coredump,
>>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>>> mm->reserved_vm counter". This looks to me a serious regression,
>>>> so let's fix it.
>>>
>>> That was introduced in my patch? Really?
>>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>>
>> vma_dump_size() does like this (the diff is the one in 314e51b98):
>>
>> static unsigned long vma_dump_size(struct vm_area_struct *vma,
>> unsigned long mm_flags)
>> {
>> #define FILTER(type) (mm_flags& (1UL<< MMF_DUMP_##type))
>>
>> /* always dump the vdso and vsyscall sections */
>> if (always_dump_vma(vma))
>> goto whole;
>>
>> if (vma->vm_flags& VM_DONTDUMP)
>> return 0;
>>
>> /* Hugetlb memory check */
>> if (vma->vm_flags& VM_HUGETLB) {
>> if ((vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_SHARED))
>> goto whole;
>> if (!(vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_PRIVATE))
>> goto whole;
>> }
>>
>> /* Do not dump I/O mapped devices or special mappings */
>> - if (vma->vm_flags& (VM_IO | VM_RESERVED))
>> + if (vma->vm_flags& VM_IO)
>> return 0;
>>
>> We have hugetlb memory check after VM_DONTDUMP check, so the following
>> changed the behavior.
>
> Ok, I missed this in my patch.
>
>>
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>> * way when do_mmap_pgoff unwinds (may be important on powerpc
>> * and ia64).
>> */
>> - vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>> vma->vm_ops =&hugetlb_vm_ops;
>>
>> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))
>>
>> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
>
> Acked-by: Konstantin Khlebnikov<[email protected]>
hugetlb coredump filter also should be fixed in this way:
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
goto whole;
+ return 0;
}
/* Do not dump I/O mapped devices or special mappings */
>
>>
>> Thanks,
>> Naoya
>
On Thu 28-03-13 11:42:37, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Cc: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
> vma->vm_ops = &hugetlb_vm_ops;
>
> if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> --
> 1.7.11.7
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
--
Michal Hocko
SUSE Labs
On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
[...]
> @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> * first, for the page indexing below to work.
> */
> pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> + is_swap_pte(huge_ptep_get(pte));
is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
instead?
--
Michal Hocko
SUSE Labs
On Fri, Mar 29, 2013 at 04:09:36PM +0400, Konstantin Khlebnikov wrote:
...
> hugetlb coredump filter also should be fixed in this way:
>
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
> goto whole;
> if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
> goto whole;
> + return 0;
> }
OK, I'll add it.
Thanks,
Naoya
Hi,
On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> [...]
> > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > * first, for the page indexing below to work.
> > */
> > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > + is_swap_pte(huge_ptep_get(pte));
>
> is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> instead?
I tested only hwpoisoned hugepage, but the same can happen for hugepages
under migration. So I intended to filter out all types of swap entries.
The local variable 'absent' seems to mean whether data on the address
is immediately available, so swap type entry isn't included in it.
Thanks,
Naoya
On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> Hi,
>
> On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > [...]
> > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > * first, for the page indexing below to work.
> > > */
> > > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > + is_swap_pte(huge_ptep_get(pte));
> >
> > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > instead?
>
> I tested only hwpoisoned hugepage, but the same can happen for hugepages
> under migration. So I intended to filter out all types of swap entries.
> The local variable 'absent' seems to mean whether data on the address
> is immediately available, so swap type entry isn't included in it.
OK, I didn't consider huge pages under migration and I was merely worried
that is_hugetlb_entry_hwpoisoned sounds more appropriate than
is_swap_pte.
Could you add a comment which would clarify that is_swap_pte covers both
migration and hwpoison pages, please? Something like:
/*
* is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
* and hugepages under migration in which case
* hugetlb_fault waits for the migration and bails out
* properly for HWPosined pages.
*/
absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
is_swap_pte(huge_ptep_get(pte));
Other than that feel free to add
Reviewed-by: Michal Hocko <[email protected]>
Thanks.
--
Michal Hocko
SUSE Labs
On Tue, Apr 02, 2013 at 11:24:41AM +0200, Michal Hocko wrote:
> On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> > Hi,
> >
> > On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > > [...]
> > > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > > * first, for the page indexing below to work.
> > > > */
> > > > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > > + is_swap_pte(huge_ptep_get(pte));
> > >
> > > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > > instead?
> >
> > I tested only hwpoisoned hugepage, but the same can happen for hugepages
> > under migration. So I intended to filter out all types of swap entries.
> > The local variable 'absent' seems to mean whether data on the address
> > is immediately available, so swap type entry isn't included in it.
>
> OK, I didn't consider huge pages under migration and I was merely worried
> that is_hugetlb_entry_hwpoisoned sounds more appropriate than
> is_swap_pte.
>
> Could you add a comment which would clarify that is_swap_pte covers both
> migration and hwpoison pages, please? Something like:
>
> /*
> * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
> * and hugepages under migration in which case
> * hugetlb_fault waits for the migration and bails out
> * properly for HWPosined pages.
> */
> absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> is_swap_pte(huge_ptep_get(pte));
OK, I'll add this.
> Other than that feel free to add
> Reviewed-by: Michal Hocko <[email protected]>
Thank you!
Naoya