2011-06-05 13:43:21

by Al Viro

[permalink] [raw]
Subject: ENOSPC returned by handle_mm_fault()

When alloc_huge_page() runs afoul of quota, it returns ERR_PTR(-ENOSPC).
Callers do not expect that - hugetlb_cow() returns ENOSPC if it gets that
and so does hugetlb_no_page(). Eventually the thing propagates back to
hugetlb_fault() and is returned by it.

Callers of hugetlb_fault() clearly expect a bitmap of VM_... and
not something from errno.h: one place is
ret = hugetlb_fault(mm, vma, vaddr,
(flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
spin_lock(&mm->page_table_lock);
if (!(ret & VM_FAULT_ERROR))
continue;
and another is handle_mm_fault(), which ends up returning ENOSPC and *its*
callers are definitely not ready to deal with that.

ENOSPC is 28, i.e. VM_FAULT_MAJOR | VM_FAULT_WRITE | VM_FAULT_HWPOISON;
it's also theoretically possible to get ENOMEM if region_chg() ends up
hitting
nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
if (!nrg)
return -ENOMEM;
region_chg() <- vma_needs_reservation() <- alloc_huge_page() and from that
point as with ENOSPC. ENOMEM is 12, i.e. VM_FAULT_MAJOR | VM_FAULT_WRITE...

Am I right assuming that we want VM_FAULT_OOM in both cases?


2011-06-05 19:16:20

by Hugh Dickins

[permalink] [raw]
Subject: Re: ENOSPC returned by handle_mm_fault()

On Sun, 5 Jun 2011, Al Viro wrote:
> When alloc_huge_page() runs afoul of quota, it returns ERR_PTR(-ENOSPC).
> Callers do not expect that - hugetlb_cow() returns ENOSPC if it gets that
> and so does hugetlb_no_page(). Eventually the thing propagates back to
> hugetlb_fault() and is returned by it.
>
> Callers of hugetlb_fault() clearly expect a bitmap of VM_... and
> not something from errno.h: one place is
> ret = hugetlb_fault(mm, vma, vaddr,
> (flags & FOLL_WRITE) ? FAULT_FLAG_WRITE : 0);
> spin_lock(&mm->page_table_lock);
> if (!(ret & VM_FAULT_ERROR))
> continue;
> and another is handle_mm_fault(), which ends up returning ENOSPC and *its*
> callers are definitely not ready to deal with that.
>
> ENOSPC is 28, i.e. VM_FAULT_MAJOR | VM_FAULT_WRITE | VM_FAULT_HWPOISON;
> it's also theoretically possible to get ENOMEM if region_chg() ends up
> hitting
> nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> if (!nrg)
> return -ENOMEM;
> region_chg() <- vma_needs_reservation() <- alloc_huge_page() and from that
> point as with ENOSPC. ENOMEM is 12, i.e. VM_FAULT_MAJOR | VM_FAULT_WRITE...

Good find, news to me. Interesting uses of -PTR_ERR()!
Looks like we'd better not have more than 12 VM_FAULT_ flags.

>
> Am I right assuming that we want VM_FAULT_OOM in both cases?

No, where hugetlb_get_quota() fails it should be VM_FAULT_SIGBUS:
there's no excuse to go on an OOM-killing spree just because hugetlb
quota is exhausted.

VM_FAULT_OOM is appropriate where vma_needs_reservation() fails,
because region_chg() couldn't kmalloc a structure, as you point out.

(Though that doesn't matter much, since the only way the kmalloc can
fail is when this task is already selected for OOM-kill - I think.)

Hugh

2011-06-05 19:50:30

by Al Viro

[permalink] [raw]
Subject: Re: ENOSPC returned by handle_mm_fault()

On Sun, Jun 05, 2011 at 12:16:08PM -0700, Hugh Dickins wrote:

> Good find, news to me. Interesting uses of -PTR_ERR()!

*snerk*

I've run into a bug where ->open() returned -PTR_ERR(...) on one of the failure
exits and went grepping. Caught so far:
* l2tp_debugfs - originally found bug
* xfs mknod() returning the error with wrong sign if xfs_get_acl()
fails
* jfs lmLogOpen() - positive error value returned (and propagated
all way be to userland if we'd been doing remount) if block device can't be
opened
* sunrpc - two bugs of the same kind
* this one, where the *sign* is right, but mixing E.. with VM_FAULT_..
is not.

Bugs are like mushrooms - found one, look around for more...

> Looks like we'd better not have more than 12 VM_FAULT_ flags.

> > Am I right assuming that we want VM_FAULT_OOM in both cases?
>
> No, where hugetlb_get_quota() fails it should be VM_FAULT_SIGBUS:
> there's no excuse to go on an OOM-killing spree just because hugetlb
> quota is exhausted.

Good point...

> VM_FAULT_OOM is appropriate where vma_needs_reservation() fails,
> because region_chg() couldn't kmalloc a structure, as you point out.
>
> (Though that doesn't matter much, since the only way the kmalloc can
> fail is when this task is already selected for OOM-kill - I think.)

You mean, something like the diff below?

Signed-off-by: Al Viro <[email protected]>
---
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f33bb31..3de23f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -125,7 +125,7 @@ static long region_chg(struct list_head *head, long f, long t)
if (&rg->link == head || t < rg->from) {
nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
if (!nrg)
- return -ENOMEM;
+ return -VM_FAULT_OOM;
nrg->from = f;
nrg->to = f;
INIT_LIST_HEAD(&nrg->link);
@@ -1036,7 +1036,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
return ERR_PTR(chg);
if (chg)
if (hugetlb_get_quota(inode->i_mapping, chg))
- return ERR_PTR(-ENOSPC);
+ return ERR_PTR(-VM_FAULT_SIGBUS);

spin_lock(&hugetlb_lock);
page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);

2011-06-05 20:49:08

by Hugh Dickins

[permalink] [raw]
Subject: Re: ENOSPC returned by handle_mm_fault()

On Sun, 5 Jun 2011, Al Viro wrote:
> On Sun, Jun 05, 2011 at 12:16:08PM -0700, Hugh Dickins wrote:
>
> > Good find, news to me. Interesting uses of -PTR_ERR()!
>
> You mean, something like the diff below?

Second hunk yes, but first hunk no: there's at least one other place
(hugetlb_reserve_pages) which calls region_chg(), and expects a
conventional -errno return from it; and even if there weren't,
I'd rather not spread these unconventional return values any deeper.

Something more like the one at the bottom I think: okay, it's slightly
tacky to assume the nature of the failure from vma_needs_reservation(),
but we already have two places which do make that assumption.

>
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f33bb31..3de23f0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -125,7 +125,7 @@ static long region_chg(struct list_head *head, long f, long t)
> if (&rg->link == head || t < rg->from) {
> nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
> if (!nrg)
> - return -ENOMEM;
> + return -VM_FAULT_OOM;
> nrg->from = f;
> nrg->to = f;
> INIT_LIST_HEAD(&nrg->link);
> @@ -1036,7 +1036,7 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> return ERR_PTR(chg);
> if (chg)
> if (hugetlb_get_quota(inode->i_mapping, chg))
> - return ERR_PTR(-ENOSPC);
> + return ERR_PTR(-VM_FAULT_SIGBUS);
>
> spin_lock(&hugetlb_lock);
> page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);

Signed-off-by: Hugh Dickins <[email protected]
---

mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- 3.0-rc1/mm/hugetlb.c 2011-05-29 18:42:37.425882575 -0700
+++ linux/mm/hugetlb.c 2011-06-05 13:33:22.795341004 -0700
@@ -1033,10 +1033,10 @@ static struct page *alloc_huge_page(stru
*/
chg = vma_needs_reservation(h, vma, addr);
if (chg < 0)
- return ERR_PTR(chg);
+ return ERR_PTR(-VM_FAULT_OOM);
if (chg)
if (hugetlb_get_quota(inode->i_mapping, chg))
- return ERR_PTR(-ENOSPC);
+ return ERR_PTR(-VM_FAULT_SIGBUS);

spin_lock(&hugetlb_lock);
page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);

2011-06-05 22:13:48

by Al Viro

[permalink] [raw]
Subject: Re: ENOSPC returned by handle_mm_fault()

On Sun, Jun 05, 2011 at 01:48:55PM -0700, Hugh Dickins wrote:
> On Sun, 5 Jun 2011, Al Viro wrote:
> > On Sun, Jun 05, 2011 at 12:16:08PM -0700, Hugh Dickins wrote:
> >
> > > Good find, news to me. Interesting uses of -PTR_ERR()!
> >
> > You mean, something like the diff below?
>
> Second hunk yes, but first hunk no: there's at least one other place
> (hugetlb_reserve_pages) which calls region_chg(), and expects a
> conventional -errno return from it; and even if there weren't,
> I'd rather not spread these unconventional return values any deeper.

Umm... FWIW, callers of hugetlb_reserve_pages() only check if it's 0;
exact value is lost. But yes, I agree that your variant makes more
sense - they might start caring at some point.

> Signed-off-by: Hugh Dickins <[email protected]
Acked-by: Al Viro <[email protected]>

> mm/hugetlb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- 3.0-rc1/mm/hugetlb.c 2011-05-29 18:42:37.425882575 -0700
> +++ linux/mm/hugetlb.c 2011-06-05 13:33:22.795341004 -0700
> @@ -1033,10 +1033,10 @@ static struct page *alloc_huge_page(stru
> */
> chg = vma_needs_reservation(h, vma, addr);
> if (chg < 0)
> - return ERR_PTR(chg);
> + return ERR_PTR(-VM_FAULT_OOM);
> if (chg)
> if (hugetlb_get_quota(inode->i_mapping, chg))
> - return ERR_PTR(-ENOSPC);
> + return ERR_PTR(-VM_FAULT_SIGBUS);
>
> spin_lock(&hugetlb_lock);
> page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);

2011-06-06 05:03:32

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] mm: fix ENOSPC returned by handle_mm_fault()

Al Viro observes that in the hugetlb case, handle_mm_fault() may return
a value of the kind ENOSPC when its caller is expecting a value of the
kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.

Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Al Viro <[email protected]>
Cc: [email protected]
---

mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- 3.0-rc1/mm/hugetlb.c 2011-05-29 18:42:37.425882575 -0700
+++ linux/mm/hugetlb.c 2011-06-05 13:33:22.795341004 -0700
@@ -1033,10 +1033,10 @@ static struct page *alloc_huge_page(stru
*/
chg = vma_needs_reservation(h, vma, addr);
if (chg < 0)
- return ERR_PTR(chg);
+ return ERR_PTR(-VM_FAULT_OOM);
if (chg)
if (hugetlb_get_quota(inode->i_mapping, chg))
- return ERR_PTR(-ENOSPC);
+ return ERR_PTR(-VM_FAULT_SIGBUS);

spin_lock(&hugetlb_lock);
page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);

2011-06-07 09:57:55

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] mm: fix ENOSPC returned by handle_mm_fault()

On Sun, Jun 05, 2011 at 10:03:13PM -0700, Hugh Dickins wrote:
> Al Viro observes that in the hugetlb case, handle_mm_fault() may return
> a value of the kind ENOSPC when its caller is expecting a value of the
> kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
>
> Signed-off-by: Hugh Dickins <[email protected]>
> Acked-by: Al Viro <[email protected]>
> Cc: [email protected]

Nicely spotted!

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs