2008-10-21 15:29:31

by J. R. Okajima

[permalink] [raw]
Subject: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


Hello all,

When sysctl_overcommit_memory is set OVERCOMMIT_NEVER,
__vm_enough_memory() refers current->mm.

For example,
# exportfs -i -o ... localhost:/tmpfs
# mkdir /tmp/w
# mount -o ... localhost:/tmpfs /tmp/w
# yes > /tmp/w/fileA

In this case, NFSD (kernel thread) calls shmem_file_write() or
shmem_write_begin() and __vm_enough_memory() is called. But current->mm
is NULL and the kernel crashes.
If a user have to set OVERCOMMIT_NEVER, where should we fix?


Junjiro R. Okajima


2008-10-21 15:46:22

by Alan

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

> In this case, NFSD (kernel thread) calls shmem_file_write() or
> shmem_write_begin() and __vm_enough_memory() is called. But current->mm
> is NULL and the kernel crashes.
> If a user have to set OVERCOMMIT_NEVER, where should we fix?

Calling into the file system code assuming that current->mm is
NULL isn't safe and hasn't been for a very long time since someone added
the 3% hack.

The shmem case is actually a bit special so my thoughts are:

Make security_vm_enough_memory() WARN() if current->mm = NULL
Make security_vm_enough_memory_mm() WARN() if the passed mm = NULL
Add security_vm_enough_memory_fs() which does not do the warning test

All would still call security->ops->vm_enough_memory and then
__vm_enough_memory() would skip the 3% adjustment when the passed mm was
NULL

Does that sound sensible ?

Alan

2008-10-21 23:11:18

by J. R. Okajima

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


Alan Cox:
> Calling into the file system code assuming that current->mm is
> NULL isn't safe and hasn't been for a very long time since someone added
> the 3% hack.

I guess
- people don't care overcommit and leave it as default, so they don't
meet the problem
- people who cares overcommit has rich memory, and they don't meet the
problem too.


> The shmem case is actually a bit special so my thoughts are:
>
> Make security_vm_enough_memory() WARN() if current->mm = NULL
> Make security_vm_enough_memory_mm() WARN() if the passed mm = NULL
> Add security_vm_enough_memory_fs() which does not do the warning test
>
> All would still call security->ops->vm_enough_memory and then
> __vm_enough_memory() would skip the 3% adjustment when the passed mm was
> NULL
>
> Does that sound sensible ?

In your first option, write() to the exported tmpfs will produce the
warning on nfs server even if much memory is left. I don't think it is a
good idea.
I'd suggest to make __vm_enough_memory() would skip the 3% adjustment
only.

--- /src/linux-2.6/linux-2.6.27/mm/mmap.c 2008-10-10 07:13:53.000000000 +0900
+++ /tmp/mmap.c 2008-10-22 08:07:09.000000000 +0900
@@ -173,9 +173,10 @@
allowed -= allowed / 32;
allowed += total_swap_pages;

- /* Don't let a single process grow too big:
+ /* Don't let a single user process grow too big:
leave 3% of the size of this process for other processes */
- allowed -= mm->total_vm / 32;
+ if (mm)
+ allowed -= mm->total_vm / 32;

/*
* cast `allowed' as a signed long because vm_committed_space


Junjiro R. Okajima

2008-10-22 05:35:36

by J. R. Okajima

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


> I guess
> - people don't care overcommit and leave it as default, so they don't
> meet the problem
> - people who cares overcommit has rich memory, and they don't meet the
> problem too.

Correction: the problem is unrelated to the memory size.
I was sleepy when I wrote that.
Regardless the memory size, when nfsd writes to tmpfs the system surely
crashes.


Junjiro R. Okajima

2008-10-22 08:27:40

by Alan

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

> In your first option, write() to the exported tmpfs will produce the
> warning on nfs server even if much memory is left. I don't think it is a

No - the shmem routine would use vm_enough_memory_fs which wouldn't care
if current->mm is NULL, but the others would check.

> good idea.
> I'd suggest to make __vm_enough_memory() would skip the 3% adjustment
> only.
>
> --- /src/linux-2.6/linux-2.6.27/mm/mmap.c 2008-10-10 07:13:53.000000000 +0900
> +++ /tmp/mmap.c 2008-10-22 08:07:09.000000000 +0900
> @@ -173,9 +173,10 @@
> allowed -= allowed / 32;
> allowed += total_swap_pages;
>
> - /* Don't let a single process grow too big:
> + /* Don't let a single user process grow too big:
> leave 3% of the size of this process for other processes */
> - allowed -= mm->total_vm / 32;
> + if (mm)
> + allowed -= mm->total_vm / 32;

Doing this means we lose the ability to trap incorrect calls to
vm_enough_memory.

Alan

2008-10-22 11:26:36

by J. R. Okajima

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


Alan Cox:
> No - the shmem routine would use vm_enough_memory_fs which wouldn't care
> if current->mm is NULL, but the others would check.

Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
Would you show me its draft or pseudo code?


Junjiro R. Okajima

2008-10-22 12:09:28

by Alan

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

On Wed, 22 Oct 2008 20:26:19 +0900
[email protected] wrote:

>
> Alan Cox:
> > No - the shmem routine would use vm_enough_memory_fs which wouldn't care
> > if current->mm is NULL, but the others would check.
>
> Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
> Would you show me its draft or pseudo code?

This is the patch I propose:

nfsd: Fix vm overcommit crash

From: Alan Cox <[email protected]>

Junjiro R. Okajima reported a problem where knfsd crashes if you are using
it to export shmemfs objects and run strict overcommit. In this situation
the current->mm based modifier to the overcommit goes through a NULL
pointer.

We could simply check for NULL and skip the modifier but we've caught other
real bugs in the past from mm being NULL here - cases where we did need a
valid mm set up (eg the exec bug about a year ago).

To preserve the checks and get the logic we want shuffle the checking
around and add a new helper to the vm_ security wrappers

Also fix a current->mm reference in nommu that should use the passed mm
---

include/linux/security.h | 1 +
mm/mmap.c | 3 ++-
mm/nommu.c | 3 ++-
mm/shmem.c | 4 ++--
security/security.c | 9 +++++++++
5 files changed, 16 insertions(+), 4 deletions(-)


diff --git a/include/linux/security.h b/include/linux/security.h
index f5c4a51..a2b8430 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1585,6 +1585,7 @@ int security_syslog(int type);
int security_settime(struct timespec *ts, struct timezone *tz);
int security_vm_enough_memory(long pages);
int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
+int security_vm_enough_memory_kern(long pages);
int security_bprm_alloc(struct linux_binprm *bprm);
void security_bprm_free(struct linux_binprm *bprm);
void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe);
diff --git a/mm/mmap.c b/mm/mmap.c
index 74f4d15..de14ac2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -175,7 +175,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

/* Don't let a single process grow too big:
leave 3% of the size of this process for other processes */
- allowed -= mm->total_vm / 32;
+ if (mm)
+ allowed -= mm->total_vm / 32;

/*
* cast `allowed' as a signed long because vm_committed_space
diff --git a/mm/nommu.c b/mm/nommu.c
index 2696b24..7695dc8 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1454,7 +1454,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

/* Don't let a single process grow too big:
leave 3% of the size of this process for other processes */
- allowed -= current->mm->total_vm / 32;
+ if (mm)
+ allowed -= mm->total_vm / 32;

/*
* cast `allowed' as a signed long because vm_committed_space
diff --git a/mm/shmem.c b/mm/shmem.c
index d38d7e6..1677b3e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -162,7 +162,7 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
static inline int shmem_acct_size(unsigned long flags, loff_t size)
{
return (flags & VM_ACCOUNT)?
- security_vm_enough_memory(VM_ACCT(size)): 0;
+ security_vm_enough_memory_kern(VM_ACCT(size)): 0;
}

static inline void shmem_unacct_size(unsigned long flags, loff_t size)
@@ -180,7 +180,7 @@ static inline void shmem_unacct_size(unsigned long flags, loff_t size)
static inline int shmem_acct_block(unsigned long flags)
{
return (flags & VM_ACCOUNT)?
- 0: security_vm_enough_memory(VM_ACCT(PAGE_CACHE_SIZE));
+ 0: security_vm_enough_memory_kern(VM_ACCT(PAGE_CACHE_SIZE));
}

static inline void shmem_unacct_blocks(unsigned long flags, long pages)
diff --git a/security/security.c b/security/security.c
index 255b085..c0acfa7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -198,14 +198,23 @@ int security_settime(struct timespec *ts, struct timezone *tz)

int security_vm_enough_memory(long pages)
{
+ WARN_ON(current->mm == NULL);
return security_ops->vm_enough_memory(current->mm, pages);
}

int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
{
+ WARN_ON(mm == NULL);
return security_ops->vm_enough_memory(mm, pages);
}

+int security_vm_enough_memory_kern(long pages)
+{
+ /* If current->mm is a kernel thread then we will pass NULL,
+ for this specific case that is fine */
+ return security_ops->vm_enough_memory(current->mm, pages);
+}
+
int security_bprm_alloc(struct linux_binprm *bprm)
{
return security_ops->bprm_alloc_security(bprm);

2008-10-22 12:46:55

by J. R. Okajima

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread


Alan Cox:
> This is the patch I propose:
:::
> We could simply check for NULL and skip the modifier but we've caught other
> real bugs in the past from mm being NULL here - cases where we did need a
> valid mm set up (eg the exec bug about a year ago).

OK.
I didn't know there was an actual bug but now I understand why you want
to insert WARN_ON().

I have nothing to say about your patch.


Thank you
Junjiro R. Okajima

2008-10-22 13:37:27

by James Morris

[permalink] [raw]
Subject: Re: __vm_enough_memory(), OVERCOMMIT_NEVER, current->mm, kernel thread

On Wed, 22 Oct 2008, Alan Cox wrote:

[Adding lsm list to the cc]

> On Wed, 22 Oct 2008 20:26:19 +0900
> [email protected] wrote:
>
> >
> > Alan Cox:
> > > No - the shmem routine would use vm_enough_memory_fs which wouldn't care
> > > if current->mm is NULL, but the others would check.
> >
> > Unfortunately I cannot imagine what new vm_enough_memory_fs() will be.
> > Would you show me its draft or pseudo code?
>
> This is the patch I propose:
>
> nfsd: Fix vm overcommit crash
>
> From: Alan Cox <[email protected]>
>
> Junjiro R. Okajima reported a problem where knfsd crashes if you are using
> it to export shmemfs objects and run strict overcommit. In this situation
> the current->mm based modifier to the overcommit goes through a NULL
> pointer.
>
> We could simply check for NULL and skip the modifier but we've caught other
> real bugs in the past from mm being NULL here - cases where we did need a
> valid mm set up (eg the exec bug about a year ago).
>
> To preserve the checks and get the logic we want shuffle the checking
> around and add a new helper to the vm_ security wrappers
>
> Also fix a current->mm reference in nommu that should use the passed mm

Looks ok to me.

Acked-by: James Morris <[email protected]>


> ---
>
> include/linux/security.h | 1 +
> mm/mmap.c | 3 ++-
> mm/nommu.c | 3 ++-
> mm/shmem.c | 4 ++--
> security/security.c | 9 +++++++++
> 5 files changed, 16 insertions(+), 4 deletions(-)
>
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index f5c4a51..a2b8430 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1585,6 +1585,7 @@ int security_syslog(int type);
> int security_settime(struct timespec *ts, struct timezone *tz);
> int security_vm_enough_memory(long pages);
> int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
> +int security_vm_enough_memory_kern(long pages);
> int security_bprm_alloc(struct linux_binprm *bprm);
> void security_bprm_free(struct linux_binprm *bprm);
> void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 74f4d15..de14ac2 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -175,7 +175,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>
> /* Don't let a single process grow too big:
> leave 3% of the size of this process for other processes */
> - allowed -= mm->total_vm / 32;
> + if (mm)
> + allowed -= mm->total_vm / 32;
>
> /*
> * cast `allowed' as a signed long because vm_committed_space
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 2696b24..7695dc8 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -1454,7 +1454,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>
> /* Don't let a single process grow too big:
> leave 3% of the size of this process for other processes */
> - allowed -= current->mm->total_vm / 32;
> + if (mm)
> + allowed -= mm->total_vm / 32;
>
> /*
> * cast `allowed' as a signed long because vm_committed_space
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d38d7e6..1677b3e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -162,7 +162,7 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
> static inline int shmem_acct_size(unsigned long flags, loff_t size)
> {
> return (flags & VM_ACCOUNT)?
> - security_vm_enough_memory(VM_ACCT(size)): 0;
> + security_vm_enough_memory_kern(VM_ACCT(size)): 0;
> }
>
> static inline void shmem_unacct_size(unsigned long flags, loff_t size)
> @@ -180,7 +180,7 @@ static inline void shmem_unacct_size(unsigned long flags, loff_t size)
> static inline int shmem_acct_block(unsigned long flags)
> {
> return (flags & VM_ACCOUNT)?
> - 0: security_vm_enough_memory(VM_ACCT(PAGE_CACHE_SIZE));
> + 0: security_vm_enough_memory_kern(VM_ACCT(PAGE_CACHE_SIZE));
> }
>
> static inline void shmem_unacct_blocks(unsigned long flags, long pages)
> diff --git a/security/security.c b/security/security.c
> index 255b085..c0acfa7 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -198,14 +198,23 @@ int security_settime(struct timespec *ts, struct timezone *tz)
>
> int security_vm_enough_memory(long pages)
> {
> + WARN_ON(current->mm == NULL);
> return security_ops->vm_enough_memory(current->mm, pages);
> }
>
> int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
> {
> + WARN_ON(mm == NULL);
> return security_ops->vm_enough_memory(mm, pages);
> }
>
> +int security_vm_enough_memory_kern(long pages)
> +{
> + /* If current->mm is a kernel thread then we will pass NULL,
> + for this specific case that is fine */
> + return security_ops->vm_enough_memory(current->mm, pages);
> +}
> +
> int security_bprm_alloc(struct linux_binprm *bprm)
> {
> return security_ops->bprm_alloc_security(bprm);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
James Morris
<[email protected]>