LinuxLists.cc - Re: TMPFS over NFSv4

2010-05-21 20:56:07

Subject: Re: TMPFS over NFSv4

On Fri, 21 May 2010, Tharindu Rukshan Bamunuarachchi wrote:
>
> I tried to export a tmpfs file system over NFS and got the following oops ....
> this kernel is provided with SLES 11 and tainted due to OFED installation.
>
> I am using NFSv4. Please help me to find the root cause if you feel free ....
>
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
> IP: __vm_enough_memory+0xf9/0x14e
> PGD 0
> Oops: 0000 [1] SMP
> last sysfs file:
> /sys/devices/pci0000:00/0000:00:09.0/0000:24:00.0/infiniband/mlx4_1/node_desc
> CPU 0
> Modules linked in: blah blah blah
> Supported: No, Unsupported modules are loaded
> Pid: 8855, comm: nfsd Tainted: G 2.6.27.45-0.1-default #1
> RIP: 0010: __vm_enough_memory+0xf9/0x14e
...
> Process nfsd (pid: 8855, threadinfo ffff8803642cc000, task ffff88036f140380)
> Stack: ffff88037b93b668 ffff88037009de40 ffff88037b93b668 0000000000000000
> ffff88037b93b601 ffffffff802a8573 ffffffff80a33680 ffffffff80a30730
> 0000000000000000 0000000300000002 ffff8803642cd930 0000000000000000
> Call Trace:
> shmem_getpage+0x4d8/0x764
> generic_perform_write+0xae/0x1b5
> generic_file_buffered_write+0x80/0x130
> __generic_file_aio_write_nolock+0x349/0x37d
> generic_file_aio_write+0x64/0xc4
> do_sync_readv_writev+0xc0/0x107
> do_readv_writev+0xb2/0x18b
> nfsd_vfs_write+0x10a/0x328 [nfsd]
> nfsd_write+0x79/0xe2 [nfsd]
> nfsd4_write+0xd9/0x10d [nfsd]
> nfsd4_proc_compound+0x1bd/0x2c7 [nfsd]
> nfsd_dispatch+0xdd/0x1b9 [nfsd]
> svc_process+0x3d8/0x700 [sunrpc]
> nfsd+0x1b1/0x27e [nfsd]
> kthread+0x47/0x73
> child_rip+0xa/0x11

I believe that was fixed in 2.6.28 by the patch below:
please would you try it, and if it works for you, then
I'll ask for it to be included in the next 2.6.27-stable,
which I expect SLES 11 will include in an update later.
Strange that more people haven't suffered from it...

Hugh

commit 731572d39fcd3498702eda4600db4c43d51e0b26
Author: Alan Cox <[email protected]>
Date: Wed Oct 29 14:01:20 2008 -0700

nfsd: fix vm overcommit crash

Junjiro R. Okajima reported a problem where knfsd crashes if you are
using it to export shmemfs objects and run strict overcommit. In this
situation the current->mm based modifier to the overcommit goes through a
NULL pointer.

We could simply check for NULL and skip the modifier but we've caught
other real bugs in the past from mm being NULL here - cases where we did
need a valid mm set up (eg the exec bug about a year ago).

To preserve the checks and get the logic we want shuffle the checking
around and add a new helper to the vm_ security wrappers

Also fix a current->mm reference in nommu that should use the passed mm

[[email protected]: coding-style fixes]
[[email protected]: fix build]
Reported-by: Junjiro R. Okajima <[email protected]>
Acked-by: James Morris <[email protected]>
Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/include/linux/security.h b/include/linux/security.h
index f5c4a51..c13f1ce 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1585,6 +1585,7 @@ int security_syslog(int type);
 int security_settime(struct timespec *ts, struct timezone *tz);
 int security_vm_enough_memory(long pages);
 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
+int security_vm_enough_memory_kern(long pages);
 int security_bprm_alloc(struct linux_binprm *bprm);
 void security_bprm_free(struct linux_binprm *bprm);
 void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe);
@@ -1820,6 +1821,11 @@ static inline int security_vm_enough_memory(long pages)
 	return cap_vm_enough_memory(current->mm, pages);
 }

+static inline int security_vm_enough_memory_kern(long pages)
+{
+	return cap_vm_enough_memory(current->mm, pages);
+}
+
 static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 {
 	return cap_vm_enough_memory(mm, pages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 74f4d15..de14ac2 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -175,7 +175,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

 	/* Don't let a single process grow too big:
 	   leave 3% of the size of this process for other processes */
-	allowed -= mm->total_vm / 32;
+	if (mm)
+		allowed -= mm->total_vm / 32;

 	/*
 	 * cast `allowed' as a signed long because vm_committed_space
diff --git a/mm/nommu.c b/mm/nommu.c
index 2696b24..7695dc8 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1454,7 +1454,8 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)

 	/* Don't let a single process grow too big:
 	   leave 3% of the size of this process for other processes */
-	allowed -= current->mm->total_vm / 32;
+	if (mm)
+		allowed -= mm->total_vm / 32;

 	/*
 	 * cast `allowed' as a signed long because vm_committed_space
diff --git a/mm/shmem.c b/mm/shmem.c
index d38d7e6..0ed0752 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -161,8 +161,8 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
  */
 static inline int shmem_acct_size(unsigned long flags, loff_t size)
 {
-	return (flags & VM_ACCOUNT)?
-		security_vm_enough_memory(VM_ACCT(size)): 0;
+	return (flags & VM_ACCOUNT) ?
+		security_vm_enough_memory_kern(VM_ACCT(size)) : 0;
 }

 static inline void shmem_unacct_size(unsigned long flags, loff_t size)
@@ -179,8 +179,8 @@ static inline void shmem_unacct_size(unsigned long flags, loff_t size)
  */
 static inline int shmem_acct_block(unsigned long flags)
 {
-	return (flags & VM_ACCOUNT)?
-		0: security_vm_enough_memory(VM_ACCT(PAGE_CACHE_SIZE));
+	return (flags & VM_ACCOUNT) ?
+		0 : security_vm_enough_memory_kern(VM_ACCT(PAGE_CACHE_SIZE));
 }

 static inline void shmem_unacct_blocks(unsigned long flags, long pages)
diff --git a/security/security.c b/security/security.c
index 255b085..c0acfa7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -198,14 +198,23 @@ int security_settime(struct timespec *ts, struct timezone *tz)

 int security_vm_enough_memory(long pages)
 {
+	WARN_ON(current->mm == NULL);
 	return security_ops->vm_enough_memory(current->mm, pages);
 }

 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 {
+	WARN_ON(mm == NULL);
 	return security_ops->vm_enough_memory(mm, pages);
 }

+int security_vm_enough_memory_kern(long pages)
+{
+	/* If current->mm is a kernel thread then we will pass NULL,
+	   for this specific case that is fine */
+	return security_ops->vm_enough_memory(current->mm, pages);
+}
+
 int security_bprm_alloc(struct linux_binprm *bprm)
 {
 	return security_ops->bprm_alloc_security(bprm);

2010-05-24 23:46:34

by Hugh Dickins


Subject: Re: TMPFS over NFSv4

Hi Greg,

On Mon, 24 May 2010, Alan Cox wrote:
> On Mon, 24 May 2010 02:57:30 -0700
> Hugh Dickins <[email protected]> wrote:
> > On Mon, May 24, 2010 at 2:26 AM, Tharindu Rukshan Bamunuarachchi
> > <[email protected]> wrote:
> > > thankx a lot Hugh ... I will try this out ... (bit harder patch
> > > already patched SLES kernel :-p ) ....
> >
> > If patch conflicts are a problem, you really only need to put in the
> > two-liner patch to mm/mmap.c: Alan was seeking perfection in
> > the rest of the patch, but you can get away without it.
> >
> > >
> > > BTW, what does Alan means by "strict overcommit" ?
> >
> > Ah, that phrase, yes, it's a nonsense, but many of us do say it by mistake.
> > Alan meant to say "strict no-overcommit".
>
> No I always meant to say 'strict overcommit'. It avoids excess negatives
> and "no noovercommit" discussions.
>
> I guess 'strict overcommit control' would have been clearer 8)
>
> Alan

I see we've just missed 2.6.27.47-rc1, but if there's to be an -rc2,
please include Alan's 2.6.28 oops fix below: which Tharindu appears
to be needing - just now discussed on linux-mm and linux-nfs.
Failing that, please queue it up for 2.6.27.48.

Or if you'd prefer a smaller patch for -stable, then just the mm/mmap.c
part of it should suffice: I think it's fair to say that the rest of the
patch was more precautionary - as Alan describes, for catching other bugs,
so good for an ongoing development tree, but not necessarily in -stable.
(However, Alan may disagree - I've already misrepresented him once here!)

Thanks,
Hugh

commit 731572d39fcd3498702eda4600db4c43d51e0b26
Author: Alan Cox <[email protected]>
Date: Wed Oct 29 14:01:20 2008 -0700

nfsd: fix vm overcommit crash


2010-05-24 10:01:46

by Alan


Subject: Re: TMPFS over NFSv4

On Mon, 24 May 2010 02:57:30 -0700
Hugh Dickins <[email protected]> wrote:

> On Mon, May 24, 2010 at 2:26 AM, Tharindu Rukshan Bamunuarachchi
> <[email protected]> wrote:
> > thankx a lot Hugh ... I will try this out ... (bit harder patch
> > already patched SLES kernel :-p ) ....
>
> If patch conflicts are a problem, you really only need to put in the
> two-liner patch to mm/mmap.c: Alan was seeking perfection in
> the rest of the patch, but you can get away without it.
>
> >
> > BTW, what does Alan means by "strict overcommit" ?
>
> Ah, that phrase, yes, it's a nonsense, but many of us do say it by mistake.
> Alan meant to say "strict no-overcommit".

No I always meant to say 'strict overcommit'. It avoids excess negatives
and "no noovercommit" discussions.

I guess 'strict overcommit control' would have been clearer 8)

Alan

2010-05-24 09:55:26

by Alan


Subject: Re: TMPFS over NFSv4

On Mon, 24 May 2010 10:26:39 +0100
Tharindu Rukshan Bamunuarachchi <[email protected]> wrote:

> thankx a lot Hugh ... I will try this out ... (bit harder patch
> already patched SLES kernel :-p ) ....
>
> BTW, what does Alan means by "strict overcommit" ?

Strict overcommit works like banks should. It tries to ensure that at any
point it has sufficient swap and memory to fulfill any possible use of
allocated address space. So in strict overcommit mode you should almost
never see an OOM kill (there are perverse cases as always), but you will
need a lot more swap that may well never be used.

In the normal mode the kernel works like the US banking system and makes
speculative guesses that all the resources it hands out will never be
needed at once. That has the corresponding risk that one day it might at
which point you get a meltdown (or in the kernel case OOM kills)

Alan

2010-05-24 09:57:35

by Hugh Dickins


Subject: Re: TMPFS over NFSv4

On Mon, May 24, 2010 at 2:26 AM, Tharindu Rukshan Bamunuarachchi
<[email protected]> wrote:
> thankx a lot Hugh ... I will try this out ... (bit harder patch
> already patched SLES kernel :-p ) ....

If patch conflicts are a problem, you really only need to put in the
two-liner patch to mm/mmap.c: Alan was seeking perfection in
the rest of the patch, but you can get away without it.

>
> BTW, what does Alan means by "strict overcommit" ?

Ah, that phrase, yes, it's a nonsense, but many of us do say it by mistake.
Alan meant to say "strict no-overcommit".

>
> e.g.
> i did not see this issues with "0 > /proc/sys/vm/overcommit_accounting"

I assume "overcommit_accounting" is either a typo for "overcommit_memory",
or SLES gives "overcommit_memory" a slightly different name.

0 means overcommit memory (let people allocate more private writable user
memory than there is actually ram+swap to back), but throw in a check against
really wild allocation requests. 1 omits even that check.

> But this happened several times with "2 > /proc/sys/vm/overcommit_accounting"

2 means account for all private writable memory and fail any allocation which
would take the system over the edge - the edge being defined roughly by
overcommit_ratio * (ram+swap) (I expect there's a divisor needed in there!)
i.e. 2 means strict no-overcommit.

So what you see fits with what Alan was fixing.

Hugh

2010-05-25 17:03:30

by Greg KH


Subject: Re: TMPFS over NFSv4

On Mon, May 24, 2010 at 04:46:24PM -0700, Hugh Dickins wrote:
> Hi Greg,
>
> On Mon, 24 May 2010, Alan Cox wrote:
> > On Mon, 24 May 2010 02:57:30 -0700
> > Hugh Dickins <[email protected]> wrote:
> > > On Mon, May 24, 2010 at 2:26 AM, Tharindu Rukshan Bamunuarachchi
> > > <[email protected]> wrote:
> > > > thankx a lot Hugh ... I will try this out ... (bit harder patch
> > > > already patched SLES kernel :-p ) ....
> > >
> > > If patch conflicts are a problem, you really only need to put in the
> > > two-liner patch to mm/mmap.c: Alan was seeking perfection in
> > > the rest of the patch, but you can get away without it.
> > >
> > > >
> > > > BTW, what does Alan means by "strict overcommit" ?
> > >
> > > Ah, that phrase, yes, it's a nonsense, but many of us do say it by mistake.
> > > Alan meant to say "strict no-overcommit".
> >
> > No I always meant to say 'strict overcommit'. It avoids excess negatives
> > and "no noovercommit" discussions.
> >
> > I guess 'strict overcommit control' would have been clearer 8)
> >
> > Alan
>
> I see we've just missed 2.6.27.47-rc1, but if there's to be an -rc2,
> please include Alan's 2.6.28 oops fix below: which Tharindu appears
> to be needing - just now discussed on linux-mm and linux-nfs.
> Failing that, please queue it up for 2.6.27.48.

There is now going to be a -rc2 due to other problems, so I'll go queue
this one up as well.

> Or if you'd prefer a smaller patch for -stable, then just the mm/mmap.c
> part of it should suffice: I think it's fair to say that the rest of the
> patch was more precautionary - as Alan describes, for catching other bugs,
> so good for an ongoing development tree, but not necessarily in -stable.
> (However, Alan may disagree - I've already misrepresented him once here!)

The original is best, it makes more sense.

thanks,

greg k-h

2010-05-24 09:27:10

by Tharindu Rukshan Bamunuarachchi


Subject: Re: TMPFS over NFSv4

Thanks a lot, Hugh ... I will try this out ... (it is a bit harder to patch
an already-patched SLES kernel :-p ) ....

BTW, what does Alan mean by "strict overcommit"?

e.g.
I did not see this issue with "0 > /proc/sys/vm/overcommit_accounting"
But this happened several times with "2 > /proc/sys/vm/overcommit_accounting"

any clue ?

we are suffering everyday ..... :-|

__
tharindu.info

"those that can, do. Those that can't, complain." -- Linus


2010-05-25 09:00:29

by Tharindu Rukshan Bamunuarachchi


Subject: Re: TMPFS over NFSv4

hope that the 2.6.27.48 or later will be shipped with SP1 :-)

__
tharindu.info

"those that can, do. Those that can't, complain." -- Linus


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2010-05-25 16:58:48

by Greg KH


Subject: Re: TMPFS over NFSv4

On Tue, May 25, 2010 at 10:00:29AM +0100, Tharindu Rukshan Bamunuarachchi wrote:
> hope that the 2.6.27.48 or later will be shipped with SP1 :-)

What do you mean "SP1"?

confused,

greg k-h


2010-05-24 11:37:21

by Tharindu Rukshan Bamunuarachchi


Subject: Re: TMPFS over NFSv4

Got it cleared.

BTW, nice example ... US Banking System :-)

__
tharindu.info

"those that can, do. Those that can't, complain." -- Linus

On Mon, May 24, 2010 at 11:02 AM, Alan Cox <[email protected]> wrote:
> On Mon, 24 May 2010 10:26:39 +0100
> Tharindu Rukshan Bamunuarachchi <[email protected]> wrote:
>
>> thankx a lot Hugh ... I will try this out ... (bit harder patch
>> already patched SLES kernel :-p ) ....
>>
>> BTW, what does Alan means by "strict overcommit" ?
>
> Strict overcommit works like banks should. It tries to ensure that at any
> point it has sufficient swap and memory to fulfill any possible use of
> allocated address space. So in strict overcommit mode you should almost
> never see an OOM kill (there are perverse cases as always), but you will
> need a lot more swap that may well never be used.
>
> In the normal mode the kernel works like the US banking system and makes
> speculative guesses that all the resources it hands out will never be
> needed at once. That has the corresponding risk that one day it might at
> which point you get a meltdown (or in the kernel case OOM kills)
>
> Alan
>