2010-08-18 15:07:49

by Greg KH

Subject: [0/6] 2.6.27.52 stable review [try 3]

Here's the 3rd try for the 2.6.27.52 kernel release. It has 3 more
patches than the -rc1 release, and two more than -rc2. I've tested this
on my machine, but I'm still getting some kernel log warnings from some
programs. Any VM developer willing to verify that I actually got my
backports correct would be greatly appreciated, as I am not that
comfortable with them.

As an example, look at patch 6/6: I could only figure out the /proc file
change, not the mlock range change, as the VM has changed a lot from .27
to today. Any help in porting commit
d7824370e26325c881b665350ce64fb0a4fde24a to the .27 tree "properly"
would be greatly appreciated.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.27.52-rc3.gz
and the diffstat can be found below.

thanks,

greg k-h


Makefile | 2 +-
arch/x86/mm/fault.c | 9 ++++++++-
fs/proc/task_mmu.c | 8 +++++++-
mm/memory.c | 26 +++++++++++++++++++++++++-
mm/mmap.c | 2 +-
5 files changed, 42 insertions(+), 5 deletions(-)


2010-08-18 15:07:38

by Greg KH

Subject: [1/6] mm: keep a guard page below a grow-down stack segment

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

From: Linus Torvalds <[email protected]>

commit 320b2b8de12698082609ebbc1a17165727f4c893 upstream.

This is a rather minimally invasive patch to solve the problem of the
user stack growing into a memory mapped area below it. Whenever we fill
the first page of the stack segment, expand the segment down by one
page.

Now, admittedly some odd application might _want_ the stack to grow down
into the preceding memory mapping, and so we may at some point need to
make this a process tunable (some people might also want to have more
than a single page of guarding), but let's try the minimal approach
first.

Tested with trivial application that maps a single page just below the
stack, and then starts recursing. Without this, we will get a SIGSEGV
_after_ the stack has smashed the mapping. With this patch, we'll get a
nice SIGBUS just as the stack touches the page just above the mapping.

Requested-by: Keith Packard <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/memory.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2396,6 +2396,26 @@ out_nomap:
}

/*
+ * This is like a special single-page "expand_downwards()",
+ * except we must first make sure that 'address-PAGE_SIZE'
+ * doesn't hit another vma.
+ *
+ * The "find_vma()" will do the right thing even if we wrap
+ */
+static inline int check_stack_guard_page(struct vm_area_struct *vma, unsigned long address)
+{
+ address &= PAGE_MASK;
+ if ((vma->vm_flags & VM_GROWSDOWN) && address == vma->vm_start) {
+ address -= PAGE_SIZE;
+ if (find_vma(vma->vm_mm, address) != vma)
+ return -ENOMEM;
+
+ expand_stack(vma, address);
+ }
+ return 0;
+}
+
+/*
* We enter with non-exclusive mmap_sem (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
* We return with mmap_sem still held, but pte unmapped and unlocked.
@@ -2408,6 +2428,9 @@ static int do_anonymous_page(struct mm_s
spinlock_t *ptl;
pte_t entry;

+ if (check_stack_guard_page(vma, address) < 0)
+ return VM_FAULT_SIGBUS;
+
/* Allocate our own private page. */
pte_unmap(page_table);


2010-08-18 15:07:40

by Greg KH

Subject: [4/6] mm: pass correct mm when growing stack

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

From: Hugh Dickins <[email protected]>

commit 05fa199d45c54a9bda7aa3ae6537253d6f097aa9 upstream.

Tetsuo Handa reports seeing the WARN_ON(current->mm == NULL) in
security_vm_enough_memory(), when do_execve() is touching the
target mm's stack, to set up its args and environment.

Yes, a UMH_NO_WAIT or UMH_WAIT_PROC call_usermodehelper() spawns
an mm-less kernel thread to do the exec. And in any case, that
vm_enough_memory check when growing stack ought to be done on the
target mm, not on the execer's mm (though apart from the warning,
it only makes a slight tweak to OVERCOMMIT_NEVER behaviour).

Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/mmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1573,7 +1573,7 @@ static int acct_stack_growth(struct vm_a
* Overcommit.. This must be the final test, as it will
* update security statistics.
*/
- if (security_vm_enough_memory(grow))
+ if (security_vm_enough_memory_mm(mm, grow))
return -ENOMEM;

/* Ok, everything looks good - let it rip */

2010-08-18 15:07:46

by Greg KH

Subject: [6/6] mm: fix up some user-visible effects of the stack guard page

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

From: Linus Torvalds <[email protected]>

commit d7824370e26325c881b665350ce64fb0a4fde24a upstream.

This commit makes the stack guard page somewhat less visible to user
space. It does this by:

- not showing the guard page in /proc/<pid>/maps

It looks like lvm-tools will actually read /proc/self/maps to figure
out where all its mappings are, and effectively do a specialized
"mlockall()" in user space. By not showing the guard page as part of
the mapping (by just adding PAGE_SIZE to the start for grows-up
pages), lvm-tools ends up not being aware of it.

- by also teaching the _real_ mlock() functionality not to try to lock
the guard page.

That would just expand the mapping down to create a new guard page,
so there really is no point in trying to lock it in place.

It would perhaps be nice to show the guard page specially in
/proc/<pid>/maps (or at least mark grow-down segments some way), but
let's not open ourselves up to more breakage by user space from programs
that depend on the exact details of the 'maps' file.

Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
source code to see what was going on with the whole new warning.

[Note, for .27, only the /proc change is done, mlock is not modified
here. - gregkh]

Reported-and-tested-by: François Valenduc <[email protected]>
Reported-by: Henrique de Moraes Holschuh <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/proc/task_mmu.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -205,6 +205,7 @@ static void show_map_vma(struct seq_file
struct file *file = vma->vm_file;
int flags = vma->vm_flags;
unsigned long ino = 0;
+ unsigned long start;
dev_t dev = 0;
int len;

@@ -214,8 +215,13 @@ static void show_map_vma(struct seq_file
ino = inode->i_ino;
}

+ /* We don't show the stack guard page in /proc/maps */
+ start = vma->vm_start;
+ if (vma->vm_flags & VM_GROWSDOWN)
+ start += PAGE_SIZE;
+
seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
- vma->vm_start,
+ start,
vma->vm_end,
flags & VM_READ ? 'r' : '-',
flags & VM_WRITE ? 'w' : '-',

2010-08-18 15:08:11

by Greg KH

Subject: [3/6] x86: don't send SIGBUS for kernel page faults

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

Based on commit 96054569190bdec375fe824e48ca1f4e3b53dd36 upstream,
authored by Linus Torvalds.

This is my backport to the .27 kernel tree, hopefully preserving
the same functionality.

Original commit message:
It's wrong for several reasons, but the most direct one is that the
fault may be for the stack accesses to set up a previous SIGBUS. When
we have a kernel exception, the kernel exception handler does all the
fixups, not some user-level signal handler.

Even apart from the nested SIGBUS issue, it's also wrong to give out
kernel fault addresses in the signal handler info block, or to send a
SIGBUS when a system call already returns EFAULT.

Cc: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/mm/fault.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -589,6 +589,7 @@ void __kprobes do_page_fault(struct pt_r
unsigned long address;
int write, si_code;
int fault;
+ int should_exit_no_context = 0;
#ifdef CONFIG_X86_64
unsigned long flags;
#endif
@@ -876,6 +877,9 @@ no_context:
oops_end(flags, regs, SIGKILL);
#endif

+ if (should_exit_no_context)
+ return;
+
/*
* We ran out of memory, or some other thing happened to us that made
* us unable to handle the page fault gracefully.
@@ -901,8 +905,11 @@ do_sigbus:
up_read(&mm->mmap_sem);

/* Kernel mode? Handle exceptions or die */
- if (!(error_code & PF_USER))
+ if (!(error_code & PF_USER)) {
+ should_exit_no_context = 1;
goto no_context;
+ }
+
#ifdef CONFIG_X86_32
/* User space => ok to do another page fault */
if (is_prefetch(regs, address, error_code))

2010-08-18 15:08:10

by Greg KH

Subject: [5/6] mm: fix page table unmap for stack guard page properly

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

From: Linus Torvalds <[email protected]>

commit 11ac552477e32835cb6970bf0a70c210807f5673 upstream.

We do in fact need to unmap the page table _before_ doing the whole
stack guard page logic, because if it is needed (mainly 32-bit x86 with
PAE and CONFIG_HIGHPTE, but other architectures may use it too) then it
will do a kmap_atomic/kunmap_atomic.

And those kmaps will create an atomic region that we cannot do
allocations in. However, the whole stack expand code will need to do
anon_vma_prepare() and vma_lock_anon_vma() and they cannot do that in an
atomic region.

Now, a better model might actually be to do the anon_vma_prepare() when
_creating_ a VM_GROWSDOWN segment, and not have to worry about any of
this at page fault time. But in the meantime, this is the
straightforward fix for the issue.

See https://bugzilla.kernel.org/show_bug.cgi?id=16588 for details.

Reported-by: Wylda <[email protected]>
Reported-by: Sedat Dilek <[email protected]>
Reported-by: Mike Pagano <[email protected]>
Reported-by: François Valenduc <[email protected]>
Tested-by: Ed Tomlinson <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/memory.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2428,14 +2428,13 @@ static int do_anonymous_page(struct mm_s
spinlock_t *ptl;
pte_t entry;

- if (check_stack_guard_page(vma, address) < 0) {
- pte_unmap(page_table);
+ pte_unmap(page_table);
+
+ /* Check if we need to add a guard page to the stack */
+ if (check_stack_guard_page(vma, address) < 0)
return VM_FAULT_SIGBUS;
- }

/* Allocate our own private page. */
- pte_unmap(page_table);
-
if (unlikely(anon_vma_prepare(vma)))
goto oom;
page = alloc_zeroed_user_highpage_movable(vma, address);

2010-08-18 15:08:46

by Greg KH

Subject: [2/6] mm: fix missing page table unmap for stack guard page failure case

2.6.27-stable review patch. If anyone has any objections, please let us know.

------------------

From: Linus Torvalds <[email protected]>

commit 5528f9132cf65d4d892bcbc5684c61e7822b21e9 upstream.

.. which didn't show up in my tests because it's a no-op on x86-64 and
most other architectures. But we enter the function with the last-level
page table mapped, and should unmap it at exit.

Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2428,8 +2428,10 @@ static int do_anonymous_page(struct mm_s
spinlock_t *ptl;
pte_t entry;

- if (check_stack_guard_page(vma, address) < 0)
+ if (check_stack_guard_page(vma, address) < 0) {
+ pte_unmap(page_table);
return VM_FAULT_SIGBUS;
+ }

/* Allocate our own private page. */
pte_unmap(page_table);

2010-08-19 01:18:55

by Gabor Z. Papp

Subject: Re: [0/6] 2.6.27.52 stable review [try 3]

* Greg KH <[email protected]>:

| Here's the 3rd try for the 2.6.27.52 kernel release. It has 3 more
| patches than the -rc1 release, and two more than -rc2.

What about this?

kbuild: fix make incompatibility
author Sam Ravnborg <[email protected]>
Sat, 13 Dec 2008 22:00:45 +0000 (23:00 +0100)
committer Sam Ravnborg <[email protected]>
Sat, 13 Dec 2008 22:00:45 +0000 (23:00 +0100)
commit 31110ebbec8688c6e9597b641101afc94e1c762a
tree 208aaad7e40cbb86bc125760664911da8cd4eebb tree | snapshot
parent abf681ce5b6f83f0b8883e0f2c12d197a38543dd commit | diff
kbuild: fix make incompatibility

"Paul Smith" <[email protected]> reported that we would fail to build
with a new check that may be enabled in an upcoming version of make.

The error was:

Makefile:442: *** mixed implicit and normal rules. Stop.

2010-08-19 14:39:51

by Greg KH

Subject: Re: [stable] [0/6] 2.6.27.52 stable review [try 3]

On Thu, Aug 19, 2010 at 03:18:51AM +0200, Gabor Z. Papp wrote:
> * Greg KH <[email protected]>:
>
> | Here's the 3rd try for the 2.6.27.52 kernel release. It has 3 more
> | patches than the -rc1 release, and two more than -rc2.
>
> What about this?
>
> kbuild: fix make incompatibility

<snip>

That's for a later kernel release; I'm trying to fix the one issue now.
This patch is for something else, something that was not caused by any
previous release either, so it's not that essential at the moment,
right?

thanks,

greg k-h

2010-08-23 22:47:21

by Greg KH

Subject: Re: [stable] [0/6] 2.6.27.52 stable review [try 3]

On Thu, Aug 19, 2010 at 07:24:58AM -0700, Greg KH wrote:
> On Thu, Aug 19, 2010 at 03:18:51AM +0200, Gabor Z. Papp wrote:
> > * Greg KH <[email protected]>:
> >
> > | Here's the 3rd try for the 2.6.27.52 kernel release. It has 3 more
> > | patches than the -rc1 release, and two more than -rc2.
> >
> > What about this?
> >
> > kbuild: fix make incompatibility

Now queued up.

thanks,

greg k-h