We've had a few bugs recently where device drivers were forgetting
to set vma->vm_flags:VM_IO in their mmap() methods.
This causes kernel deadlocks when applications which have the
relevant device mapped try to dump core (the pagefault handler
deadlocks on mmap_sem).
Failing to set VM_IO also causes kernel oopses when PTRACE_PEEKUSR
tries to read the device mapping - the region has no page structs,
but access_process_vm() acts as if it does.
This patch doesn't fix the PTRACE_PEEKUSR bug - for that we need
this patch as well as the patch Andrea, Manfred and I pieced
together - it's at http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.18pre7aa2/00_get_user_pages-2
I understand that Manfred will be sending you a version of that patch.
The drivers which are forgetting to set VM_IO include a whole bunch
of fbmem drivers, one or more AGP drivers and I don't know what else.
It's simpler to make VM_IO default to `on', and to only clear it
in places where we know that it's safe to dump the memory to a
core file, and where it's safe to PTRACE_PEEKUSR it. That's
what this patch does. We only clear VM_IO in generic_file_mmap()
and in ncp_mmap() and in coda_file_mmap().
Any filesystem which implements its own mmap() method, and which
does not call generic_file_mmap() needs to be changed to clear
VM_IO inside its mmap function. All in-kernel filesystems are
OK, as is XFS. And the only breakage this can cause to out-of-kernel
filesystems is failure to include mappings in core files, and
inability to use PEEKUSR.
With this patch in place we can go through and remove the setting of
VM_IO in all the drivers which _do_ remember to set it. But that's
a cleanup which can await 2.4.19-pre.
--- linux-2.4.18-pre8/mm/mmap.c Mon Nov 5 21:01:12 2001
+++ linux-akpm/mm/mmap.c Sat Feb 2 17:36:55 2002
@@ -534,6 +534,11 @@ munmap_back:
}
vma->vm_file = file;
get_file(file);
+ /*
+ * Subdrivers can clear VM_IO if their mappings are
+ * valid pages inside mem_map[]
+ */
+ vma->vm_flags |= VM_IO;
error = file->f_op->mmap(file, vma);
if (error)
goto unmap_and_free_vma;
--- linux-2.4.18-pre8/mm/filemap.c Tue Feb 5 00:33:05 2002
+++ linux-akpm/mm/filemap.c Sat Feb 2 17:36:55 2002
@@ -2111,6 +2111,7 @@ int generic_file_mmap(struct file * file
return -ENOEXEC;
UPDATE_ATIME(inode);
vma->vm_ops = &generic_file_vm_ops;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
--- linux-2.4.18-pre8/fs/coda/file.c Tue Feb 5 00:33:05 2002
+++ linux-akpm/fs/coda/file.c Wed Feb 6 21:50:16 2002
@@ -97,6 +97,7 @@ coda_file_mmap(struct file *file, struct
if (!cfile->f_op || !cfile->f_op->mmap)
return -ENODEV;
+ vma->vm_flags &= ~VM_IO;
down(&inode->i_sem);
ret = cfile->f_op->mmap(cfile, vma);
UPDATE_ATIME(inode);
--- linux-2.4.18-pre8/fs/ncpfs/mmap.c Mon Sep 10 09:04:53 2001
+++ linux-akpm/fs/ncpfs/mmap.c Wed Feb 6 21:49:28 2002
@@ -119,5 +119,6 @@ int ncp_mmap(struct file *file, struct v
}
vma->vm_ops = &ncp_file_mmap;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
From: Andrew Morton <[email protected]>
Date: Wed, 06 Feb 2002 22:14:28 -0800
It's simpler to make VM_IO default to `on', and to only clear it
in places where we know that it's safe to dump the memory to a
core file, and where it's safe to PTRACE_PEEKUSR it. That's
what this patch does.
I completely agree with this change.
Hi,
On Wed, 6 Feb 2002, Andrew Morton wrote:
> Any filesystem which implements its own mmap() method, and which
> does not call generic_file_mmap() needs to be changed to clear
> VM_IO inside its mmap function. All in-kernel filesystems are
> OK, as is XFS. And the only breakage this can cause to out-of-kernel
> filesystems is failure to include mappings in core files, and
> inability to use PEEKUSR.
You forgot shared memory via mm/shmem.c and ipc/shm.c.
Another possibility is to test whether the driver provides a nopage
function, as i/o areas are usually mapped with io_remap_page_range. The
only exception I found are drivers under drivers/sgi/char, which use some
really dirty tricks in their nopage function.
bye, Roman
On Thu, 7 Feb 2002, Roman Zippel wrote:
> Hi,
>
> On Wed, 6 Feb 2002, Andrew Morton wrote:
>
> > Any filesystem which implements its own mmap() method, and which
> > does not call generic_file_mmap() needs to be changed to clear
> > VM_IO inside its mmap function. All in-kernel filesystems are
> > OK, as is XFS. And the only breakage this can cause to out-of-kernel
> > filesystems is failure to include mappings in core files, and
> > inability to use PEEKUSR.
>
> You forgot shared memory via mm/shmem.c and ipc/shm.c.
Andrew, could you please send me an uptodated patch to fix that ?
> Another possibility is to test whether the driver provides a nopage
> function, as i/o areas are usually mapped with io_remap_page_range.
Eek, thats too kludgy. ;)
Roman Zippel wrote:
>
> Hi,
>
> On Wed, 6 Feb 2002, Andrew Morton wrote:
>
> > Any filesystem which implements its own mmap() method, and which
> > does not call generic_file_mmap() needs to be changed to clear
> > VM_IO inside its mmap function. All in-kernel filesystems are
> > OK, as is XFS. And the only breakage this can cause to out-of-kernel
> > filesystems is failure to include mappings in core files, and
> > inability to use PEEKUSR.
>
> You forgot shared memory via mm/shmem.c and ipc/shm.c.
> Another possibility is to test whether the driver provides a nopage
> function, as i/o areas are usually mapped with io_remap_page_range. The
> only exception I found are drivers under drivers/sgi/char, which use some
> really dirty tricks in their nopage function.
ugh. This just isn't working out.
Sure, inverting the default value of VM_IO is much safer, because
the penalty for getting it wrong is broken coredumps and ptrace,
rather than kernel crashes.
But there are still many areas which need changing, and thinking
about. Stuff like ftape_mmap, sound_mmap, packet_mmap, mmap_mem,
mmap_zero, ...
I'm wondering if we can automagically determine whether the
mapping is safe to peek and coredump within the mmap and mremap
functions themselves? All this would take is the ability to
determine that the entire vma is backed by valid page structs.
Is there a fast and portable way of doing this?
-
On Wed, 6 Feb 2002, Andrew Morton wrote:
> This patch doesn't fix the PTRACE_PEEKUSR bug - for that we need
> this patch as well as the patch Andrea, Manfred and I pieced
> together - it's at http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.18pre7aa2/00_get_user_pages-2
> I understand that Manfred will be sending you a version of that patch.
>
My patch is below.
The only difference between my and Andrea's version is one indentation and
a new comment that warns about possible cache coherency problems.
Tested only on i386 with -pre9, the PTRACE_PEEKUSR oops is fixed (ok, I've
tested pread from /proc/pid/mem, but that's the same code)
--
Manfred
<<<<<<<<<<<<<
--- 2.4/mm/memory.c Tue Dec 25 17:12:07 2001
+++ build-2.4/mm/memory.c Thu Feb 7 21:53:32 2002
@@ -442,6 +442,13 @@
return page;
}
+/*
+ * Please read Documentation/cachetlb.txt before using this function,
+ * accessing foreign memory spaces can cause cache coherency problems.
+ *
+ * Accessing a VM_IO area is even more dangerous, therefore the function
+ * fails if pages is != NULL and a VM_IO area is found.
+ */
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
int len, int write, int force, struct page **pages, struct vm_area_struct **vmas)
{
@@ -453,6 +460,7 @@
vma = find_extend_vma(mm, start);
if ( !vma ||
+ (pages && vma->vm_flags & VM_IO) ||
(!force &&
((write && (!(vma->vm_flags & VM_WRITE))) ||
(!write && (!(vma->vm_flags & VM_READ))) ) )) {
@@ -486,8 +494,9 @@
/* FIXME: call the correct function,
* depending on the type of the found page
*/
- if (pages[i])
- page_cache_get(pages[i]);
+ if (!pages[i])
+ goto bad_page;
+ page_cache_get(pages[i]);
}
if (vmas)
vmas[i] = vma;
@@ -497,7 +506,19 @@
} while(len && start < vma->vm_end);
spin_unlock(&mm->page_table_lock);
} while(len);
+out:
return i;
+
+ /*
+ * We found an invalid page in the VMA. Release all we have
+ * so far and fail.
+ */
+bad_page:
+ spin_unlock(&mm->page_table_lock);
+ while (i--)
+ page_cache_release(pages[i]);
+ i = -EFAULT;
+ goto out;
}
/*
<<<<<<<<<<
On Thu, Feb 07, 2002 at 10:34:10PM +0100, Manfred Spraul wrote:
> On Wed, 6 Feb 2002, Andrew Morton wrote:
> > This patch doesn't fix the PTRACE_PEEKUSR bug - for that we need
> > this patch as well as the patch Andrea, Manfred and I pieced
> > together - it's at http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.18pre7aa2/00_get_user_pages-2
> > I understand that Manfred will be sending you a version of that patch.
> >
> My patch is below.
> The only difference between my and Andrea's version is one indentation and
> a new comment that warns about possible cache coherency problems.
Fine, thanks.
Andrea
Andrew Morton wrote:
>
> Is there a fast and portable way of doing this?
>
Seems that there isn't.
Here's what I have. We allow coredumps and ptrace peeking of:
- file mappings
- sysv ipc mappings
- shmem mappings
- /dev/mem if the mapping is page_struct backed
- /dev/kmem
- /dev/zero mappings
- socket mappings
- ftape's dma buffer
- many audio card DMA buffers
- videodev dma buffer mappings
Is anything missing?
--- linux-2.4.18-pre9/mm/mmap.c Mon Nov 5 21:01:12 2001
+++ linux-akpm/mm/mmap.c Thu Feb 7 19:21:02 2002
@@ -534,6 +534,11 @@ munmap_back:
}
vma->vm_file = file;
get_file(file);
+ /*
+ * Subdrivers can clear VM_IO if their mappings are
+ * valid pages inside mem_map[]
+ */
+ vma->vm_flags |= VM_IO;
error = file->f_op->mmap(file, vma);
if (error)
goto unmap_and_free_vma;
--- linux-2.4.18-pre9/mm/filemap.c Thu Feb 7 13:04:22 2002
+++ linux-akpm/mm/filemap.c Thu Feb 7 19:21:02 2002
@@ -2111,6 +2111,7 @@ int generic_file_mmap(struct file * file
return -ENOEXEC;
UPDATE_ATIME(inode);
vma->vm_ops = &generic_file_vm_ops;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
--- linux-2.4.18-pre9/fs/ncpfs/mmap.c Mon Sep 10 09:04:53 2001
+++ linux-akpm/fs/ncpfs/mmap.c Thu Feb 7 19:21:02 2002
@@ -119,5 +119,6 @@ int ncp_mmap(struct file *file, struct v
}
vma->vm_ops = &ncp_file_mmap;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
--- linux-2.4.18-pre9/ipc/shm.c Fri Dec 21 11:19:23 2001
+++ linux-akpm/ipc/shm.c Thu Feb 7 19:21:02 2002
@@ -159,6 +159,7 @@ static int shm_mmap(struct file * file,
{
UPDATE_ATIME(file->f_dentry->d_inode);
vma->vm_ops = &shm_vm_ops;
+ vma->vm_flags &= ~VM_IO;
shm_inc(file->f_dentry->d_inode->i_ino);
return 0;
}
--- linux-2.4.18-pre9/mm/shmem.c Fri Dec 21 11:19:23 2001
+++ linux-akpm/mm/shmem.c Thu Feb 7 19:21:02 2002
@@ -657,6 +657,7 @@ static int shmem_mmap(struct file * file
return -EACCES;
UPDATE_ATIME(inode);
vma->vm_ops = ops;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
--- linux-2.4.18-pre9/drivers/char/mem.c Fri Dec 21 11:19:13 2001
+++ linux-akpm/drivers/char/mem.c Thu Feb 7 12:42:19 2002
@@ -198,10 +198,10 @@ static int mmap_mem(struct file * file,
vma->vm_flags |= VM_RESERVED;
/*
- * Don't dump addresses that are not real memory to a core file.
+ * Dump addresses that are real memory to a core file.
*/
- if (offset >= __pa(high_memory) || (file->f_flags & O_SYNC))
- vma->vm_flags |= VM_IO;
+ if (offset < __pa(high_memory) && !(file->f_flags & O_SYNC))
+ vma->vm_flags &= ~VM_IO;
if (remap_page_range(vma->vm_start, offset, vma->vm_end-vma->vm_start,
vma->vm_page_prot))
@@ -473,6 +473,7 @@ static int mmap_zero(struct file * file,
return shmem_zero_setup(vma);
if (zeromap_page_range(vma->vm_start, vma->vm_end - vma->vm_start, vma->vm_page_prot))
return -EAGAIN;
+ vma->vm_flags &= ~VM_IO;
return 0;
}
--- linux-2.4.18-pre9/net/socket.c Fri Dec 21 11:19:23 2001
+++ linux-akpm/net/socket.c Thu Feb 7 19:20:54 2002
@@ -705,6 +705,7 @@ static int sock_mmap(struct file * file,
{
struct socket *sock = socki_lookup(file->f_dentry->d_inode);
+ vma->vm_flags &= ~VM_IO;
return sock->ops->mmap(file, sock, vma);
}
--- linux-2.4.18-pre9/drivers/char/ftape/zftape/zftape-init.c Thu Sep 13 15:21:32 2001
+++ linux-akpm/drivers/char/ftape/zftape/zftape-init.c Thu Feb 7 19:29:19 2002
@@ -204,6 +204,7 @@ static int zft_mmap(struct file *filep,
sigfillset(¤t->blocked);
lock_kernel();
if ((result = ftape_mmap(vma)) >= 0) {
+ vma->vm_flags &= ~VM_IO;
#ifndef MSYNC_BUG_WAS_FIXED
static struct vm_operations_struct dummy = { NULL, };
vma->vm_ops = &dummy;
--- linux-2.4.18-pre9/drivers/sound/soundcard.c Sun Sep 30 12:26:08 2001
+++ linux-akpm/drivers/sound/soundcard.c Thu Feb 7 19:42:13 2002
@@ -481,6 +481,7 @@ static int sound_mmap(struct file *file,
return -EAGAIN;
}
+ vma->vm_flags &= ~VM_IO;
dmap->mapping_flags |= DMA_MAP_MAPPED;
if( audio_devs[dev]->d->mmap)
--- linux-2.4.18-pre9/drivers/sound/es1370.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/es1370.c Thu Feb 7 19:43:30 2002
@@ -1377,6 +1377,7 @@ static int es1370_mmap(struct file *file
ret = -EAGAIN;
goto out;
}
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
out:
up(&s->sem);
--- linux-2.4.18-pre9/drivers/sound/maestro.c Sun Sep 30 12:26:08 2001
+++ linux-akpm/drivers/sound/maestro.c Thu Feb 7 19:44:18 2002
@@ -2512,6 +2512,7 @@ static int ess_mmap(struct file *file, s
ret = -EAGAIN;
if (remap_page_range(vma->vm_start, virt_to_phys(db->rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/trident.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/trident.c Thu Feb 7 19:44:47 2002
@@ -2077,6 +2077,7 @@ static int trident_mmap(struct file *fil
if (remap_page_range(vma->vm_start, virt_to_phys(dmabuf->rawbuf),
size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
dmabuf->mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/es1371.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/es1371.c Thu Feb 7 19:45:54 2002
@@ -1566,6 +1566,7 @@ static int es1371_mmap(struct file *file
ret = -EAGAIN;
goto out;
}
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
out:
up(&s->sem);
@@ -2133,6 +2134,7 @@ static int es1371_mmap_dac(struct file *
ret = -EAGAIN;
if (remap_page_range(vma->vm_start, virt_to_phys(s->dma_dac1.rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
s->dma_dac1.mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/cs4281/cs4281m.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/cs4281/cs4281m.c Thu Feb 7 19:46:24 2002
@@ -3239,6 +3239,7 @@ static int cs4281_mmap(struct file *file
if (remap_page_range
(vma->vm_start, virt_to_phys(db->rawbuf), size,
vma->vm_page_prot)) return -EAGAIN;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
CS_DBGOUT(CS_FUNCTION | CS_PARMS | CS_OPEN, 4,
--- linux-2.4.18-pre9/drivers/sound/cmpci.c Thu Nov 22 23:02:58 2001
+++ linux-akpm/drivers/sound/cmpci.c Thu Feb 7 19:46:51 2002
@@ -1754,6 +1754,7 @@ static int cm_mmap(struct file *file, st
ret = -EINVAL;
if (remap_page_range(vma->vm_start, virt_to_phys(db->rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/i810_audio.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/i810_audio.c Thu Feb 7 19:47:19 2002
@@ -1672,6 +1672,7 @@ static int i810_mmap(struct file *file,
if (remap_page_range(vma->vm_start, virt_to_phys(dmabuf->rawbuf),
size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
dmabuf->mapped = 1;
dmabuf->trigger = 0;
ret = 0;
--- linux-2.4.18-pre9/drivers/sound/sonicvibes.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/sonicvibes.c Thu Feb 7 19:47:35 2002
@@ -1551,6 +1551,7 @@ static int sv_mmap(struct file *file, st
ret = -EAGAIN;
if (remap_page_range(vma->vm_start, virt_to_phys(db->rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/esssolo1.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/esssolo1.c Thu Feb 7 19:48:12 2002
@@ -1247,6 +1247,7 @@ static int solo1_mmap(struct file *file,
ret = -EAGAIN;
if (remap_page_range(vma->vm_start, virt_to_phys(db->rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
ret = 0;
out:
--- linux-2.4.18-pre9/drivers/sound/maestro3.c Thu Nov 22 23:02:58 2001
+++ linux-akpm/drivers/sound/maestro3.c Thu Feb 7 19:48:49 2002
@@ -1557,6 +1557,7 @@ static int m3_mmap(struct file *file, st
ret = -EAGAIN;
if (remap_page_range(vma->vm_start, virt_to_phys(db->rawbuf), size, vma->vm_page_prot))
goto out;
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
ret = 0;
--- linux-2.4.18-pre9/drivers/sound/ymfpci.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/ymfpci.c Thu Feb 7 19:49:14 2002
@@ -1507,6 +1507,7 @@ static int ymf_mmap(struct file *file, s
if (remap_page_range(vma->vm_start, virt_to_phys(dmabuf->rawbuf),
size, vma->vm_page_prot))
return -EAGAIN;
+ vma->vm_flags &= ~VM_IO;
dmabuf->mapped = 1;
/* P3 */ printk(KERN_INFO "ymfpci: using memory mapped sound, untested!\n");
--- linux-2.4.18-pre9/drivers/sound/cs46xx.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/sound/cs46xx.c Thu Feb 7 19:50:02 2002
@@ -2468,6 +2468,7 @@ static int cs_mmap(struct file *file, st
ret = -EAGAIN;
goto out;
}
+ vma->vm_flags &= ~VM_IO;
dmabuf->mapped = 1;
CS_DBGOUT(CS_FUNCTION, 2, printk("cs46xx: cs_mmap()-\n") );
--- linux-2.4.18-pre9/drivers/sound/rme96xx.c Mon Nov 5 21:01:12 2001
+++ linux-akpm/drivers/sound/rme96xx.c Thu Feb 7 19:53:01 2002
@@ -1429,7 +1429,7 @@ static int rm96xx_mmap(struct file *file
/* this is the mapping */
-
+ vma->vm_flags &= ~VM_IO;
dma->mmapped = 1;
unlock_kernel();
return 0;
--- linux-2.4.18-pre9/drivers/sound/ite8172.c Mon Nov 5 21:01:12 2001
+++ linux-akpm/drivers/sound/ite8172.c Thu Feb 7 19:53:26 2002
@@ -1111,6 +1111,7 @@ static int it8172_mmap(struct file *file
unlock_kernel();
return -EAGAIN;
}
+ vma->vm_flags &= ~VM_IO;
db->mapped = 1;
unlock_kernel();
return 0;
--- linux-2.4.18-pre9/drivers/usb/audio.c Thu Feb 7 13:04:21 2002
+++ linux-akpm/drivers/usb/audio.c Thu Feb 7 19:57:42 2002
@@ -2341,6 +2341,7 @@ static int usb_audio_mmap(struct file *f
if (vma->vm_pgoff != 0)
goto out;
+ vma->vm_flags &= ~VM_IO;
ret = dmabuf_mmap(db, vma->vm_start, vma->vm_end - vma->vm_start, vma->vm_page_prot);
out:
unlock_kernel();
--- linux-2.4.18-pre9/drivers/media/video/videodev.c Thu Oct 11 09:14:32 2001
+++ linux-akpm/drivers/media/video/videodev.c Thu Feb 7 20:00:51 2002
@@ -208,6 +208,7 @@ int video_mmap(struct file *file, struct
lock_kernel();
ret = vfl->mmap(vfl, (char *)vma->vm_start,
(unsigned long)(vma->vm_end-vma->vm_start));
+ vma->vm_flags &= ~VM_IO;
unlock_kernel();
}
return ret;
From: Andrew Morton <[email protected]>
Date: Thu, 07 Feb 2002 21:05:05 -0800
Is anything missing?
ia64 perfmon mappings
>>>>> On Thu, 07 Feb 2002 21:16:02 -0800 (PST), "David S. Miller" <[email protected]> said:
DaveM> From: Andrew Morton <[email protected]> Date: Thu, 07 Feb
DaveM> 2002 21:05:05 -0800
DaveM> Is anything missing?
DaveM> ia64 perfmon mappings
I forwarded your comment to Stephane, the ia64 perfmon maintainer.
Thanks,
--david