In drivers/char/mem.c do_write_mem can return -EFAULT but write_kmem forgets
this and goes blindly.
Note: /dev/kmem can be written to only by root, so this *cannot* have security
implications.
Also, do_write_mem takes two unused params and is static - so I've removed
those. I actually double-checked this - however please test compilation on
Sparc/m68k, since there are some #ifdef.
CC me on replies as I'm not subscribed. Thanks.
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
BlaisorBlade <[email protected]> wrote:
>
> In drivers/char/mem.c do_write_mem can return -EFAULT but write_kmem forgets
> this and goes blindly.
>
> Note: /dev/kmem can be written to only by root, so this *cannot* have security
> implications.
>
> Also, do_write_mem takes two unused params and is static - so I've removed
> those. I actually double-checked this - however please test compilation on
> Sparc/m68k, since there are some #ifdef.
>
It's a small thing, but:
> --- ./drivers/char/mem.c.fix 2004-02-20 16:27:21.000000000 +0100
> +++ ./drivers/char/mem.c 2004-03-08 12:17:23.000000000 +0100
> @@ -96,13 +96,14 @@
> }
> #endif
>
> -static ssize_t do_write_mem(struct file * file, void *p, unsigned long realp,
> - const char * buf, size_t count, loff_t *ppos)
> +static ssize_t do_write_mem(void *p, const char * buf, size_t count,
> + loff_t *ppos)
> {
> - ssize_t written;
> + ssize_t written = 0;
>
> - written = 0;
> #if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
> + unsigned long realp = *ppos;
> +
A thread which shares this fd can alter the value at *ppos at any time via
lseek() .
> /* we don't have page 0 mapped on sparc and m68k.. */
> if (realp < PAGE_SIZE) {
> unsigned long sz = PAGE_SIZE-realp;
> @@ -165,7 +166,7 @@
>
> if (!valid_phys_addr_range(p, &count))
> return -EFAULT;
> - return do_write_mem(file, __va(p), p, buf, count, ppos);
> + return do_write_mem(__va(p), buf, count, ppos);
> }
>
> static int mmap_mem(struct file * file, struct vm_area_struct * vma)
> @@ -276,7 +277,9 @@
> if (count > (unsigned long) high_memory - p)
> wrote = (unsigned long) high_memory - p;
Here we have applied a range check.
> - wrote = do_write_mem(file, (void*)p, p, buf, wrote, ppos);
> + wrote = do_write_mem((void*)p, buf, wrote, ppos);
But here we go off and use *ppos, which may now have a different value from
that which we just range-checked.
Alle 22:46, marted? 9 marzo 2004, Andrew Morton ha scritto:
> BlaisorBlade <[email protected]> wrote:
> > In drivers/char/mem.c do_write_mem can return -EFAULT but write_kmem
> > forgets this and goes blindly.
First: do not forget this first fix.
> > Also, do_write_mem takes two unused params and is static - so I've
> > removed those.
The "file" parameter is anyway unused - so it can be removed; for the ppos
parameter your reasoning holds. I'm posting the patch for these two things
against 2.6.4 (now I do not remove realp so I avoid the race you describe).
Thanks for answering me - you never lose a patch!
> > I actually double-checked this - however please test
> > compilation on Sparc/m68k, since there are some #ifdef.
>
> It's a small thing, but:
> > --- ./drivers/char/mem.c.fix 2004-02-20 16:27:21.000000000 +0100
> > +++ ./drivers/char/mem.c 2004-03-08 12:17:23.000000000 +0100
> > @@ -96,13 +96,14 @@
> > }
> > #endif
> >
> > -static ssize_t do_write_mem(struct file * file, void *p, unsigned long
> > realp, - const char * buf, size_t count, loff_t *ppos)
> > +static ssize_t do_write_mem(void *p, const char * buf, size_t count,
> > + loff_t *ppos)
> > {
> > - ssize_t written;
> > + ssize_t written = 0;
> >
> > - written = 0;
> > #if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
> > + unsigned long realp = *ppos;
> > +
>
> A thread which shares this fd can alter the value at *ppos at any time via
> lseek() .
Well, you are right (not found any locking). I've understood that lseek() is
known to be racy, and that the bug in my patch is just the range check that
does not work any more; but anyway the marked line in the 2.6.3/2.6.4 kernel
is racy, too, and I'm unsure if that is allowed; i.e. man 2 write claims even
that "POSIX requires that a
read() which can be proved to occur after a write() has returned
returns the new data. Note that not all file systems are POSIX con-
forming." which would imply that write is not racy. Or not?
static ssize_t do_write_mem(struct file * file, void *p, unsigned long realp,
const char * buf, size_t count, loff_t *ppos)
{
ssize_t written;
written = 0;
#if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
//[...]
#endif
if (copy_from_user(p, buf, count))
return -EFAULT;
written += count;
*ppos += written; /*If a second thread lseek()'d between when we
read realp and this line, we have a race*/
return written;
}
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
BlaisorBlade <[email protected]> wrote:
>
> Alle 22:46, marted? 9 marzo 2004, Andrew Morton ha scritto:
> > BlaisorBlade <[email protected]> wrote:
> > > In drivers/char/mem.c do_write_mem can return -EFAULT but write_kmem
> > > forgets this and goes blindly.
>
> First: do not forget this first fix.
Actually, you converted this code from "wrong" to "still wrong". How's this?
- remove unused `file *' arg from do_write_mem()
- Add checking for copy_from_user() failures in do_write_mem()
- Return correct value from kmem writes() when a fault is encountered. A
write()-style syscall's return values are:
0: nothing was written and there was no error (someone tried to write
zero bytes)
>0: the number of bytes copied, whether or not there was an error.
Userspace detects errors by noting that the write() return value is less
than was requested.
<0: there was an error and no bytes were copied
---
25-akpm/drivers/char/mem.c | 35 ++++++++++++++++++++++++++---------
1 files changed, 26 insertions(+), 9 deletions(-)
diff -puN drivers/char/mem.c~do_write_mem-retval-check drivers/char/mem.c
--- 25/drivers/char/mem.c~do_write_mem-retval-check 2004-03-13 16:01:00.941435904 -0800
+++ 25-akpm/drivers/char/mem.c 2004-03-13 16:08:57.522984496 -0800
@@ -105,10 +105,11 @@ static inline int valid_phys_addr_range(
}
#endif
-static ssize_t do_write_mem(struct file * file, void *p, unsigned long realp,
+static ssize_t do_write_mem(void *p, unsigned long realp,
const char * buf, size_t count, loff_t *ppos)
{
ssize_t written;
+ unsigned long copied;
written = 0;
#if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
@@ -123,8 +124,14 @@ static ssize_t do_write_mem(struct file
written+=sz;
}
#endif
- if (copy_from_user(p, buf, count))
+ copied = copy_from_user(p, buf, count);
+ if (copied) {
+ ssize_t ret = written + (count - copied);
+
+ if (ret)
+ return ret;
return -EFAULT;
+ }
written += count;
*ppos += written;
return written;
@@ -174,7 +181,7 @@ static ssize_t write_mem(struct file * f
if (!valid_phys_addr_range(p, &count))
return -EFAULT;
- return do_write_mem(file, __va(p), p, buf, count, ppos);
+ return do_write_mem(__va(p), p, buf, count, ppos);
}
static int mmap_mem(struct file *file, struct vm_area_struct *vma)
@@ -274,15 +281,19 @@ static ssize_t write_kmem(struct file *
unsigned long p = *ppos;
ssize_t wrote = 0;
ssize_t virtr = 0;
+ ssize_t written;
char * kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
if (p < (unsigned long) high_memory) {
+
wrote = count;
if (count > (unsigned long) high_memory - p)
wrote = (unsigned long) high_memory - p;
- wrote = do_write_mem(file, (void*)p, p, buf, wrote, ppos);
-
+ written = do_write_mem((void*)p, p, buf, wrote, ppos);
+ if (written != wrote)
+ return written;
+ wrote = written;
p += wrote;
buf += wrote;
count -= wrote;
@@ -291,15 +302,21 @@ static ssize_t write_kmem(struct file *
if (count > 0) {
kbuf = (char *)__get_free_page(GFP_KERNEL);
if (!kbuf)
- return -ENOMEM;
+ return wrote ? wrote : -ENOMEM;
while (count > 0) {
int len = count;
if (len > PAGE_SIZE)
len = PAGE_SIZE;
- if (len && copy_from_user(kbuf, buf, len)) {
- free_page((unsigned long)kbuf);
- return -EFAULT;
+ if (len) {
+ written = copy_from_user(kbuf, buf, len);
+ if (written != len) {
+ ssize_t ret;
+
+ free_page((unsigned long)kbuf);
+ ret = wrote + virtr + (len - written)
+ return ret ? ret : -EFAULT;
+ }
}
len = vwrite(kbuf, (char *)p, len);
count -= len;
_
Alle 01:15, domenica 14 marzo 2004, Andrew Morton ha scritto:
> BlaisorBlade <[email protected]> wrote:
> > Alle 22:46, marted? 9 marzo 2004, Andrew Morton ha scritto:
> > > BlaisorBlade <[email protected]> wrote:
> > > > In drivers/char/mem.c do_write_mem can return -EFAULT but write_kmem
> > > > forgets this and goes blindly.
>
> Actually, you converted this code from "wrong" to "still wrong". How's
> this?
[...]
Sorry for not answering soon, but there is a *bug*:
+ written = copy_from_user(kbuf, buf, len);
+ if (written != len) {
On success copy_from_user returns 0, so you should check
"if (written != 0)".
The attached patch addresses also this and renames the var to "unwritten".
> - Return correct value from kmem writes() when a fault is encountered. A
> write()-style syscall's return values are:
>
> 0: nothing was written and there was no error (someone tried to write
> zero bytes)
>
> >0: the number of bytes copied, whether or not there was an error.
>
> Userspace detects errors by noting that the write() return value is less
> than was requested.
- I've checked the code of 2.4 do_generic_file_write() which complies with
your description, but man 2 write is not clear on this (this is possibly why
who wrote that code missed this).
- Also, it seems read() should have the same return values (I checked 2.4 and
2.6 generic_file_read() ); so even the read_kmem() and read_mem must be
fixed. I'm attaching the fix which also creates a do_read_mem() like
do_write_mem().
- More important, this check is at least strange:
#if defined(__sparc__) || (defined(__mc68000__) && defined(CONFIG_MMU))
/* we don't have page 0 mapped on sparc and m68k.. */
if (p < PAGE_SIZE) {
This is checked both when p is a physical address and when it is a virtual
one, for both write() and read(). The non existing page 0 is the physical
page 0 or the virtual page 0? I've not changed it, however.
- Also, generic_file_write() (both 2.4 and 2.6) calls down(&inode->i_sem);,
while write_mem() and write_kmem() do not; shouldn't they be fixed?
- Finally, do_write_mem() does *ppos += written; since ppos could have been
changed, this is wrong, or not? It has no effect for write_kmem() that sets a
correct value after, but seems a bug for write_mem().
By the way: I've sent your first write() fix, just adapted a bit for 2.4, to
Marcelo Tosatti. I'll post this second one for 2.4 if it's Ok for you.
CC me as I'm not subscribed, please.
Thanks a lot for your attention!
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729