2010-01-15 01:49:40

by Fengguang Wu

Subject: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

From: KAMEZAWA Hiroyuki <[email protected]>

Now, rw_verify_area() checks whether f_pos is negative and, if it is,
returns -EINVAL.

But some special files, such as /dev/(k)mem and /proc/<pid>/mem, have
negative offsets, so we cannot access them at those offsets via
read/write.

So introduce FMODE_NEG_OFFSET to allow negative file offsets.

Changelog: v5->v6
- use FMODE_NEG_OFFSET (suggested by Al)
- rebased onto 2.6.33-rc1

Changelog: v4->v5
- clean up patches for /dev/mem.
- rebased onto 2.6.32-rc1

Changelog: v3->v4
- make changes in mem.c aligned.
- change __negative_fpos_check() to return int.
- fixed bug in "pos" check.
- added comments.

Changelog: v2->v3
- fixed bug in rw_verify_area (it cannot be compiled)

CC: Al Viro <[email protected]>
CC: Heiko Carstens <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
drivers/char/mem.c | 4 ++++
fs/proc/base.c | 2 ++
fs/read_write.c | 21 +++++++++++++++++++--
include/linux/fs.h | 3 +++
4 files changed, 28 insertions(+), 2 deletions(-)

--- linux.orig/fs/read_write.c 2010-01-14 21:28:00.000000000 +0800
+++ linux/fs/read_write.c 2010-01-14 21:30:41.000000000 +0800
@@ -205,6 +205,20 @@ bad:
}
#endif

+static int
+__negative_fpos_check(struct file *file, loff_t pos, size_t count)
+{
+ /*
+ * pos or pos+count is negative here, check overflow.
+ * too big "count" will be caught in rw_verify_area().
+ */
+ if ((pos < 0) && (pos + count < pos))
+ return -EOVERFLOW;
+ if (file->f_mode & FMODE_NEG_OFFSET)
+ return 0;
+ return -EINVAL;
+}
+
/*
* rw_verify_area doesn't like huge counts. We limit
* them to something that fits in "int" so that others
@@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
if (unlikely((ssize_t) count < 0))
return retval;
pos = *ppos;
- if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
- return retval;
+ if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+ retval = __negative_fpos_check(file, pos, count);
+ if (retval)
+ return retval;
+ }

if (unlikely(inode->i_flock && mandatory_lock(inode))) {
retval = locks_mandatory_area(
--- linux.orig/include/linux/fs.h 2010-01-14 21:28:00.000000000 +0800
+++ linux/include/linux/fs.h 2010-01-14 21:32:24.000000000 +0800
@@ -93,6 +93,9 @@ struct inodes_stat_t {
/* Expect random access pattern */
#define FMODE_RANDOM ((__force fmode_t)0x1000)

+/* File is huge (eg. /dev/kmem): treat loff_t as unsigned */
+#define FMODE_NEG_OFFSET ((__force fmode_t)0x2000)
+
/*
* The below are the various read and write types that we support. Some of
* them include behavioral modifiers that send information down to the
--- linux.orig/drivers/char/mem.c 2010-01-14 21:28:00.000000000 +0800
+++ linux/drivers/char/mem.c 2010-01-14 21:33:20.000000000 +0800
@@ -861,6 +861,10 @@ static int memory_open(struct inode *ino
if (dev->dev_info)
filp->f_mapping->backing_dev_info = dev->dev_info;

+ /* Is /dev/mem or /dev/kmem ? */
+ if (dev->dev_info == &directly_mappable_cdev_bdi)
+ filp->f_mode |= FMODE_NEG_OFFSET;
+
if (dev->fops->open)
return dev->fops->open(inode, filp);

--- linux.orig/fs/proc/base.c 2010-01-14 21:28:00.000000000 +0800
+++ linux/fs/proc/base.c 2010-01-14 21:37:08.000000000 +0800
@@ -861,6 +861,8 @@ static const struct file_operations proc
static int mem_open(struct inode* inode, struct file* file)
{
file->private_data = (void*)((long)current->self_exec_id);
+ /* OK to pass negative loff_t, we can catch out-of-range */
+ file->f_mode |= FMODE_NEG_OFFSET;
return 0;
}
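
The check added to rw_verify_area() can be modelled in userspace. What
follows is a minimal sketch, not the kernel code: it assumes 64-bit offsets,
treats the offset as unsigned when testing for wrap-around (as the
FMODE_NEG_OFFSET comment suggests), and uses the hypothetical names
MODEL_NEG_OFFSET and model_verify_area().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FMODE_NEG_OFFSET; only used by this model. */
#define MODEL_NEG_OFFSET 0x2000u

/*
 * Userspace model of the position check, assuming 64-bit offsets.
 * Returns 0 if the access may proceed, or a negative errno value.
 */
static int model_verify_area(unsigned int f_mode, int64_t pos, size_t count)
{
    uint64_t upos = (uint64_t)pos;            /* offset viewed as unsigned */
    uint64_t end = upos + (uint64_t)count;    /* wraps modulo 2^64 */

    /* Same intent as "(ssize_t) count < 0": reject huge counts. */
    if (count > SIZE_MAX / 2)
        return -EINVAL;

    /* pos or pos+count is negative when viewed as a signed offset. */
    if (pos < 0 || end > (uint64_t)INT64_MAX) {
        /* The range wraps past the top of the unsigned offset space. */
        if (pos < 0 && end < upos)
            return -EOVERFLOW;
        /* Negative offsets are only valid for files that opted in. */
        if (!(f_mode & MODEL_NEG_OFFSET))
            return -EINVAL;
    }
    return 0;
}
```

With the flag clear, model_verify_area(0, -4096, 8) yields -EINVAL as before
the patch. With MODEL_NEG_OFFSET set the same call yields 0, while a range
that would wrap, such as pos = -4 with count = 8, yields -EOVERFLOW.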



2010-01-18 00:18:50

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Sat, 16 Jan 2010 21:54:39 +0900
OGAWA Hirofumi <[email protected]> wrote:

> Wu Fengguang <[email protected]> writes:
>
> > +static int
> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> > +{
> > + /*
> > + * pos or pos+count is negative here, check overflow.
> > + * too big "count" will be caught in rw_verify_area().
> > + */
> > + if ((pos < 0) && (pos + count < pos))
> > + return -EOVERFLOW;
> > + if (file->f_mode & FMODE_NEG_OFFSET)
> > + return 0;
> > + return -EINVAL;
> > +}
> > +
> > /*
> > * rw_verify_area doesn't like huge counts. We limit
> > * them to something that fits in "int" so that others
> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
> > if (unlikely((ssize_t) count < 0))
> > return retval;
> > pos = *ppos;
> > - if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > - return retval;
> > + if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > + retval = __negative_fpos_check(file, pos, count);
> > + if (retval)
> > + return retval;
> > + }
> >
> > if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> > retval = locks_mandatory_area(
>
> Um... How do lseek() work? It sounds like to violate error code range.

This is for read/write. As far as I know,
- generic_file_llseek
- default_llseek
- no_llseek

don't call this function.

Thanks,
-Kame

2010-01-18 01:18:05

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

KAMEZAWA Hiroyuki <[email protected]> writes:

>> > +static int
>> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
>> > +{
>> > + /*
>> > + * pos or pos+count is negative here, check overflow.
>> > + * too big "count" will be caught in rw_verify_area().
>> > + */
>> > + if ((pos < 0) && (pos + count < pos))
>> > + return -EOVERFLOW;
>> > + if (file->f_mode & FMODE_NEG_OFFSET)
>> > + return 0;
>> > + return -EINVAL;
>> > +}
>> > +
>> > /*
>> > * rw_verify_area doesn't like huge counts. We limit
>> > * them to something that fits in "int" so that others
>> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
>> > if (unlikely((ssize_t) count < 0))
>> > return retval;
>> > pos = *ppos;
>> > - if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>> > - return retval;
>> > + if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
>> > + retval = __negative_fpos_check(file, pos, count);
>> > + if (retval)
>> > + return retval;
>> > + }
>> >
>> > if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>> > retval = locks_mandatory_area(
>>
>> Um... How do lseek() work? It sounds like to violate error code range.
>
> This is for read-write. As far as I know,
> - generic_file_llseek,
> - default_llseek
> - no_llseek
>
> doesn't call this function.

It seems to allow setting a negative value in ->f_pos, right? So lseek()
returns (uses) it?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2010-01-18 01:29:14

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 10:17:48 +0900
OGAWA Hirofumi <[email protected]> wrote:

> KAMEZAWA Hiroyuki <[email protected]> writes:
>
> >> > +static int
> >> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> >> > +{
> >> > + /*
> >> > + * pos or pos+count is negative here, check overflow.
> >> > + * too big "count" will be caught in rw_verify_area().
> >> > + */
> >> > + if ((pos < 0) && (pos + count < pos))
> >> > + return -EOVERFLOW;
> >> > + if (file->f_mode & FMODE_NEG_OFFSET)
> >> > + return 0;
> >> > + return -EINVAL;
> >> > +}
> >> > +
> >> > /*
> >> > * rw_verify_area doesn't like huge counts. We limit
> >> > * them to something that fits in "int" so that others
> >> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
> >> > if (unlikely((ssize_t) count < 0))
> >> > return retval;
> >> > pos = *ppos;
> >> > - if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> >> > - return retval;
> >> > + if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> >> > + retval = __negative_fpos_check(file, pos, count);
> >> > + if (retval)
> >> > + return retval;
> >> > + }
> >> >
> >> > if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >> > retval = locks_mandatory_area(
> >>
> >> Um... How do lseek() work? It sounds like to violate error code range.
> >
> > This is for read-write. As far as I know,
> > - generic_file_llseek,
> > - default_llseek
> > - no_llseek
> >
> > doesn't call this function.
>
> It seems to allow to set negative value to ->f_pos, right?
Yes. Some files (e.g. /dev/kmem) require that.

> So, lseek() returns (uses) it?

lseek can return a negative value, as far as I know.

Thanks,
-Kame

2010-01-18 01:32:55

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

OGAWA Hirofumi <[email protected]> writes:

>>> Um... How do lseek() work? It sounds like to violate error code range.
>>
>> This is for read-write. As far as I know,
>> - generic_file_llseek,
>> - default_llseek
>> - no_llseek
>>
>> doesn't call this function.
>
> It seems to allow to set negative value to ->f_pos, right? So, lseek()
> returns (uses) it?

BTW, another concern with a negative "pos" value is code like the following:

pos >> shift_bits

A negative pos will break it. So I think this should be checked, if it has
not been already.
--
OGAWA Hirofumi <[email protected]>
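
To illustrate the shift concern: in C, right-shifting a negative signed value
is implementation-defined, so an index derived from a negative pos depends on
whether the code treats the offset as signed or unsigned. A minimal sketch
with hypothetical helper names, assuming 64-bit offsets and shift_bits < 64:

```c
#include <stdint.h>

/*
 * Block index with the offset treated as unsigned: well defined for
 * every input, and a "negative" pos maps to a very large index.
 */
static uint64_t sketch_index_unsigned(int64_t pos, unsigned int shift_bits)
{
    return (uint64_t)pos >> shift_bits;
}

/*
 * Block index with the offset kept signed. For pos < 0 the result is
 * implementation-defined (commonly an arithmetic shift that yields a
 * negative index), which is the case the concern is about.
 */
static int64_t sketch_index_signed(int64_t pos, unsigned int shift_bits)
{
    return pos >> shift_bits;
}
```

For non-negative positions the two helpers agree; they only diverge once pos
has its top bit set.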

2010-01-18 01:38:32

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

KAMEZAWA Hiroyuki <[email protected]> writes:

>> So, lseek() returns (uses) it?
>
> lseek can return negative value, as far as I know.

Umm..., how do you tell the difference between -EOVERFLOW and fpos == -75?

Thanks.
--
OGAWA Hirofumi <[email protected]>

2010-01-18 01:52:40

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 10:32:49 +0900
OGAWA Hirofumi <[email protected]> wrote:

> OGAWA Hirofumi <[email protected]> writes:
>
> >>> Um... How do lseek() work? It sounds like to violate error code range.
> >>
> >> This is for read-write. As far as I know,
> >> - generic_file_llseek,
> >> - default_llseek
> >> - no_llseek
> >>
> >> doesn't call this function.
> >
> > It seems to allow to set negative value to ->f_pos, right? So, lseek()
> > returns (uses) it?
>
> BTW, another concern by negative "pos" value is, the following like code
>
> pos >> shift_bits
>
> it will break the above. So, I think it should be checked if not yet.

Where do we check?

FMODE_NEG_OFFSET is only used by /dev/mem and /proc/<pid>/mem, and I don't
think there will be any additional users. So I myself don't have such concerns...


Thanks,
-Kame

2010-01-18 01:59:18

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

KAMEZAWA Hiroyuki <[email protected]> writes:

>> BTW, another concern by negative "pos" value is, the following like code
>>
>> pos >> shift_bits
>>
>> it will break the above. So, I think it should be checked if not yet.
>
> Where do we check ?
>
> FMODE_NEG_OFFSET is just used by /dev/mem and /proc/<pid>/mem. And I don't
> think there are no additonal users. So, I myself don't have has such concerns...

Sorry, it's just a concern of mine. I haven't checked the real code paths
(e.g. vfs) related to /dev/mem; if there is no user of such code, it's OK.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2010-01-18 02:03:43

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 10:38:27 +0900
OGAWA Hirofumi <[email protected]> wrote:

> KAMEZAWA Hiroyuki <[email protected]> writes:
>
> >> So, lseek() returns (uses) it?
> >
> > lseek can return negative value, as far as I know.
>
> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75?
>

Ah, sorry. I misread.

/dev/mem uses its own lseek function, which allows a negative f_pos
value. Other, ordinary file systems don't allow a negative f_pos.

It's OK not to return -EOVERFLOW for /dev/mem because there is no end of file.

Thanks,
-Kame

2010-01-18 02:13:11

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

KAMEZAWA Hiroyuki <[email protected]> writes:

> On Mon, 18 Jan 2010 10:38:27 +0900
> OGAWA Hirofumi <[email protected]> wrote:
>
>> KAMEZAWA Hiroyuki <[email protected]> writes:
>>
>> >> So, lseek() returns (uses) it?
>> >
>> > lseek can return negative value, as far as I know.
>>
>> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75?
>>
>
> Ah, sorry. I read wrong.
>
> For /dev/mem, it uses its own lseek function which allows negative f_pos
> value. Other usual file system doesn't allow negative f_pos.
>
> It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.

No, no. I think it has a problem.

E.g. /dev/mem returns -75 as fpos, so lseek(2) returns -75 to
userland. Then userland (e.g. glibc) converts it into an
error. I.e. finally, errno == 75 (EOVERFLOW), and lseek(3) returns -1, right?

Thanks.
--
OGAWA Hirofumi <[email protected]>
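
The conversion described above can be sketched as follows. This is a model of
the usual Linux convention, in which a raw system call return value in the
range [-4095, -1] is taken to mean -errno; real libc wrappers are
architecture-specific, and the names SKETCH_MAX_ERRNO and
sketch_syscall_return() are hypothetical.

```c
#include <errno.h>

/* Assumed upper bound of the error-code range, as in the Linux convention. */
#define SKETCH_MAX_ERRNO 4095LL

/*
 * Model of how a libc wrapper interprets a raw kernel return value:
 * anything in [-SKETCH_MAX_ERRNO, -1] becomes errno plus a -1 result,
 * everything else is passed through unchanged.
 */
static long long sketch_syscall_return(long long raw)
{
    if (raw < 0 && raw >= -SKETCH_MAX_ERRNO) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

Under this model a file position of -75 cannot be told apart from -EOVERFLOW,
whereas a large "negative" position such as 0xffffffffa0094000 lies outside
the error range and passes through untouched.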

2010-01-18 02:33:49

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 11:13:04 +0900
OGAWA Hirofumi <[email protected]> wrote:

> KAMEZAWA Hiroyuki <[email protected]> writes:
>
> > On Mon, 18 Jan 2010 10:38:27 +0900
> > OGAWA Hirofumi <[email protected]> wrote:
> >
> >> KAMEZAWA Hiroyuki <[email protected]> writes:
> >>
> >> >> So, lseek() returns (uses) it?
> >> >
> >> > lseek can return negative value, as far as I know.
> >>
> >> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75?
> >>
> >
> > Ah, sorry. I read wrong.
> >
> > For /dev/mem, it uses its own lseek function which allows negative f_pos
> > value. Other usual file system doesn't allow negative f_pos.
> >
> > It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.
>
> No, no. I think it has the problem.
>
> E.g. /dev/mem returns -75 as fpos, so, lseek(2) returns -75 to
> userland. Then the userland (e.g. glibc) convert it as
> error. I.e. finally, errno == -75, and lseek(3) returns -1, right?
>
Maybe possible.

Hmm. Then /dev/mem's llseek needs some fix so that it does not return a pos
that falls in the error code range (-PAGESIZE < pos < 0).
Wu-san, could you add an additional bug fix for lseek()'s f_pos handling in
/dev/mem?

Thanks,
-Kame

2010-01-18 03:16:29

by Fengguang Wu

Subject: RE: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

Hi,

[replying from webmail, sorry for top-posting]

memory_lseek() calls force_successful_syscall_return() to force success on negative values.
However, that is a no-op on x86.

My experiment shows that lseek() does return a negative pos. However, the
manual says that "a value of (off_t) -1 is returned" on error. So it's OK
as long as your program is written as "err == -1" instead of "err < 0".

code:
        err = lseek64(fd, addr, SEEK_SET);
        if (err == -1)
                perror("seek " FILENAME);

output:
# kmem-rw 0xffffffffa0094000
addr=0xffffffffa0094000 val=0x441f0fe5894855

strace:
open("/dev/kmem", O_RDWR) = 3
lseek(3, 18446744072099545088, SEEK_SET) = 18446744072099545088
read(3, "UH\211\345\17\37D\0"..., 8) = 8

Thanks,
Fengguang
________________________________________
From: KAMEZAWA Hiroyuki [[email protected]]
Sent: Monday, January 18, 2010 10:30 AM
To: OGAWA Hirofumi
Cc: Wu, Fengguang; Andrew Morton; Al Viro; Heiko Carstens; Christoph Hellwig; LKML; Eric Paris; Nick Piggin; Andi Kleen; David Howells; Jonathan Corbet; [email protected]
Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 11:13:04 +0900
OGAWA Hirofumi <[email protected]> wrote:

> KAMEZAWA Hiroyuki <[email protected]> writes:
>
> > On Mon, 18 Jan 2010 10:38:27 +0900
> > OGAWA Hirofumi <[email protected]> wrote:
> >
> >> KAMEZAWA Hiroyuki <[email protected]> writes:
> >>
> >> >> So, lseek() returns (uses) it?
> >> >
> >> > lseek can return negative value, as far as I know.
> >>
> >> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75?
> >>
> >
> > Ah, sorry. I read wrong.
> >
> > For /dev/mem, it uses its own lseek function which allows negative f_pos
> > value. Other usual file system doesn't allow negative f_pos.
> >
> > It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.
>
> No, no. I think it has the problem.
>
> E.g. /dev/mem returns -75 as fpos, so, lseek(2) returns -75 to
> userland. Then the userland (e.g. glibc) convert it as
> error. I.e. finally, errno == -75, and lseek(3) returns -1, right?
>
Maybe possible.

Hmm. Then, /dev/mem's llseek need some fix not to return pos < -PAGESIZE.
Wu-san, could you add additional bug fix to lseek()'s f_pos handling in
/dev/mem ?

Thanks,
-Kame

2010-01-18 03:25:42

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 11:15:38 +0800
"Wu, Fengguang" <[email protected]> wrote:

> Hi,
>
> [replying from webmail, sorry for top-posting]
>
> memory_lseek() calls force_successful_syscall_return() to force success on negative vals.
> However that is a no-op for x86.
>
> My experiment shows that lseek() does return negative pos. However,
> manual says that "a value of (off_t) -1 is returned" on error. So it's OK
> as long as your program is written as "err == -1" instead of "err < 0".
>
On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
it by
errno = EOVERFLOW
ret = -1

The problem discussed here is the kernel's return value. So the kernel's
lseek should check that, I think.

Anyway, this lseek problem is not related to this patch itself and has
existed for a very long time. Fixing it later in a separate patch is not
too bad, I think.
(I'm sorry, I am not ready to write a patch myself...)

Thanks,
-Kame
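
One possible shape for such a kernel-side check, given here as a userspace
sketch and not necessarily what any actual patch does: refuse a seek whose
resulting position would be indistinguishable from an error code. It assumes
error codes occupy [-4095, -1]; the names SEEK_MODEL_MAX_ERRNO and
model_memory_lseek_set() are hypothetical.

```c
#include <errno.h>

/* Assumed size of the error-code range; positions in it are ambiguous. */
#define SEEK_MODEL_MAX_ERRNO 4095LL

/*
 * Model of a SEEK_SET handler for a file that allows negative offsets.
 * Returns the new position, or -EOVERFLOW if that position would look
 * like an error code to the caller.
 */
static long long model_memory_lseek_set(long long offset)
{
    if (offset < 0 && offset >= -SEEK_MODEL_MAX_ERRNO)
        return -EOVERFLOW;
    return offset;
}
```

With a check of this kind, a request to seek to -75 fails cleanly with
-EOVERFLOW, while positions below the error range, such as -4096, are still
returned as valid negative offsets.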

2010-01-18 05:29:11

by Fengguang Wu

Subject: RE: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos


> On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
> it by
> errno = EOVERFLOW
> ret = -1

Ah got it. How about the attached patch?

Thanks,
Fengguang


Attachments:
mem-seek-fix (1.37 kB)

2010-01-19 00:40:53

by Kamezawa Hiroyuki

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 13:26:44 +0800
"Wu, Fengguang" <[email protected]> wrote:

>
> > On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
> > it by
> > errno = EOVERFLOW
> > ret = -1
>
> Ah got it. How about the attached patch?
>

Seems good to me. Thank you very much.

Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>

2010-01-16 12:54:52

by OGAWA Hirofumi

Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

Wu Fengguang <[email protected]> writes:

> +static int
> +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> +{
> + /*
> + * pos or pos+count is negative here, check overflow.
> + * too big "count" will be caught in rw_verify_area().
> + */
> + if ((pos < 0) && (pos + count < pos))
> + return -EOVERFLOW;
> + if (file->f_mode & FMODE_NEG_OFFSET)
> + return 0;
> + return -EINVAL;
> +}
> +
> /*
> * rw_verify_area doesn't like huge counts. We limit
> * them to something that fits in "int" so that others
> @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
> if (unlikely((ssize_t) count < 0))
> return retval;
> pos = *ppos;
> - if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> - return retval;
> + if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> + retval = __negative_fpos_check(file, pos, count);
> + if (retval)
> + return retval;
> + }
>
> if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> retval = locks_mandatory_area(

Um... How does lseek() work? It sounds like this violates the error code range.
--
OGAWA Hirofumi <[email protected]>