2006-12-04 16:26:49

by Chen, Kenneth W

[permalink] [raw]
Subject: [patch] remove redundant iov segment check

The access_ok() and negative length check on each iov segment in function
generic_file_aio_read/write are redundant. They are all already checked
before calling down to these low level generic functions.

Vector I/O (both sync and async) are checked via rw_copy_check_uvector().
For single segment synchronous I/O, the checks are done in various places:
first access_ok() is checked in vfs_read/write, the negative length is
checked in rw_verify_area. For single segment AIO, access_ok() is done
in aio_setup_iocb, and negative length is checked in io_submit_one.

So it's not possible to call down to generic_file_aio_read/write with invalid
iov segment. Patch proposed to delete these redundant code. Also moved
negative length check for single segment AIO into aio_setup_single_vector
to somewhat mirror aio_setup_vectored_rw that they check illegal negative
length. It fits better there too because iocb->aio_nbytes is now double
duty as segment count for vectored AIO, and checking negative length in
the entry point of all AIO isn't very obvious.


Signed-off-by: Ken Chen <[email protected]>


diff -Nurp linux-2.6.19/fs/aio.c linux-2.6.19.ken/fs/aio.c
--- linux-2.6.19/fs/aio.c 2006-11-29 13:57:37.000000000 -0800
+++ linux-2.6.19.ken/fs/aio.c 2006-12-03 17:16:52.000000000 -0800
@@ -1416,6 +1416,8 @@ static ssize_t aio_setup_single_vector(s
kiocb->ki_nr_segs = 1;
kiocb->ki_cur_seg = 0;
kiocb->ki_nbytes = kiocb->ki_left;
+ if (unlikely((ssize_t) kiocb->ki_nbytes < 0))
+ return -EINVAL;
return 0;
}

@@ -1560,8 +1562,7 @@ int fastcall io_submit_one(struct kioctx
/* prevent overflows */
if (unlikely(
(iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
- (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
- ((ssize_t)iocb->aio_nbytes < 0)
+ (iocb->aio_nbytes != (size_t)iocb->aio_nbytes)
)) {
pr_debug("EINVAL: io_submit: overflow check\n");
return -EINVAL;
diff -Nurp linux-2.6.19/mm/filemap.c linux-2.6.19.ken/mm/filemap.c
--- linux-2.6.19/mm/filemap.c 2006-11-29 13:57:37.000000000 -0800
+++ linux-2.6.19.ken/mm/filemap.c 2006-12-03 17:16:10.000000000 -0800
@@ -1143,29 +1143,9 @@ generic_file_aio_read(struct kiocb *iocb
struct file *filp = iocb->ki_filp;
ssize_t retval;
unsigned long seg;
- size_t count;
+ size_t count = iocb->ki_left;
loff_t *ppos = &iocb->ki_pos;

- count = 0;
- for (seg = 0; seg < nr_segs; seg++) {
- const struct iovec *iv = &iov[seg];
-
- /*
- * If any segment has a negative length, or the cumulative
- * length ever wraps negative then return -EINVAL.
- */
- count += iv->iov_len;
- if (unlikely((ssize_t)(count|iv->iov_len) < 0))
- return -EINVAL;
- if (access_ok(VERIFY_WRITE, iv->iov_base, iv->iov_len))
- continue;
- if (seg == 0)
- return -EFAULT;
- nr_segs = seg;
- count -= iv->iov_len; /* This segment is no good */
- break;
- }
-
/* coalesce the iovecs and go direct-to-BIO for O_DIRECT */
if (filp->f_flags & O_DIRECT) {
loff_t size;
@@ -2228,32 +2208,11 @@ __generic_file_aio_write_nolock(struct k
size_t ocount; /* original count */
size_t count; /* after file limit checks */
struct inode *inode = mapping->host;
- unsigned long seg;
loff_t pos;
ssize_t written;
ssize_t err;

- ocount = 0;
- for (seg = 0; seg < nr_segs; seg++) {
- const struct iovec *iv = &iov[seg];
-
- /*
- * If any segment has a negative length, or the cumulative
- * length ever wraps negative then return -EINVAL.
- */
- ocount += iv->iov_len;
- if (unlikely((ssize_t)(ocount|iv->iov_len) < 0))
- return -EINVAL;
- if (access_ok(VERIFY_READ, iv->iov_base, iv->iov_len))
- continue;
- if (seg == 0)
- return -EFAULT;
- nr_segs = seg;
- ocount -= iv->iov_len; /* This segment is no good */
- break;
- }
-
- count = ocount;
+ count = ocount = iocb->ki_left;
pos = *ppos;

vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);


2006-12-04 19:18:42

by Zach Brown

[permalink] [raw]
Subject: Re: [patch] remove redundant iov segment check


On Dec 4, 2006, at 8:26 AM, Chen, Kenneth W wrote:

> The access_ok() and negative length check on each iov segment in
> function
> generic_file_aio_read/write are redundant. They are all already
> checked
> before calling down to these low level generic functions.

...

> So it's not possible to call down to generic_file_aio_read/write
> with invalid
> iov segment.

Well, generic_file_aio_{read,write}() are exported to modules, so
anything's *possible*. :)

This change makes me nervous because it relies on our ability to
audit all code paths to ensure that it's correct. It'd be nice if
the code enforced the rules.

I wonder if it wouldn't be better to make this change as part of a
larger change that moves towards an explicit iovec container struct
rather than bare 'struct iov *' and 'nr_segs' arguments. The struct
could have a flag that expressed whether the elements had been
checked. A helper could be called by the upper and lower code paths
which does the checking, marks the flag, and avoids checking again if
the flag is set.

We've wanted an explicit struct in the past to avoid the multiple
walks of iovecs that various paths do for their own reasons. The
iovec walk that is checking for length wrapping could also be
building a bitmap of length alignment that O_DIRECT could be using to
test 512B alignment without having to walk the iovec again.

I started on such a patch set at some point in the murky past. It
got kind of gross when it got to paths that want to modify the iovec
in place. It's all doable, it'll just take some work. Christoph and
I have discussed it in the past, I wonder if he has any thing like
this going.

- z

2006-12-04 19:36:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] remove redundant iov segment check

On Mon, 4 Dec 2006 08:26:36 -0800
"Chen, Kenneth W" <[email protected]> wrote:

> So it's not possible to call down to generic_file_aio_read/write with invalid
> iov segment. Patch proposed to delete these redundant code.

erp, please don't touch that code.

The writev deadlock fixes which went into 2.6.17 trashed writev()
performance and Nick and I are slowly trying to get it back, while fixing
the has-been-there-forever pagefault-in-write() deadlock.

This is all proving very hard to do, and we don't need sweeping code
cleanups happening under our feet ;)

I'll bring those patches back in next -mm, but not very confidently.

2006-12-04 19:47:07

by Chen, Kenneth W

[permalink] [raw]
Subject: RE: [patch] remove redundant iov segment check

Zach Brown wrote on Monday, December 04, 2006 11:19 AM
> On Dec 4, 2006, at 8:26 AM, Chen, Kenneth W wrote:
>
> > The access_ok() and negative length check on each iov segment in
> > function
> > generic_file_aio_read/write are redundant. They are all already
> > checked
> > before calling down to these low level generic functions.
>
> ...
>
> > So it's not possible to call down to generic_file_aio_read/write
> > with invalid
> > iov segment.
>
> Well, generic_file_aio_{read,write}() are exported to modules, so
> anything's *possible*. :)
>
> This change makes me nervous because it relies on our ability to
> audit all code paths to ensure that it's correct. It'd be nice if
> the code enforced the rules.

Maybe we should create another internal generic_file_aio_read/write
for in-core function? fs/read_write.c and fs/aio.c are not module-able
and the check is already there. For external module, we can do the
check and then calls down to the internal one.

I hate to see iov is being walked multiple times .... And this is
part of my effort to bring back O_DIRECT performance compares to a
3-years old vendor kernel based on 2.4 kernel.

- Ken

2006-12-04 19:50:32

by Zach Brown

[permalink] [raw]
Subject: Re: [patch] remove redundant iov segment check


> Maybe we should create another internal generic_file_aio_read/write
> for in-core function? fs/read_write.c and fs/aio.c are not module-able
> and the check is already there. For external module, we can do the
> check and then calls down to the internal one.

Maybe. I'd rather see fewer moving parts here, not more.

> I hate to see iov is being walked multiple times ....

Indeed, hence the desire to walk it once and pass down a summary of the
results in an explicit struct. The patch in this thread removes one
redundancy, but there are many. I think I counted 6 iovec walks in some
path? I can't remember which. I'd rather tackle them all in one go.

- z

2006-12-04 19:50:57

by Chen, Kenneth W

[permalink] [raw]
Subject: RE: [patch] remove redundant iov segment check

Andrew Morton wrote on Monday, December 04, 2006 11:36 AM
> On Mon, 4 Dec 2006 08:26:36 -0800
> "Chen, Kenneth W" <[email protected]> wrote:
>
> > So it's not possible to call down to generic_file_aio_read/write with invalid
> > iov segment. Patch proposed to delete these redundant code.
>
> erp, please don't touch that code.
>
> The writev deadlock fixes which went into 2.6.17 trashed writev()
> performance and Nick and I are slowly trying to get it back, while fixing
> the has-been-there-forever pagefault-in-write() deadlock.
>
> This is all proving very hard to do, and we don't need sweeping code
> cleanups happening under our feet ;)
>
> I'll bring those patches back in next -mm, but not very confidently.


OK, I will wait until that bug settles down and resubmit. I really don't
see the value of walking the iov multiple times doing the same thing.

I will also dig up the archive to see if I can help in any way.

- Ken

2007-01-02 11:01:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch] remove redundant iov segment check

> I wonder if it wouldn't be better to make this change as part of a
> larger change that moves towards an explicit iovec container struct
> rather than bare 'struct iov *' and 'nr_segs' arguments. The struct
> could have a flag that expressed whether the elements had been
> checked. A helper could be called by the upper and lower code paths
> which does the checking, marks the flag, and avoids checking again if
> the flag is set.
>
> We've wanted an explicit struct in the past to avoid the multiple
> walks of iovecs that various paths do for their own reasons. The
> iovec walk that is checking for length wrapping could also be
> building a bitmap of length alignment that O_DIRECT could be using to
> test 512B alignment without having to walk the iovec again.

I suspect it should be rather trivial to get this started. As a first
step we simply add a

struct iodesc {
int nr_segs;
struct iovec ioc[]
};

And then we can add fields where nessecary. First a full_length one
to avoid the loops to calculate thw whole I/O size, then flags for
the alignment check, etc..

2007-01-02 18:22:06

by Zach Brown

[permalink] [raw]
Subject: Re: [patch] remove redundant iov segment check

>> I wonder if it wouldn't be better to make this change as part of a
>> larger change that moves towards an explicit iovec container struct
>> rather than bare 'struct iov *' and 'nr_segs' arguments.

> I suspect it should be rather trivial to get this started. As a first
> step we simply add a
>
> struct iodesc {
> int nr_segs;
> struct iovec ioc[]
> };

Agreed, does anyone plan to try this in the near future? I can
always throw it at the bottom of the todo list :/.

- z

2007-01-02 18:25:58

by Chen, Kenneth W

[permalink] [raw]
Subject: RE: [patch] remove redundant iov segment check

Zach Brown wrote on Tuesday, January 02, 2007 10:22 AM
> >> I wonder if it wouldn't be better to make this change as part of a
> >> larger change that moves towards an explicit iovec container struct
> >> rather than bare 'struct iov *' and 'nr_segs' arguments.
>
> > I suspect it should be rather trivial to get this started. As a first
> > step we simply add a
> >
> > struct iodesc {
> > int nr_segs;
> > struct iovec ioc[]
> > };
>
> Agreed, does anyone plan to try this in the near future?

Yes, I will take a stab at it this week.