2012-05-31 04:14:29

by Junxiao Bi

Subject: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

Ocfs2 uses kiocb->private as a flag word of unsigned long size.
Commit a11f7e6 ("ocfs2: serialize unaligned aio") added an unaligned
io flag to it in order to serialize unaligned aio. As ->private is
not initialized by init_sync_kiocb() in do_sync_write(), this
unaligned io flag may be unexpectedly set for an aligned dio. This
causes OCFS2_I(inode)->ip_unaligned_aio to be decremented to -1 in
ocfs2_dio_end_io(), so the next unaligned dio hangs forever at
ocfs2_aiodio_wait() in ocfs2_file_write_iter().

We can't initialize this flag in ocfs2_file_write_iter() since
it may be invoked several times by do_sync_write(). So we initialize
it in init_sync_kiocb(); this is also useful for other similar uses
of it in the future.

Signed-off-by: Junxiao Bi <[email protected]>
---
include/linux/aio.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/aio.h b/include/linux/aio.h
index 2314ad8..b1a520e 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -140,6 +140,7 @@ struct kiocb {
(x)->ki_dtor = NULL; \
(x)->ki_obj.tsk = tsk; \
(x)->ki_user_data = 0; \
+ (x)->private = NULL; \
} while (0)

#define AIO_RING_MAGIC 0xa10a10a1
--
1.7.9.5
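
The failure described in this patch can be modelled in plain userspace C. The sketch below is an illustration only: struct model_kiocb, struct model_inode, MODEL_UNALIGNED_BIT and the model_* functions are hypothetical stand-ins rather than kernel code, and the stale contents of ->private are simulated with a fixed value.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define MODEL_UNALIGNED_BIT 0x1UL      /* assumed flag bit */

struct model_kiocb {
    unsigned long priv;                /* models kiocb->private used as flag bits */
};

struct model_inode {
    int unaligned_aio;                 /* models ip_unaligned_aio */
};

/* Models init_sync_kiocb(): only the patched variant zeroes the flag word. */
static void model_init(struct model_kiocb *k, int patched)
{
    if (patched)
        k->priv = 0;
    /* unpatched: priv keeps whatever the storage held before */
}

/* Models the completion path: decrement only when the flag is set. */
static void model_end_io(struct model_kiocb *k, struct model_inode *ino)
{
    if (k->priv & MODEL_UNALIGNED_BIT) {
        k->priv &= ~MODEL_UNALIGNED_BIT;
        ino->unaligned_aio--;
    }
}

/* One aligned write on a kiocb whose storage held stale bits.
 * Returns the counter value after completion. */
int model_aligned_write(int patched)
{
    struct model_kiocb k = { ~0UL };   /* fixed stand-in for stack garbage */
    struct model_inode ino = { 0 };

    model_init(&k, patched);
    /* Aligned write: the submit path neither sets the flag nor increments. */
    model_end_io(&k, &ino);
    return ino.unaligned_aio;
}

void model_self_check(void)
{
    assert(model_aligned_write(0) == -1);  /* counter underflows */
    assert(model_aligned_write(1) == 0);   /* counter stays balanced */
}
```

In this model a waiter that blocks until the counter reaches zero would never be woken once the counter has gone to -1, which corresponds to the hang reported at ocfs2_aiodio_wait().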


2012-05-31 04:14:34

by Junxiao Bi

Subject: [PATCH 2/2] ocfs2: clear unaligned io flag when dio fails

The unaligned io flag is set in the kiocb when an unaligned
dio is issued. It should be cleared even when the dio fails,
or it may affect subsequent io that uses the same kiocb.

Signed-off-by: Junxiao Bi <[email protected]>
---
fs/ocfs2/file.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 061591a..98513c8 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2422,8 +2422,10 @@ out_dio:
unaligned_dio = 0;
}

- if (unaligned_dio)
+ if (unaligned_dio) {
+ ocfs2_iocb_clear_unaligned_aio(iocb);
atomic_dec(&OCFS2_I(inode)->ip_unaligned_aio);
+ }

out:
if (rw_level != -1)
--
1.7.9.5
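
The effect of this change can be sketched in userspace C under some assumptions. The model below assumes, as discussed later in the thread, that the same kiocb is passed to the write path a second time. All model_* names and MODEL_UNALIGNED_BIT are hypothetical and do not match the kernel source.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define MODEL_UNALIGNED_BIT 0x1UL      /* assumed flag bit */

struct model_kiocb { unsigned long priv; };     /* models kiocb->private */
struct model_inode { int unaligned_aio; };      /* models ip_unaligned_aio */

/* Models the submit path. Returns 1 if the io was queued and will
 * complete later through model_end_io(), 0 if it ended in this call. */
static int model_submit(struct model_kiocb *k, struct model_inode *ino,
                        int unaligned, int queued, int clear_on_fail)
{
    if (unaligned) {
        ino->unaligned_aio++;
        k->priv |= MODEL_UNALIGNED_BIT;
    }
    if (!queued && unaligned) {
        if (clear_on_fail)
            k->priv &= ~MODEL_UNALIGNED_BIT;   /* the line the patch adds */
        ino->unaligned_aio--;
    }
    return queued;
}

/* Models the completion path: decrement only when the flag is set. */
static void model_end_io(struct model_kiocb *k, struct model_inode *ino)
{
    if (k->priv & MODEL_UNALIGNED_BIT) {
        k->priv &= ~MODEL_UNALIGNED_BIT;
        ino->unaligned_aio--;
    }
}

/* An unaligned dio that is not queued, followed by an aligned dio on
 * the same kiocb. Returns the counter value at the end. */
int model_reuse_after_failure(int clear_on_fail)
{
    struct model_kiocb k = { 0 };
    struct model_inode ino = { 0 };

    model_submit(&k, &ino, 1, 0, clear_on_fail);
    if (model_submit(&k, &ino, 0, 1, clear_on_fail))
        model_end_io(&k, &ino);
    return ino.unaligned_aio;
}

void model_self_check(void)
{
    assert(model_reuse_after_failure(0) == -1);  /* stale flag, extra decrement */
    assert(model_reuse_after_failure(1) == 0);   /* flag cleared, balanced */
}
```

Without the clear, the stale flag makes the completion of the second, aligned io decrement a counter that was never incremented for it.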

2012-05-31 04:36:34

by Joe Jin

Subject: Re: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

Cc David.

On 05/31/12 12:12, Junxiao Bi wrote:
> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
> io flag is involved in it to serialize the unaligned aio. As
> *private is not initialized in init_sync_kiocb() of do_sync_write(),
> this unaligned io flag may be unexpectly set in an aligned dio.
> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>
> We can't initialized this flag in ocfs2_file_write_iter() since
> it may be invoked several times by do_sync_write(). So we initialize
> it in init_sync_kiocb(), it's also useful for other similiar use of
> it in the future.
>
> Signed-off-by: Junxiao Bi <[email protected]>
> ---
> include/linux/aio.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/aio.h b/include/linux/aio.h
> index 2314ad8..b1a520e 100644
> --- a/include/linux/aio.h
> +++ b/include/linux/aio.h
> @@ -140,6 +140,7 @@ struct kiocb {
> (x)->ki_dtor = NULL; \
> (x)->ki_obj.tsk = tsk; \
> (x)->ki_user_data = 0; \
> + (x)->private = NULL; \
> } while (0)
>
> #define AIO_RING_MAGIC 0xa10a10a1


--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing

2012-05-31 04:37:06

by Joe Jin

Subject: Re: [PATCH 2/2] ocfs2: clear unaligned io flag when dio fails

Cc David.

On 05/31/12 12:12, Junxiao Bi wrote:
> The unaligned io flag is set in the kiocb when an unaligned
> dio is issued, it should be cleared even when the dio fails,
> or it may affect the following io which are using the same
> kiocb.
>
> Signed-off-by: Junxiao Bi <[email protected]>
> ---
> fs/ocfs2/file.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 061591a..98513c8 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2422,8 +2422,10 @@ out_dio:
> unaligned_dio = 0;
> }
>
> - if (unaligned_dio)
> + if (unaligned_dio) {
> + ocfs2_iocb_clear_unaligned_aio(iocb);
> atomic_dec(&OCFS2_I(inode)->ip_unaligned_aio);
> + }
>
> out:
> if (rw_level != -1)



2012-05-31 14:08:31

by Jeff Moyer

Subject: Re: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

Junxiao Bi <[email protected]> writes:

> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
> io flag is involved in it to serialize the unaligned aio. As
> *private is not initialized in init_sync_kiocb() of do_sync_write(),
> this unaligned io flag may be unexpectly set in an aligned dio.
> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().

> We can't initialized this flag in ocfs2_file_write_iter() since
> it may be invoked several times by do_sync_write(). So we initialize
> it in init_sync_kiocb(), it's also useful for other similiar use of
> it in the future.

I don't see any ocfs2_file_write_iter in the upstream kernel.
ocfs2_file_aio_write most certainly could set ->private to 0, it
will only be called once for a given kiocb.

That point aside, I have no issues with setting private to NULL in
init_sync_kiocb. If you fix up the comment to reflect reality
w.r.t. the upstream kernel source, I'll ack the patch.

Cheers,
Jeff

2012-05-31 14:09:20

by Jeff Moyer

Subject: Re: [PATCH 2/2] ocfs2: clear unaligned io flag when dio fails

Junxiao Bi <[email protected]> writes:

> The unaligned io flag is set in the kiocb when an unaligned
> dio is issued, it should be cleared even when the dio fails,
> or it may affect the following io which are using the same
> kiocb.

What code is re-using kiocbs, much less re-using them without
re-initializing them?

-Jeff

2012-06-01 01:42:21

by Junxiao Bi

Subject: Re: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

On 05/31/2012 10:08 PM, Jeff Moyer wrote:
> Junxiao Bi <[email protected]> writes:
>
>> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
>> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
>> io flag is involved in it to serialize the unaligned aio. As
>> *private is not initialized in init_sync_kiocb() of do_sync_write(),
>> this unaligned io flag may be unexpectly set in an aligned dio.
>> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
>> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
>> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>> We can't initialized this flag in ocfs2_file_write_iter() since
>> it may be invoked several times by do_sync_write(). So we initialize
>> it in init_sync_kiocb(), it's also useful for other similiar use of
>> it in the future.
> I don't see any ocfs2_file_write_iter in the upstream kernel.
> ocfs2_file_aio_write most certainly could set ->private to 0, it
> will only be called once for a given kiocb.
From sys_io_submit->..->io_submit_one->aio_run_iocb->aio_rw_vect_retry,
it seems that aio_write can be called twice. Consider the following
scenario.
1. A file is opened with the direct io flag. In aio_rw_vect_retry,
aio_write is called for the first time. If the direct io cannot be
completed, it falls back to buffered io; see line 2329 in aio_write.
2. If that buffered io is a partial write, control returns to
aio_rw_vect_retry, which issues a second aio_write.
>
> That point aside, I have no issues with setting private to NULL in
> init_sync_kiocb. If you fix up the comment to reflect reality
> w.r.t. the upstream kernel source, I'll ack the patch.
OK, I will fix the comment.
>
> Cheers,
> Jeff

2012-06-01 01:45:01

by Junxiao Bi

Subject: Re: [PATCH 2/2] ocfs2: clear unaligned io flag when dio fails

On 05/31/2012 10:09 PM, Jeff Moyer wrote:
> Junxiao Bi <[email protected]> writes:
>
>> The unaligned io flag is set in the kiocb when an unaligned
>> dio is issued, it should be cleared even when the dio fails,
>> or it may affect the following io which are using the same
>> kiocb.
> What code is re-using kiocbs, much less re-using them without
> re-initializing them?
See my comment in the other thread. aio_write seems to be called twice
with the same kiocb.
>
> -Jeff

2012-06-01 20:55:48

by Jeff Moyer

Subject: Re: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

Junxiao Bi <[email protected]> writes:

> On 05/31/2012 10:08 PM, Jeff Moyer wrote:
>> Junxiao Bi <[email protected]> writes:
>>
>>> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
>>> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
>>> io flag is involved in it to serialize the unaligned aio. As
>>> *private is not initialized in init_sync_kiocb() of do_sync_write(),
>>> this unaligned io flag may be unexpectly set in an aligned dio.
>>> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
>>> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
>>> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>>> We can't initialized this flag in ocfs2_file_write_iter() since
>>> it may be invoked several times by do_sync_write(). So we initialize
>>> it in init_sync_kiocb(), it's also useful for other similiar use of
>>> it in the future.
>> I don't see any ocfs2_file_write_iter in the upstream kernel.
>> ocfs2_file_aio_write most certainly could set ->private to 0, it
>> will only be called once for a given kiocb.
> From sys_io_submit->..->io_submit_one->aio_run_iocb->aio_rw_vect_retry,
> it seems that aio_write could be called two times. See the following
> scenario.
> 1. There is a file opened with direct io flag, in aio_rw_vect_retry,
> aio_write is called first time. If the direct io can
> not be completed, it will fall back into buffer io, see line 2329 in
> aio_write.

Huh? What's line 2329 in aio_write?

> 2. If the very buffer io is a partial write, then it will return back
> to aio_rw_vect_retry and issue the second aio_write.

For the generic case, the fallback to buffered I/O happens in
__generic_file_aio_write, without bouncing all the way back up the call
stack to aio_rw_vect_retry. I see in ocfs2, things are a bit different:

retry->aio_rw_vect_retry->ocfs2_file_aio_write->generic_file_direct_write
->ocfs2_direct_IO->__blockdev_direct_IO

That last function can return 0 if not all of the data was written via
direct I/O. At that point, you return all of the way up the chain to
aio_rw_vect_retry, which checks the return value (ret). If it was 0,
then it goes ahead and retries the complete I/O. How does that make any
progress?!

Cheers,
Jeff

2012-06-02 02:59:43

by Junxiao Bi

Subject: Re: [PATCH 1/2] aio: make kiocb->private NULL in init_sync_kiocb()

On 06/02/2012 04:55 AM, Jeff Moyer wrote:
> Junxiao Bi <[email protected]> writes:
>
>> On 05/31/2012 10:08 PM, Jeff Moyer wrote:
>>> Junxiao Bi <[email protected]> writes:
>>>
>>>> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
>>>> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
>>>> io flag is involved in it to serialize the unaligned aio. As
>>>> *private is not initialized in init_sync_kiocb() of do_sync_write(),
>>>> this unaligned io flag may be unexpectly set in an aligned dio.
>>>> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
>>>> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
>>>> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>>>> We can't initialized this flag in ocfs2_file_write_iter() since
>>>> it may be invoked several times by do_sync_write(). So we initialize
>>>> it in init_sync_kiocb(), it's also useful for other similiar use of
>>>> it in the future.
>>> I don't see any ocfs2_file_write_iter in the upstream kernel.
>>> ocfs2_file_aio_write most certainly could set ->private to 0, it
>>> will only be called once for a given kiocb.
>> From sys_io_submit->..->io_submit_one->aio_run_iocb->aio_rw_vect_retry,
>> it seems that aio_write could be called two times. See the following
>> scenario.
>> 1. There is a file opened with direct io flag, in aio_rw_vect_retry,
>> aio_write is called first time. If the direct io can
>> not be completed, it will fall back into buffer io, see line 2329 in
>> aio_write.
> Huh? What's line 2329 in aio_write?
See the following code.

2312         can_do_direct = direct_io;
2313         ret = ocfs2_prepare_inode_for_write(file, ppos,
2314                                             iocb->ki_left, appending,
2315                                             &can_do_direct, &has_refcount);
2316         if (ret < 0) {
2317                 mlog_errno(ret);
2318                 goto out;
2319         }
2320
2321         if (direct_io && !is_sync_kiocb(iocb))
2322                 unaligned_dio = ocfs2_is_io_unaligned(inode, iocb->ki_left,
2323                                                       *ppos);
2324
2325         /*
2326          * We can't complete the direct I/O as requested, fall back to
2327          * buffered I/O.
2328          */
2329         if (direct_io && !can_do_direct) {
2330                 ocfs2_rw_unlock(inode, rw_level);
2331
2332                 have_alloc_sem = 0;
2333                 rw_level = -1;
2334
2335                 direct_io = 0;
2336                 goto relock;
2337         }

The code above shows how direct io falls back to buffered io. At
line 2313, ocfs2_prepare_inode_for_write() decides whether the direct
io can be executed. If not, "can_do_direct" is set to false, and
"direct_io" is then set to 0 at line 2335. As the code below shows,
this means that generic_file_buffered_write() is called instead of
generic_file_direct_write(). If generic_file_buffered_write()
completes only a partial write, its return value "written" becomes the
return value of aio_write; see line 2439. Control then returns to
aio_rw_vect_retry(), where the condition (ret > 0 && iocb->ki_left > 0 &&
opcode == IOCB_CMD_PWRITEV) is true, so aio_write is called a second
time. As the unaligned I/O flag may have been set in the kiocb during
the first call of aio_write, it may affect the second call if that
call's direct IO is aligned.

2372         if (direct_io) {
2373                 written = generic_file_direct_write(iocb, iov, &nr_segs, *ppos,
2374                                                     ppos, count, ocount);
2375                 if (written < 0) {
2376                         ret = written;
2377                         goto out_dio;
2378                 }
2379         } else {
2380                 current->backing_dev_info = file->f_mapping->backing_dev_info;
2381                 written = generic_file_buffered_write(iocb, iov, nr_segs, *ppos,
2382                                                       ppos, count, 0);
2383                 current->backing_dev_info = NULL;
2384         }

2438         if (written)
2439                 ret = written;
2440         return ret;
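
The retry condition quoted above can be sketched as a small userspace loop. This is a simplified model, not the kernel's aio_rw_vect_retry(): struct model_iocb, model_write() and model_retry_calls() are hypothetical names, the opcode check is omitted, and the write op simply transfers at most a fixed number of bytes per call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the kernel definition. */
struct model_iocb {
    size_t left;        /* models iocb->ki_left */
    int calls;          /* how many times the write op ran on this iocb */
};

/* Assumed write op: transfers at most `chunk` bytes per call. */
static long model_write(struct model_iocb *io, size_t chunk)
{
    size_t n = io->left < chunk ? io->left : chunk;

    io->calls++;
    return (long)n;
}

/* Models the retry loop: call the write op again on the same iocb for
 * as long as it makes progress and bytes remain. Returns the number of
 * times the write op was invoked. */
int model_retry_calls(size_t total, size_t chunk)
{
    struct model_iocb io = { total, 0 };
    long ret;

    do {
        ret = model_write(&io, chunk);
        if (ret > 0)
            io.left -= (size_t)ret;
    } while (ret > 0 && io.left > 0);

    return io.calls;
}

void model_self_check(void)
{
    assert(model_retry_calls(100, 60) == 2);   /* partial write, then the rest */
    assert(model_retry_calls(100, 100) == 1);  /* complete in one call */
    assert(model_retry_calls(100, 0) == 1);    /* no progress, loop exits */
}
```

The first case is the one that matters here: a partial first write leads to a second call with the same iocb, so any state left in it by the first call is still visible to the second.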
>
>> 2. If the very buffer io is a partial write, then it will return back
>> to aio_rw_vect_retry and issue the second aio_write.
> For the generic case, the fallback to buffered I/O happens in
> __generic_file_aio_write, without bouncing all the way back up the call
> stack to aio_rw_vect_retry. I see in ocfs2, things are a bit different:
>
> retry->aio_rw_vect_retry->ocfs2_file_aio_write->generic_file_direct_write
> ->ocfs2_direct_IO->__blockdev_direct_IO
>
> That last function can return 0 if not all of the data was written via
> direct I/O. At that point, you return all of the way up the chain to
> aio_rw_vect_retry, which checks the return value (ret). If it was 0,
> then it goes ahead and retries the complete I/O. How does that make any
> progress?!
>
> Cheers,
> Jeff