2017-12-08 10:11:00

by Changwei Ge

Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On 2017/12/8 14:21, alex chen wrote:
>
>
> On 2017/12/8 13:36, Ben Hutchings wrote:
>> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>>
>>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>>
>>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>>> Hi Ben,
>>>>>>>
>>>>>>> Thanks for your reply.
>>>>>>>
>>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>>> 4.4-stable review patch. If anyone has any objections,
>>>>>>>>> please let me know.
>>>>>>>>>
>>>>>>>>> ------------------
>>>>>>>>>
>>>>>>>>> From: alex chen <[email protected]>
>>>>>>>>>
>>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>>
>>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>>> happen:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>>
>>>>>>>> /**
>>>>>>>> * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>>> * @inode: inode to wait for
>>>>>>>> *
>>>>>>>> * Waits for all pending direct I/O requests to finish so that we can
>>>>>>>> * proceed with a truncate or equivalent operation.
>>>>>>>> *
>>>>>>>> * Must be called under a lock that serializes taking new references
>>>>>>>> * to i_dio_count, usually by inode->i_mutex.
>>>>>>>> */
>>>>>>>>
>>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>>> afterward?
>>>>>>>>
>>>>>>>
>>>>>>> In kernel 4.6, we first take inode_lock() in do_truncate() to
>>>>>>> prevent another bio from being issued from this node.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Yes but there seems to be a race condition - after the call to
>>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>>> request can be added.
>>>>
>>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
>>>> However:
>>>>
>>>>> In the truncating file situation, the lock order is as follows:
>>>>> do_truncate()
>>>>>   inode_lock()
>>>>>   notify_change()
>>>>>     ocfs2_setattr()
>>>>>       inode_dio_wait()
>>>>> --here it is under the protection of inode_lock(), so another dio request
>>>>> from another process will not be added.
>>>>
>>>> only DIO reads seem to take the inode lock.
>>>>
>>>
>>> I do not clearly understand what you mean.
>>> The inode_lock() will be called in ocfs2_file_write_iter().
>>
>> Oh I see. I didn't realise that was part of the call chain.
>>
>>> You mean only DIO writes seem to take the inode_lock()?
>>
>> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
>> reads - but ocfs2 doesn't set the flag for that. Maybe that's OK?
>
> I think you are right, we should set the DIO_LOCKING flag in ocfs2_direct_IO().

So this is actually a separate problem, one that was NOT introduced by
Alex's patch, right?
Perhaps ocfs2 should rely on the VFS to flush the page cache to get rid
of stale data on disk.

Thanks,
Changwei

>
> Thanks,
> Alex
>>
>>> BTW, in this patch, I only moved inode_dio_wait() ahead of ocfs2_rw_lock()
>>> and did not change the order of inode_lock() and inode_dio_wait().
>>
>> Right. I think you've convinced me to stop worrying about this.
>>
>> Ben.
>>
>
>


2017-12-12 01:34:38

by alex chen

Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()



On 2017/12/8 18:04, Changwei Ge wrote:
> On 2017/12/8 14:21, alex chen wrote:
>>
>>
>> On 2017/12/8 13:36, Ben Hutchings wrote:
>>> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>>>
>>>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>>>
>>>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>>>> Hi Ben,
>>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>>>> 4.4-stable review patch. If anyone has any objections,
>>>>>>>>>> please let me know.
>>>>>>>>>>
>>>>>>>>>> ------------------
>>>>>>>>>>
>>>>>>>>>> From: alex chen <[email protected]>
>>>>>>>>>>
>>>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>>>
>>>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>>>> happen:
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>> * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>>>> * @inode: inode to wait for
>>>>>>>>> *
>>>>>>>>> * Waits for all pending direct I/O requests to finish so that we can
>>>>>>>>> * proceed with a truncate or equivalent operation.
>>>>>>>>> *
>>>>>>>>> * Must be called under a lock that serializes taking new references
>>>>>>>>> * to i_dio_count, usually by inode->i_mutex.
>>>>>>>>> */
>>>>>>>>>
>>>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>>>> afterward?
>>>>>>>>>
>>>>>>>>
>>>>>>>> In kernel 4.6, we first take inode_lock() in do_truncate() to
>>>>>>>> prevent another bio from being issued from this node.
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> Yes but there seems to be a race condition - after the call to
>>>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>>>> request can be added.
>>>>>
>>>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
>>>>> However:
>>>>>
>>>>>> In the truncating file situation, the lock order is as follows:
>>>>>> do_truncate()
>>>>>>   inode_lock()
>>>>>>   notify_change()
>>>>>>     ocfs2_setattr()
>>>>>>       inode_dio_wait()
>>>>>> --here it is under the protection of inode_lock(), so another dio request
>>>>>> from another process will not be added.
>>>>>
>>>>> only DIO reads seem to take the inode lock.
>>>>>
>>>>
>>>> I do not clearly understand what you mean.
>>>> The inode_lock() will be called in ocfs2_file_write_iter().
>>>
>>> Oh I see. I didn't realise that was part of the call chain.
>>>
>>>> You mean only DIO writes seem to take the inode_lock()?
>>>
>>> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
>>> reads - but ocfs2 doesn't set the flag for that. Maybe that's OK?
>>
>> I think you are right, we should set the DIO_LOCKING flag in ocfs2_direct_IO().
>
> So this is actually a separate problem, one that was NOT introduced by
> Alex's patch, right?
> Perhaps ocfs2 should rely on the VFS to flush the page cache to get rid
> of stale data on disk.

Yes, I think we should set the DIO_LOCKING flag to synchronize direct I/O
reads/writes with truncate.
I am testing the following patch in my local environment.

Signed-off-by: Alex Chen <[email protected]>
---
 fs/ocfs2/aops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 7e1659d..d10632f 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2491,7 +2491,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
 				    iter, get_block,
-				    ocfs2_dio_end_io, NULL, 0);
+				    ocfs2_dio_end_io, NULL, DIO_LOCKING);
 }
 
 const struct address_space_operations ocfs2_aops = {
--
1.9.5.msysgit.1
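
For anyone following the thread, the intent of the flag can be illustrated
with a rough userspace sketch. This is not kernel code: struct model_inode,
MODEL_DIO_LOCKING and the model_* helpers are made-up names, a C11
atomic_flag stands in for i_mutex, and the only behaviour modelled is the
assumption that with DIO_LOCKING a direct I/O read takes the inode lock
before taking a new reference on i_dio_count.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two inode fields discussed in the thread. */
struct model_inode {
	atomic_flag i_mutex;     /* models inode->i_mutex / inode_lock() */
	atomic_int i_dio_count;  /* models inode->i_dio_count */
};

#define MODEL_DIO_LOCKING 0x01  /* models DIO_LOCKING; the value is arbitrary */

static void model_inode_init(struct model_inode *inode)
{
	atomic_flag_clear(&inode->i_mutex);
	atomic_init(&inode->i_dio_count, 0);
}

/* Non-blocking lock attempt, so the model can run in a single thread. */
static bool model_inode_trylock(struct model_inode *inode)
{
	return !atomic_flag_test_and_set(&inode->i_mutex);
}

static void model_inode_unlock(struct model_inode *inode)
{
	atomic_flag_clear(&inode->i_mutex);
}

/*
 * Start a direct I/O read. Returns true if a new reference on i_dio_count
 * was taken, false if a real reader would have had to block on the lock.
 */
static bool model_dio_read_begin(struct model_inode *inode, int flags)
{
	if (flags & MODEL_DIO_LOCKING) {
		/* Assumed DIO_LOCKING behaviour: lock first, then count. */
		if (!model_inode_trylock(inode))
			return false;
		atomic_fetch_add(&inode->i_dio_count, 1);
		model_inode_unlock(inode);
		return true;
	}
	/* Without the flag nothing serializes against a lock holder. */
	atomic_fetch_add(&inode->i_dio_count, 1);
	return true;
}

/* Drop the reference taken by model_dio_read_begin(). */
static void model_dio_end(struct model_inode *inode)
{
	atomic_fetch_sub(&inode->i_dio_count, 1);
}

/* What a waiter in the position of inode_dio_wait() would observe. */
static int model_dio_pending(struct model_inode *inode)
{
	return atomic_load(&inode->i_dio_count);
}
```

In this model, while the truncate side holds the lock,
model_dio_read_begin(&ino, 0) still succeeds and raises the count, whereas
model_dio_read_begin(&ino, MODEL_DIO_LOCKING) returns false until
model_inode_unlock() is called.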

>
> Thanks,
> Changwei
>
>>
>> Thanks,
>> Alex
>>>
>>>> BTW, in this patch, I only moved inode_dio_wait() ahead of ocfs2_rw_lock()
>>>> and did not change the order of inode_lock() and inode_dio_wait().
>>>
>>> Right. I think you've convinced me to stop worrying about this.
>>>
>>> Ben.
>>>
>>
>>
>
>
> .
>