2017-12-05 15:50:09

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> 4.4-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: alex chen <[email protected]>
>
> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>
> we should wait dio requests to finish before inode lock in
> ocfs2_setattr(), otherwise the following deadlock will happen:
[...]

I looked at the kernel-doc for inode_dio_wait():

/**
 * inode_dio_wait - wait for outstanding DIO requests to finish
 * @inode: inode to wait for
 *
 * Waits for all pending direct I/O requests to finish so that we can
 * proceed with a truncate or equivalent operation.
 *
 * Must be called under a lock that serializes taking new references
 * to i_dio_count, usually by inode->i_mutex.
 */

Now that ocfs2_setattr() calls this outside of the inode locked region,
what prevents another task adding a new dio request immediately
afterward?

Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
the dio completion path didn't previously take the inode lock. So it
doesn't look this fix is needed in 3.18 or 4.4.

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.


2017-12-06 01:12:18

by alex chen

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

Hi Ben,

Thanks for your reply.

On 2017/12/5 23:49, Ben Hutchings wrote:
> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: alex chen <[email protected]>
>>
>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>
>> we should wait dio requests to finish before inode lock in
>> ocfs2_setattr(), otherwise the following deadlock will happen:
> [...]
>
> I looked at the kernel-doc for inode_dio_wait():
>
> /**
> * inode_dio_wait - wait for outstanding DIO requests to finish
> * @inode: inode to wait for
> *
> * Waits for all pending direct I/O requests to finish so that we can
> * proceed with a truncate or equivalent operation.
> *
> * Must be called under a lock that serializes taking new references
> * to i_dio_count, usually by inode->i_mutex.
> */
>
> Now that ocfs2_setattr() calls this outside of the inode locked region,
> what prevents another task adding a new dio request immediately
> afterward?
>

In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
prevent another bio to be issued from this node.
Furthermore, we use the ocfs2_rw_lock() and ocfs2_inode_lock() in ocfs2_setattr()
to guarantee no more bio will be issued from the other nodes in this cluster.

> Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
> the dio completion path didn't previously take the inode lock. So it
> doesn't look this fix is needed in 3.18 or 4.4.

Yes, ocfs2_dio_end_io_write() was introduced in 4.6 and the problem this patch
fixes is only exist in the kernel 4.6 and above 4.6.

I'm sorry that I don't clearly point out which the stable version of kernel this patch
will fixes.

Thanks,
Alex

>
> Ben.
>

2017-12-06 16:36:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On Wed, Dec 06, 2017 at 09:02:42AM +0800, alex chen wrote:
> Hi Ben,
>
> Thanks for your reply.
>
> On 2017/12/5 23:49, Ben Hutchings wrote:
> > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> >> 4.4-stable review patch. If anyone has any objections, please let me know.
> >>
> >> ------------------
> >>
> >> From: alex chen <[email protected]>
> >>
> >> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> >>
> >> we should wait dio requests to finish before inode lock in
> >> ocfs2_setattr(), otherwise the following deadlock will happen:
> > [...]
> >
> > I looked at the kernel-doc for inode_dio_wait():
> >
> > /**
> > * inode_dio_wait - wait for outstanding DIO requests to finish
> > * @inode: inode to wait for
> > *
> > * Waits for all pending direct I/O requests to finish so that we can
> > * proceed with a truncate or equivalent operation.
> > *
> > * Must be called under a lock that serializes taking new references
> > * to i_dio_count, usually by inode->i_mutex.
> > */
> >
> > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > what prevents another task adding a new dio request immediately
> > afterward?
> >
>
> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> prevent another bio to be issued from this node.
> Furthermore, we use the ocfs2_rw_lock() and ocfs2_inode_lock() in ocfs2_setattr()
> to guarantee no more bio will be issued from the other nodes in this cluster.
>
> > Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
> > the dio completion path didn't previously take the inode lock. So it
> > doesn't look this fix is needed in 3.18 or 4.4.
>
> Yes, ocfs2_dio_end_io_write() was introduced in 4.6 and the problem this patch
> fixes is only exist in the kernel 4.6 and above 4.6.
>
> I'm sorry that I don't clearly point out which the stable version of kernel this patch
> will fixes.

Not a problem, now dropped.

greg k-h

2017-12-07 18:26:08

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> Hi Ben,
>
> Thanks for your reply.
>
> On 2017/12/5 23:49, Ben Hutchings wrote:
> > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > >
> > > ------------------
> > >
> > > From: alex chen <[email protected]>
> > >
> > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > >
> > > we should wait dio requests to finish before inode lock in
> > > ocfs2_setattr(), otherwise the following deadlock will happen:
> >
> > [...]
> >
> > I looked at the kernel-doc for inode_dio_wait():
> >
> > /**
> >  * inode_dio_wait - wait for outstanding DIO requests to finish
> >  * @inode: inode to wait for
> >  *
> >  * Waits for all pending direct I/O requests to finish so that we can
> >  * proceed with a truncate or equivalent operation.
> >  *
> >  * Must be called under a lock that serializes taking new references
> >  * to i_dio_count, usually by inode->i_mutex.
> >  */
> >
> > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > what prevents another task adding a new dio request immediately
> > afterward?
> >
>
> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> prevent another bio to be issued from this node.
[...]

Yes but there seems to be a race condition - after the call to
inode_dio_wait() and before the call to inode_lock(), another dio
request can be added.

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.

2017-12-08 00:39:45

by alex chen

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()



On 2017/12/8 2:25, Ben Hutchings wrote:
> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>> Hi Ben,
>>
>> Thanks for your reply.
>>
>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>> 4.4-stable review patch. If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>>
>>>> From: alex chen <[email protected]>
>>>>
>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>
>>>> we should wait dio requests to finish before inode lock in
>>>> ocfs2_setattr(), otherwise the following deadlock will happen:
>>>
>>> [...]
>>>
>>> I looked at the kernel-doc for inode_dio_wait():
>>>
>>> /**
>>> * inode_dio_wait - wait for outstanding DIO requests to finish
>>> * @inode: inode to wait for
>>> *
>>> * Waits for all pending direct I/O requests to finish so that we can
>>> * proceed with a truncate or equivalent operation.
>>> *
>>> * Must be called under a lock that serializes taking new references
>>> * to i_dio_count, usually by inode->i_mutex.
>>> */
>>>
>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>> what prevents another task adding a new dio request immediately
>>> afterward?
>>>
>>
>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>> prevent another bio to be issued from this node.
> [...]
>
> Yes but there seems to be a race condition - after the call to
> inode_dio_wait() and before the call to inode_lock(), another dio
> request can be added.

In the truncating file situation, the lock order is as follow:
do_truncate()
inode_lock()
notify_change()
ocfs2_setattr()
inode_dio_wait()
--here it is under the protect of inode_lock(), so another dio requests
from another process will not be added.
ocfs2_rw_lock()
ocfs2_inode_lock_tracker()
this function is used to prevent the inode from being modified by another
nodes in the cluster
inode_unlock()

>
> Ben.
>

2017-12-08 02:26:35

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>
> On 2017/12/8 2:25, Ben Hutchings wrote:
> > On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> > > Hi Ben,
> > >
> > > Thanks for your reply.
> > >
> > > On 2017/12/5 23:49, Ben Hutchings wrote:
> > > > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > > > 4.4-stable review patch.  If anyone has any objections,
> > > > > please let me know.
> > > > >
> > > > > ------------------
> > > > >
> > > > > From: alex chen <[email protected]>
> > > > >
> > > > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > > > >
> > > > > we should wait dio requests to finish before inode lock in
> > > > > ocfs2_setattr(), otherwise the following deadlock will
> > > > > happen:
> > > >
> > > > [...]
> > > >
> > > > I looked at the kernel-doc for inode_dio_wait():
> > > >
> > > > /**
> > > >  * inode_dio_wait - wait for outstanding DIO requests to finish
> > > >  * @inode: inode to wait for
> > > >  *
> > > >  * Waits for all pending direct I/O requests to finish so that we can
> > > >  * proceed with a truncate or equivalent operation.
> > > >  *
> > > >  * Must be called under a lock that serializes taking new references
> > > >  * to i_dio_count, usually by inode->i_mutex.
> > > >  */
> > > >
> > > > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > > > what prevents another task adding a new dio request immediately
> > > > afterward?
> > > >
> > >
> > > In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> > > prevent another bio to be issued from this node.
> >
> > [...]
> >
> > Yes but there seems to be a race condition - after the call to
> > inode_dio_wait() and before the call to inode_lock(), another dio
> > request can be added.

Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
However:

> In the truncating file situation, the lock order is as follow:
> do_truncate()
>  inode_lock()
>  notify_change()
>   ocfs2_setattr()
>    inode_dio_wait()
>     --here it is under the protect of inode_lock(), so another dio requests
>       from another process will not be added.

only DIO reads seem to take the inode lock.

Ben.

>    ocfs2_rw_lock()
>    ocfs2_inode_lock_tracker()
>     this function is used to prevent the inode from being modified by another
>     nodes in the cluster
>  inode_unlock()
>
> >
> > Ben.
> >
>
>
--
Ben Hutchings
Software Developer, Codethink Ltd.

2017-12-08 04:04:22

by alex chen

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()



On 2017/12/8 10:26, Ben Hutchings wrote:
> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>
>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>> Hi Ben,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>> 4.4-stable review patch. If anyone has any objections,
>>>>>> please let me know.
>>>>>>
>>>>>> ------------------
>>>>>>
>>>>>> From: alex chen <[email protected]>
>>>>>>
>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>
>>>>>> we should wait dio requests to finish before inode lock in
>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>> happen:
>>>>>
>>>>> [...]
>>>>>
>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>
>>>>> /**
>>>>> * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>> * @inode: inode to wait for
>>>>> *
>>>>> * Waits for all pending direct I/O requests to finish so that we can
>>>>> * proceed with a truncate or equivalent operation.
>>>>> *
>>>>> * Must be called under a lock that serializes taking new references
>>>>> * to i_dio_count, usually by inode->i_mutex.
>>>>> */
>>>>>
>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>> what prevents another task adding a new dio request immediately
>>>>> afterward?
>>>>>
>>>>
>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>> prevent another bio to be issued from this node.
>>>
>>> [...]
>>>
>>> Yes but there seems to be a race condition - after the call to
>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>> request can be added.
>
> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
> However:
>
>> In the truncating file situation, the lock order is as follow:
>> do_truncate()
>> inode_lock()
>> notify_change()
>> ocfs2_setattr()
>> inode_dio_wait()
>> --here it is under the protect of inode_lock(), so another dio requests
>> from another process will not be added.
>
> only DIO reads seem to take the inode lock.
>

I do not clearly understand what you mean.
The inode_lock() will be called in ocfs2_file_write_iter().
You mean only DIO writes seem to take the inode_lock()?

BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
and didn't adjust the order of inode_lock() and inode_dio_wait().

Thanks,
Alex

> Ben.
>
>> ocfs2_rw_lock()
>> ocfs2_inode_lock_tracker()
>> this function is used to prevent the inode from being modified by another
>> nodes in the cluster
>> inode_unlock()
>>
>>>
>>> Ben.
>>>
>>
>>

2017-12-08 05:36:36

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()

On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>
> On 2017/12/8 10:26, Ben Hutchings wrote:
> > On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
> > >
> > > On 2017/12/8 2:25, Ben Hutchings wrote:
> > > > On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
> > > > > Hi Ben,
> > > > >
> > > > > Thanks for your reply.
> > > > >
> > > > > On 2017/12/5 23:49, Ben Hutchings wrote:
> > > > > > On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
> > > > > > > 4.4-stable review patch.  If anyone has any objections,
> > > > > > > please let me know.
> > > > > > >
> > > > > > > ------------------
> > > > > > >
> > > > > > > From: alex chen <[email protected]>
> > > > > > >
> > > > > > > commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
> > > > > > >
> > > > > > > we should wait dio requests to finish before inode lock in
> > > > > > > ocfs2_setattr(), otherwise the following deadlock will
> > > > > > > happen:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > I looked at the kernel-doc for inode_dio_wait():
> > > > > >
> > > > > > /**
> > > > > >  * inode_dio_wait - wait for outstanding DIO requests to finish
> > > > > >  * @inode: inode to wait for
> > > > > >  *
> > > > > >  * Waits for all pending direct I/O requests to finish so that we can
> > > > > >  * proceed with a truncate or equivalent operation.
> > > > > >  *
> > > > > >  * Must be called under a lock that serializes taking new references
> > > > > >  * to i_dio_count, usually by inode->i_mutex.
> > > > > >  */
> > > > > >
> > > > > > Now that ocfs2_setattr() calls this outside of the inode locked region,
> > > > > > what prevents another task adding a new dio request immediately
> > > > > > afterward?
> > > > > >
> > > > >
> > > > > In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
> > > > > prevent another bio to be issued from this node.
> > > >
> > > > [...]
> > > >
> > > > Yes but there seems to be a race condition - after the call to
> > > > inode_dio_wait() and before the call to inode_lock(), another dio
> > > > request can be added.
> >
> > Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock(). 
> > However:
> >
> > > In the truncating file situation, the lock order is as follow:
> > > do_truncate()
> > >  inode_lock()
> > >  notify_change()
> > >   ocfs2_setattr()
> > >    inode_dio_wait()
> > >     --here it is under the protect of inode_lock(), so another dio requests
> > >       from another process will not be added.
> >
> > only DIO reads seem to take the inode lock.
> >
>
> I do not clearly understand what you mean.
> The inode_lock() will be called in ocfs2_file_write_iter().

Oh I see. I didn't realise that was part of the call chain.

> You mean only DIO writes seem to take the inode_lock()?

I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
reads - but ocfs2 doesn't set the flag for that. Maybe that's OK?

> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
> and didn't adjust the order of inode_lock() and inode_dio_wait().

Right. I think you've convinced me to stop worrying about this.

Ben.

--
Ben Hutchings
Software Developer, Codethink Ltd.

2017-12-08 06:20:23

by alex chen

[permalink] [raw]
Subject: Re: [PATCH 4.4 13/16] ocfs2: should wait dio before inode lock in ocfs2_setattr()



On 2017/12/8 13:36, Ben Hutchings wrote:
> On Fri, 2017-12-08 at 12:03 +0800, alex chen wrote:
>>
>> On 2017/12/8 10:26, Ben Hutchings wrote:
>>> On Fri, 2017-12-08 at 08:39 +0800, alex chen wrote:
>>>>
>>>> On 2017/12/8 2:25, Ben Hutchings wrote:
>>>>> On Wed, 2017-12-06 at 09:02 +0800, alex chen wrote:
>>>>>> Hi Ben,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> On 2017/12/5 23:49, Ben Hutchings wrote:
>>>>>>> On Wed, 2017-11-22 at 11:12 +0100, Greg Kroah-Hartman wrote:
>>>>>>>> 4.4-stable review patch. If anyone has any objections,
>>>>>>>> please let me know.
>>>>>>>>
>>>>>>>> ------------------
>>>>>>>>
>>>>>>>> From: alex chen <[email protected]>
>>>>>>>>
>>>>>>>> commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.
>>>>>>>>
>>>>>>>> we should wait dio requests to finish before inode lock in
>>>>>>>> ocfs2_setattr(), otherwise the following deadlock will
>>>>>>>> happen:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> I looked at the kernel-doc for inode_dio_wait():
>>>>>>>
>>>>>>> /**
>>>>>>> * inode_dio_wait - wait for outstanding DIO requests to finish
>>>>>>> * @inode: inode to wait for
>>>>>>> *
>>>>>>> * Waits for all pending direct I/O requests to finish so that we can
>>>>>>> * proceed with a truncate or equivalent operation.
>>>>>>> *
>>>>>>> * Must be called under a lock that serializes taking new references
>>>>>>> * to i_dio_count, usually by inode->i_mutex.
>>>>>>> */
>>>>>>>
>>>>>>> Now that ocfs2_setattr() calls this outside of the inode locked region,
>>>>>>> what prevents another task adding a new dio request immediately
>>>>>>> afterward?
>>>>>>>
>>>>>>
>>>>>> In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
>>>>>> prevent another bio to be issued from this node.
>>>>>
>>>>> [...]
>>>>>
>>>>> Yes but there seems to be a race condition - after the call to
>>>>> inode_dio_wait() and before the call to inode_lock(), another dio
>>>>> request can be added.
>>>
>>> Sorry, I've been mixing up inode_lock() and ocfs2_inode_lock().
>>> However:
>>>
>>>> In the truncating file situation, the lock order is as follow:
>>>> do_truncate()
>>>> inode_lock()
>>>> notify_change()
>>>> ocfs2_setattr()
>>>> inode_dio_wait()
>>>> --here it is under the protect of inode_lock(), so another dio requests
>>>> from another process will not be added.
>>>
>>> only DIO reads seem to take the inode lock.
>>>
>>
>> I do not clearly understand what you mean.
>> The inode_lock() will be called in ocfs2_file_write_iter().
>
> Oh I see. I didn't realise that was part of the call chain.
>
>> You mean only DIO writes seem to take the inode_lock()?
>
> I did mean reads, as do_blockdev_direct_IO() may call inode_lock() for
> reads - but ocfs2 doesn't set the flag for that. Maybe that's OK?

I think you are right, we should set the DIO_LOCKING flag in ocfs2_direct_IO().

Thanks,
Alex
>
>> BTW, in this patch, I just adjusted the inode_dio_wait() to the front of the ocfs2_rw_lock()
>> and didn't adjust the order of inode_lock() and inode_dio_wait().
>
> Right. I think you've convinced me to stop worrying about this.
>
> Ben.
>