2020-11-29 06:08:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] exit: fix a race in release_task when flushing the dentry

On Sat, Nov 28, 2020 at 11:28:53PM +0800, Wen Yang wrote:
>
>
> 在 2020/11/28 下午10:05, Greg Kroah-Hartman 写道:
> > On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
> > >
> > >
> > > 在 2020/11/28 下午4:06, Greg Kroah-Hartman 写道:
> > > > On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
> > > > > [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
> > > >
> > > > No, that is not this commit at all.
> > > >
> > > > What are you wanting to have happen here?
> > > >
> > > > confused,
> > > >
> > > > greg k-h
> > > >
> > >
> > > Thanks.
> > > Let's explain it briefly:
> > >
> > > The dentries such as /proc/<pid>/ns/ipc have the DCACHE_OP_DELETE flag, they
> > > should be deleted when the process exits.
> > > Suppose the following race appears:
> > >
> > > release_task dput
> > > -> proc_flush_task
> > > -> dentry->d_op->d_delete(dentry)
> > > -> __exit_signal
> > > -> dentry->d_lockref.count-- and return.
> > >
> > >
> > > In the proc_flush_task function, because another processe is using this
> > > dentry, it cannot be deleted;
> > > In the dput function, d_delete may be executed before __exit_signal (the pid
> > > has not been unhashed), so that d_delete returns false and the dentry can
> > > not be deleted.
> > >
> > > So this dentry is still caches (count is 0), and its parent dentries are
> > > also caches, and those dentries can only be deleted when drop_caches is
> > > manually triggered.
> > >
> > >
> > > In the release_task function, we should move proc_flush_task after the
> > > tasklist_lock is released(Just like the commit
> > > 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
> >
> > I do not understand, is this a patch being submitted for the main kernel
> > tree, or for a stable kernel release?
> >
> > If stable, please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> >
> > If main kernel tree, you can't have the "Upstream commit" line in the
> > changelog text as that makes no sense at all.
>
>
> Hi,
> This patch is submitted to the stable branches (from 4.9.y
> to 5.6.y).
>
> This problem can also be solved if the following patch could be ported to
> the stable branch:
> 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
> 26dbc60f385f ("proc: Generalize proc_sys_prune_dcache into
> proc_prune_siblings_dcache")
> f90f3cafe8d5 ("proc: Use d_invalidate in proc_prune_siblings_dcache")
>
> However, the above-mentioned patches modify too much code (more than 100
> lines), and there may also be some undiscovered bugs.
>
> So the safer method may be to apply this small patch(also ported from the
> equivalent fix already exist in Linus’ tree).
>
> We will reformat the patch later.

We always prefer to take the original, upstream patches, instead of
one-off changes as almost always, those one-off changes end up being
wrong and hard to work with over time.

So if we need more than one patch to solve this reported problem, that's
fine, can you test the above series of patches and provide a backported
set of them that we can use for this?

thanks,

gre gk-h


2020-11-29 14:47:56

by Wen Yang

[permalink] [raw]
Subject: Re: [PATCH] exit: fix a race in release_task when flushing the dentry



在 2020/11/29 下午2:05, Greg Kroah-Hartman 写道:
> On Sat, Nov 28, 2020 at 11:28:53PM +0800, Wen Yang wrote:
>>
>>
>> 在 2020/11/28 下午10:05, Greg Kroah-Hartman 写道:
>>> On Sat, Nov 28, 2020 at 09:59:09PM +0800, Wen Yang wrote:
>>>>
>>>>
>>>> 在 2020/11/28 下午4:06, Greg Kroah-Hartman 写道:
>>>>> On Sat, Nov 28, 2020 at 02:47:22PM +0800, Wen Yang wrote:
>>>>>> [ Upstream commit 7bc3e6e55acf065500a24621f3b313e7e5998acf ]
>>>>>
>>>>> No, that is not this commit at all.
>>>>>
>>>>> What are you wanting to have happen here?
>>>>>
>>>>> confused,
>>>>>
>>>>> greg k-h
>>>>>
>>>>
>>>> Thanks.
>>>> Let's explain it briefly:
>>>>
>>>> The dentries such as /proc/<pid>/ns/ipc have the DCACHE_OP_DELETE flag, they
>>>> should be deleted when the process exits.
>>>> Suppose the following race appears:
>>>>
>>>> release_task dput
>>>> -> proc_flush_task
>>>> -> dentry->d_op->d_delete(dentry)
>>>> -> __exit_signal
>>>> -> dentry->d_lockref.count-- and return.
>>>>
>>>>
>>>> In the proc_flush_task function, because another processe is using this
>>>> dentry, it cannot be deleted;
>>>> In the dput function, d_delete may be executed before __exit_signal (the pid
>>>> has not been unhashed), so that d_delete returns false and the dentry can
>>>> not be deleted.
>>>>
>>>> So this dentry is still caches (count is 0), and its parent dentries are
>>>> also caches, and those dentries can only be deleted when drop_caches is
>>>> manually triggered.
>>>>
>>>>
>>>> In the release_task function, we should move proc_flush_task after the
>>>> tasklist_lock is released(Just like the commit
>>>> 7bc3e6e55acf065500a24621f3b313e7e5998acf did).
>>>
>>> I do not understand, is this a patch being submitted for the main kernel
>>> tree, or for a stable kernel release?
>>>
>>> If stable, please read:
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>> for how to do this properly.
>>>
>>> If main kernel tree, you can't have the "Upstream commit" line in the
>>> changelog text as that makes no sense at all.
>>
>>
>> Hi,
>> This patch is submitted to the stable branches (from 4.9.y
>> to 5.6.y).
>>
>> This problem can also be solved if the following patch could be ported to
>> the stable branch:
>> 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
>> 26dbc60f385f ("proc: Generalize proc_sys_prune_dcache into
>> proc_prune_siblings_dcache")
>> f90f3cafe8d5 ("proc: Use d_invalidate in proc_prune_siblings_dcache")
>>
>> However, the above-mentioned patches modify too much code (more than 100
>> lines), and there may also be some undiscovered bugs.
>>
>> So the safer method may be to apply this small patch(also ported from the
>> equivalent fix already exist in Linus’ tree).
>>
>> We will reformat the patch later.
>
> We always prefer to take the original, upstream patches, instead of
> one-off changes as almost always, those one-off changes end up being
> wrong and hard to work with over time.
>
> So if we need more than one patch to solve this reported problem, that's
> fine, can you test the above series of patches and provide a backported
> set of them that we can use for this?
>

Ok, we will follow your suggestions.
Thanks.

--
Best wishes,
Wen