> From: Linus Torvalds <[email protected]>
> Date: Wed, 1 Dec 2021 10:06:14 -0800
> Subject: fget: check that the fd still exists after getting a ref to it
>
> Jann Horn points out that there is another possible race wrt Unix domain
> socket garbage collection, somewhat reminiscent of the one fixed in
> commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
>
> See the extended comment about the garbage collection requirements added
> to unix_peek_fds() by that commit for details.
>
> The race comes from how we can locklessly look up a file descriptor just
> as it is in the process of being closed, and with the right artificial
> timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
> Unix domain socket garbage collector could see the reference count
> decrement of the close() happen before fget() took its reference to the
> file and the file was attached onto a new file descriptor.
I analyzed this CVE and tried to reproduce it.
I guess he triggered it like the stack below.
close_fd |
pick_file |
| __fget_files
file = files_lookup_fd_rcu(files, fd); |
|
rcu_assign_pointer(fdt->fd[fd], NULL);
filp_close |
fput |
| get_file_rcu_many // ned ref>=1
fput_many(file, 1); |
file_free(file); |
| return file
| // read-after-free
If you want to successfully execute the get_file_rcu_many function,
the reference counting of the file is greater than or equal to 1 and
is greater than or equal to 2 after the execution.
However, close releases only one reference count and does not release
the file,
so read-after-free does not occur. So how is the race triggered here?
The question has been pondered for a long time without any results.
Could I get more details (e.g. reproduction methods or stacks) from you ?
I would appreciate it if you could help me.
> This is all (intentionally) correct on the 'struct file *' side, with
> RCU lookups and lockless reference counting very much part of the
> design. Getting that reference count out of order isn't a problem per
> se.
>
> But the garbage collector can get confused by seeing this situation of
> having seen a file not having any remaining external references and then
> seeing it being attached to an fd.
>
> In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the
> fix was to serialize the file descriptor install with the garbage
> collector by taking and releasing the unix_gc_lock.
>
> That's not really an option here, but since this all happens when we are
> in the process of looking up a file descriptor, we can instead simply
> just re-check that the file hasn't been closed in the meantime, and just
> re-do the lookup if we raced with a concurrent close() of the same file
> descriptor.
>
> Reported-and-tested-by: Jann Horn <[email protected]>
> Acked-by: Miklos Szeredi <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
> ---
> fs/file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/file.c b/fs/file.c
> index 8627dacfc4246..ad4a8bf3cf109 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -858,6 +858,10 @@ loop:
> file = NULL;
> else if (!get_file_rcu_many(file, refs))
> goto loop;
> + else if (files_lookup_fd_raw(files, fd) != file) {
> + fput_many(file, refs);
> + goto loop;
> + }
> }
> rcu_read_unlock();
> -- cgit 1.2.3-1.el7
Looking forward to hearing from you.
Thank you!
--
With Best Regards,
Baokun Li
.
Happy New Year!
ping
在 2021/12/22 18:32, libaokun (A) 写道:
>> From: Linus Torvalds <[email protected]>
>> Date: Wed, 1 Dec 2021 10:06:14 -0800
>> Subject: fget: check that the fd still exists after getting a ref to it
>>
>> Jann Horn points out that there is another possible race wrt Unix domain
>> socket garbage collection, somewhat reminiscent of the one fixed in
>> commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
>>
>> See the extended comment about the garbage collection requirements added
>> to unix_peek_fds() by that commit for details.
>>
>> The race comes from how we can locklessly look up a file descriptor just
>> as it is in the process of being closed, and with the right artificial
>> timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
>> Unix domain socket garbage collector could see the reference count
>> decrement of the close() happen before fget() took its reference to the
>> file and the file was attached onto a new file descriptor.
>
> I analyzed this CVE and tried to reproduce it.
>
> I guess he triggered it like the stack below.
>
>
> close_fd |
> pick_file |
> | __fget_files
> file = files_lookup_fd_rcu(files, fd); |
> |
> rcu_assign_pointer(fdt->fd[fd], NULL);
> filp_close |
> fput |
> | get_file_rcu_many // ned ref>=1
> fput_many(file, 1); |
> file_free(file); |
> | return file
> | // read-after-free
>
>
>
> If you want to successfully execute the get_file_rcu_many function,
>
> the reference counting of the file is greater than or equal to 1 and
>
> is greater than or equal to 2 after the execution.
>
> However, close releases only one reference count and does not release
> the file,
>
> so read-after-free does not occur. So how is the race triggered here?
>
> The question has been pondered for a long time without any results.
>
> Could I get more details (e.g. reproduction methods or stacks) from you ?
>
> I would appreciate it if you could help me.
>
>
>> This is all (intentionally) correct on the 'struct file *' side, with
>> RCU lookups and lockless reference counting very much part of the
>> design. Getting that reference count out of order isn't a problem per
>> se.
>>
>> But the garbage collector can get confused by seeing this situation of
>> having seen a file not having any remaining external references and then
>> seeing it being attached to an fd.
>>
>> In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the
>> fix was to serialize the file descriptor install with the garbage
>> collector by taking and releasing the unix_gc_lock.
>>
>> That's not really an option here, but since this all happens when we are
>> in the process of looking up a file descriptor, we can instead simply
>> just re-check that the file hasn't been closed in the meantime, and just
>> re-do the lookup if we raced with a concurrent close() of the same file
>> descriptor.
>>
>> Reported-and-tested-by: Jann Horn <[email protected]>
>> Acked-by: Miklos Szeredi <[email protected]>
>> Signed-off-by: Linus Torvalds <[email protected]>
>> ---
>> fs/file.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index 8627dacfc4246..ad4a8bf3cf109 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -858,6 +858,10 @@ loop:
>> file = NULL;
>> else if (!get_file_rcu_many(file, refs))
>> goto loop;
>> + else if (files_lookup_fd_raw(files, fd) != file) {
>> + fput_many(file, refs);
>> + goto loop;
>> + }
>> }
>> rcu_read_unlock();
>> -- cgit 1.2.3-1.el7
>
> Looking forward to hearing from you.
>
> Thank you!
>
On Wed, Dec 22, 2021 at 11:32 AM libaokun (A) <[email protected]> wrote:
> > From: Linus Torvalds <[email protected]>
> > Date: Wed, 1 Dec 2021 10:06:14 -0800
> > Subject: fget: check that the fd still exists after getting a ref to it
> >
> > Jann Horn points out that there is another possible race wrt Unix domain
> > socket garbage collection, somewhat reminiscent of the one fixed in
> > commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
> >
> > See the extended comment about the garbage collection requirements added
> > to unix_peek_fds() by that commit for details.
> >
> > The race comes from how we can locklessly look up a file descriptor just
> > as it is in the process of being closed, and with the right artificial
> > timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
> > Unix domain socket garbage collector could see the reference count
> > decrement of the close() happen before fget() took its reference to the
> > file and the file was attached onto a new file descriptor.
>
> I analyzed this CVE and tried to reproduce it.
>
> I guess he triggered it like the stack below.
>
>
> close_fd |
> pick_file |
> | __fget_files
> file = files_lookup_fd_rcu(files, fd); |
> |
> rcu_assign_pointer(fdt->fd[fd], NULL);
> filp_close |
> fput |
> | get_file_rcu_many // ned ref>=1
> fput_many(file, 1); |
> file_free(file); |
> | return file
> | // read-after-free
The race is more complicated than that; you also need to add unix_gc()
to the race. And if you want to get to memory corruption, you need one
or two more races involving unix_stream_read_generic() on top of that.
> If you want to successfully execute the get_file_rcu_many function,
>
> the reference counting of the file is greater than or equal to 1 and
>
> is greater than or equal to 2 after the execution.
>
> However, close releases only one reference count and does not release
> the file,
>
> so read-after-free does not occur. So how is the race triggered here?
This bug does not lead to a UAF of the file, it leads to a locking
inconsistency between the unix stream read path and the GC.
> The question has been pondered for a long time without any results.
>
> Could I get more details (e.g. reproduction methods or stacks) from you ?
See https://bugs.chromium.org/p/project-zero/issues/detail?id=2247 for
the original bug report. I'm also working on a more detailed blog
post, but that isn't finished yet.
在 2022/1/10 17:09, Jann Horn 写道:
> On Wed, Dec 22, 2021 at 11:32 AM libaokun (A) <[email protected]> wrote:
>>> From: Linus Torvalds <[email protected]>
>>> Date: Wed, 1 Dec 2021 10:06:14 -0800
>>> Subject: fget: check that the fd still exists after getting a ref to it
>>>
>>> Jann Horn points out that there is another possible race wrt Unix domain
>>> socket garbage collection, somewhat reminiscent of the one fixed in
>>> commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
>>>
>>> See the extended comment about the garbage collection requirements added
>>> to unix_peek_fds() by that commit for details.
>>>
>>> The race comes from how we can locklessly look up a file descriptor just
>>> as it is in the process of being closed, and with the right artificial
>>> timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
>>> Unix domain socket garbage collector could see the reference count
>>> decrement of the close() happen before fget() took its reference to the
>>> file and the file was attached onto a new file descriptor.
>> I analyzed this CVE and tried to reproduce it.
>>
>> I guess he triggered it like the stack below.
>>
>>
>> close_fd |
>> pick_file |
>> | __fget_files
>> file = files_lookup_fd_rcu(files, fd); |
>> |
>> rcu_assign_pointer(fdt->fd[fd], NULL);
>> filp_close |
>> fput |
>> | get_file_rcu_many // ned ref>=1
>> fput_many(file, 1); |
>> file_free(file); |
>> | return file
>> | // read-after-free
> The race is more complicated than that; you also need to add unix_gc()
> to the race. And if you want to get to memory corruption, you need one
> or two more races involving unix_stream_read_generic() on top of that.
>
>> If you want to successfully execute the get_file_rcu_many function,
>>
>> the reference counting of the file is greater than or equal to 1 and
>>
>> is greater than or equal to 2 after the execution.
>>
>> However, close releases only one reference count and does not release
>> the file,
>>
>> so read-after-free does not occur. So how is the race triggered here?
> This bug does not lead to a UAF of the file, it leads to a locking
> inconsistency between the unix stream read path and the GC.
>
>> The question has been pondered for a long time without any results.
>>
>> Could I get more details (e.g. reproduction methods or stacks) from you ?
> See https://bugs.chromium.org/p/project-zero/issues/detail?id=2247 for
> the original bug report. I'm also working on a more detailed blog
> post, but that isn't finished yet.
> .
Thank you very much for your reply!
With your help, I have understood the problem and successfully
reproduced it.
Thanks a million!
--
With Best Regards,
Baokun Li