TLDR: Different stages of 1 and 2 can race with each other causing UAF.
1. llcp_sock_sendmsg -> nfc_llcp_send_ui_frame -> loop call (nfc_alloc_send_skb(nfc_dev))
2. virtual_ncidev_close -> [... -> nfc_llcp_socket_release -> ...] -> [... -> nfc_free_device]
---
Hi,
I've been trying to fix this bug for some time but keep getting
stuck. If someone could give more input or fix it, that would be
really helpful.
This bug is due to a race between sendmsg and the freeing of nfc_dev.
For connectionless transmission, the llcp_sock_sendmsg() codepath
eventually calls nfc_alloc_send_skb(), which takes an nfc_dev as an
argument to calculate the total size for the skb allocation.
The virtual_ncidev_close() codepath eventually releases the socket by
calling nfc_llcp_socket_release() (which sets sk->sk_state to
LLCP_CLOSED), and the nfc_dev is eventually freed afterwards.
Once an ndev gets freed, llcp_sock_sendmsg() results in a
use-after-free because it
(1) has no checks in place to avoid sending the datagram.
(1.1) Checking for LLCP_CLOSED in llcp_sock_sendmsg() does make
the race less likely. With -smp 6 it did not trigger on
my PC, leading me to naively think that was the solution
until syzbot told me quite some time later that it isn't.
(2) calls nfc_llcp_send_ui_frame(), which has a do-while loop that
can also race with the freeing (a msg of size 4096 is sent in
chunks of 128 in this repro).
(2.1) By this I mean that just moving the nfc_dev access from
nfc_alloc_send_skb() into this function, be it inside or
outside the loop, naturally doesn't work.
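To make the size of the window in (2) concrete, here is a minimal
userspace sketch (not kernel code) that only counts how often a
do-while fragment loop like the one described above would touch the
device. count_device_accesses() is a hypothetical helper made up for
illustration; only the 4096 and 128 figures come from the repro.

```c
#include <stddef.h>

/*
 * Illustrative model only: counts the iterations of a do-while fragment
 * loop. Each iteration stands for one nfc_alloc_send_skb(dev, ...)-style
 * call, i.e. one more point where a freed device could be dereferenced.
 * count_device_accesses() is a hypothetical helper, not a kernel function.
 */
size_t count_device_accesses(size_t msg_len, size_t frag_size)
{
	size_t remaining = msg_len;
	size_t accesses = 0;

	if (frag_size == 0)
		return 0; /* avoid looping forever on a zero fragment size */

	do {
		size_t frag_len = remaining < frag_size ? remaining : frag_size;

		accesses++;            /* the device would be read here */
		remaining -= frag_len;
	} while (remaining > 0);

	return accesses;
}
```

With the repro's values, count_device_accesses(4096, 128) gives 32, so
a state check done once before the loop still leaves 32 later device
accesses that the free can race with.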
When the nfc_dev has been freed and we still happen to get headroom and
tailroom, the PDU skb does not seem to be allocated and ENXIO is returned.
I tried to look at other code in the net subsystem to get an idea of how
other places handle it, but accessing the device later in the codepath
does not seem to be the norm. So I am starting to think some refactoring
of the locking logic may be needed (or maybe RCU-protect headroom and
tailroom?).
I don't know if I'm correct, but anyways where does one start?
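A rough userspace sketch of what "check and use under the same lock"
could look like for the headroom/tailroom reads. Everything here is an
assumption for illustration: struct fake_local, its dev_alive flag and
both functions are hypothetical, and a pthread mutex stands in for
whatever kernel lock (or RCU scheme) would actually apply.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device LLCP state; not a kernel struct. */
struct fake_local {
	pthread_mutex_t lock;
	bool dev_alive;   /* cleared by the close path, under lock */
	size_t headroom;
	size_t tailroom;
};

/*
 * Returns the allocation size for one fragment, or -ENXIO if the device
 * is already gone. The liveness check and the reads of headroom/tailroom
 * happen under the same lock, so the close path cannot slip in between.
 */
long fake_frag_alloc_size(struct fake_local *local, size_t payload_len)
{
	long size;

	pthread_mutex_lock(&local->lock);
	if (!local->dev_alive) {
		pthread_mutex_unlock(&local->lock);
		return -ENXIO;
	}
	size = (long)(local->headroom + payload_len + local->tailroom);
	pthread_mutex_unlock(&local->lock);

	return size;
}

/* Close path: marks the device dead under the same lock before any free. */
void fake_close(struct fake_local *local)
{
	pthread_mutex_lock(&local->lock);
	local->dev_alive = false;
	pthread_mutex_unlock(&local->lock);
}
```

This only covers the size calculation. The skb allocation and the send
itself would need the same protection, or a reference held across them.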
Thanks,
Siddh
On 16/11/2023 17:55, Siddh Raman Pant wrote:
> I don't know if I'm correct, but anyways where does one start?
Any checks would need to have proper locking. Or at least barriers...
Adding checks without locks usually does not solve race conditions.
Another place to start is proper ref counting, so the structures are not
released too early. We have had several bugs like this in NFC before, so
you can take a look at their fixes.
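As a minimal userspace illustration of the ref counting idea (not the
NFC code): the sender takes its own reference before the send loop, so
the close path dropping its reference cannot free the object under it.
struct fake_dev and the fake_dev_*() helpers are hypothetical names, and
C11 atomics stand in for the kernel's refcount primitives.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device; not struct nfc_dev. */
struct fake_dev {
	atomic_int refcnt;
	int headroom;
	int tailroom;
};

/* Demo-only counter of completed frees; not thread-safe by design. */
int fake_dev_free_count;

/* Allocates a device holding one initial reference (the owner's). */
struct fake_dev *fake_dev_alloc(int headroom, int tailroom)
{
	struct fake_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	atomic_init(&dev->refcnt, 1);
	dev->headroom = headroom;
	dev->tailroom = tailroom;
	return dev;
}

/*
 * Takes an extra reference. The caller must already hold a valid
 * reference (or a lock that keeps the object alive) when calling this.
 */
void fake_dev_get(struct fake_dev *dev)
{
	atomic_fetch_add(&dev->refcnt, 1);
}

/* Drops a reference; the object is freed only when the last one goes. */
void fake_dev_put(struct fake_dev *dev)
{
	if (atomic_fetch_sub(&dev->refcnt, 1) == 1) {
		fake_dev_free_count++;
		free(dev);
	}
}
```

The sender would call fake_dev_get() before its loop and fake_dev_put()
after it. If the close path drops its reference in between, the free is
simply deferred until the sender is done.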
Best regards,
Krzysztof
On Fri, 17 Nov 2023 18:18:56 +0530, Krzysztof Kozlowski wrote:
> Any checks would need to have proper locking. Or at least barriers...
> Adding checks without locks usually does not solve race conditions.
Yes of course. I just wanted to put whatever I tested out there.
> Another place to start is proper ref counting, so the structures are not
> released too early. We have had several bugs like this in NFC before, so
> you can take a look at their fixes.
Sure.
Thanks,
Siddh