2022-01-16 06:45:23

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 0/7] nfc: llcp: fix and improvements

Hi,

Patch #1:
=========
Syzbot reported an easily reproducible NULL pointer dereference which I was
struggling to analyze:
https://syzkaller.appspot.com/bug?extid=7f23bcddf626e0593a39

Although direct fix is obvious, I could not actually find the exact race
condition scenario leading to it. The patch fixes the issue - at least under
my QEMU - however all this code looks racy, so I have a feeling I am plumbing
one leak without fixing root cause.

Therefore I would appreciate some more thoughts on first commit.

The rest of patches:
====================
These are improvements, rebased on top of #1, although should be independent.
They do not fix any experienced issue, just look correct to me from the code
point of view.

Testing
=======
Under QEMU only. The NFC/LLCP code was not really tested on a device.

Best regards,
Krzysztof

Krzysztof Kozlowski (7):
nfc: llcp: fix NULL error pointer dereference on sendmsg() after
failed bind()
nfc: llcp: nullify llcp_sock->dev on connect() error paths
nfc: llcp: simplify llcp_sock_connect() error paths
nfc: llcp: use centralized exiting of bind on errors
nfc: llcp: use test_bit()
nfc: llcp: protect nfc_llcp_sock_unlink() calls
nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is
actually sent"

net/nfc/llcp.h | 1 -
net/nfc/llcp_core.c | 9 +-------
net/nfc/llcp_sock.c | 54 ++++++++++++++++++++++++---------------------
3 files changed, 30 insertions(+), 34 deletions(-)

--
2.32.0


2022-01-16 06:45:28

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 2/7] nfc: llcp: nullify llcp_sock->dev on connect() error paths

Nullify the llcp_sock->dev on llcp_sock_connect() error paths,
symmetrically to the code llcp_sock_bind(). The non-NULL value of
llcp_sock->dev is used in a few places to check whether the socket is
still valid.

There was no particular issue observed with missing NULL assignment in
connect() error path, however an similar case - in the bind() error path
- was triggereable. That one was fixed in commit 4ac06a1e013c ("nfc:
fix NULL ptr dereference in llcp_sock_getname() after failed connect"),
so the change here seems logical as well.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_sock.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 0b93a17b9f11..e92440c0c4c7 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -764,6 +764,7 @@ static int llcp_sock_connect(struct socket *sock, struct sockaddr *_addr,
llcp_sock->local = NULL;

put_dev:
+ llcp_sock->dev = NULL;
nfc_put_device(dev);

error:
--
2.32.0

2022-01-16 06:45:30

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 1/7] nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()

Syzbot detected a NULL pointer dereference of nfc_llcp_sock->dev pointer
(which is a 'struct nfc_dev *') with calls to llcp_sock_sendmsg() after
a failed llcp_sock_bind(). The message being sent is a SOCK_DGRAM.

KASAN report:

BUG: KASAN: null-ptr-deref in nfc_alloc_send_skb+0x2d/0xc0
Read of size 4 at addr 00000000000005c8 by task llcp_sock_nfc_a/899

CPU: 5 PID: 899 Comm: llcp_sock_nfc_a Not tainted 5.16.0-rc6-next-20211224-00001-gc6437fbf18b0 #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
? nfc_alloc_send_skb+0x2d/0xc0
__kasan_report.cold+0x117/0x11c
? mark_lock+0x480/0x4f0
? nfc_alloc_send_skb+0x2d/0xc0
kasan_report+0x38/0x50
nfc_alloc_send_skb+0x2d/0xc0
nfc_llcp_send_ui_frame+0x18c/0x2a0
? nfc_llcp_send_i_frame+0x230/0x230
? __local_bh_enable_ip+0x86/0xe0
? llcp_sock_connect+0x470/0x470
? llcp_sock_connect+0x470/0x470
sock_sendmsg+0x8e/0xa0
____sys_sendmsg+0x253/0x3f0
...

The issue was visible only with multiple simultaneous calls to bind() and
sendmsg(), which resulted in most of the bind() calls to fail. The
bind() was failing on checking if there is available WKS/SDP/SAP
(respective bit in 'struct nfc_llcp_local' fields). When there was no
available WKS/SDP/SAP, the bind returned error but the sendmsg() to such
socket was able to trigger mentioned NULL pointer dereference of
nfc_llcp_sock->dev.

The code looks simply racy and currently it protects several paths
against race with checks for (!nfc_llcp_sock->local) which is NULL-ified
in error paths of bind(). The llcp_sock_sendmsg() did not have such
check but called function nfc_llcp_send_ui_frame() had, although not
protected with lock_sock().

Therefore the race could look like (same socket is used all the time):
CPU0 CPU1
==== ====
llcp_sock_bind()
- lock_sock()
- success
- release_sock()
- return 0
llcp_sock_sendmsg()
- lock_sock()
- release_sock()
llcp_sock_bind(), same socket
- lock_sock()
- error
- nfc_llcp_send_ui_frame()
- if (!llcp_sock->local)
- llcp_sock->local = NULL
- nfc_put_device(dev)
- dereference llcp_sock->dev
- release_sock()
- return -ERRNO

The nfc_llcp_send_ui_frame() checked llcp_sock->local outside of the
lock, which is racy and ineffective check. Instead, its caller
llcp_sock_sendmsg(), should perform the check inside lock_sock().

Reported-by: [email protected]
Fixes: b874dec21d1c ("NFC: Implement LLCP connection less Tx path")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_sock.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 6cfd30fc0798..0b93a17b9f11 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -789,6 +789,11 @@ static int llcp_sock_sendmsg(struct socket *sock, struct msghdr *msg,

lock_sock(sk);

+ if (!llcp_sock->local) {
+ release_sock(sk);
+ return -ENODEV;
+ }
+
if (sk->sk_type == SOCK_DGRAM) {
DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, addr,
msg->msg_name);
--
2.32.0

2022-01-16 06:45:33

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 6/7] nfc: llcp: protect nfc_llcp_sock_unlink() calls

nfc_llcp_sock_link() is called in all paths (bind/connect) as a last
action, still protected with lock_sock(). When cleaning up in
llcp_sock_release(), call nfc_llcp_sock_unlink() in a mirrored way:
earlier and still under the lock_sock().

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_sock.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index c9d5c427f035..5c5705f5028b 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -631,6 +631,11 @@ static int llcp_sock_release(struct socket *sock)
}
}

+ if (sock->type == SOCK_RAW)
+ nfc_llcp_sock_unlink(&local->raw_sockets, sk);
+ else
+ nfc_llcp_sock_unlink(&local->sockets, sk);
+
if (llcp_sock->reserved_ssap < LLCP_SAP_MAX)
nfc_llcp_put_ssap(llcp_sock->local, llcp_sock->ssap);

@@ -643,11 +648,6 @@ static int llcp_sock_release(struct socket *sock)
if (sk->sk_state == LLCP_DISCONNECTING)
return err;

- if (sock->type == SOCK_RAW)
- nfc_llcp_sock_unlink(&local->raw_sockets, sk);
- else
- nfc_llcp_sock_unlink(&local->sockets, sk);
-
out:
sock_orphan(sk);
sock_put(sk);
--
2.32.0

2022-01-16 06:45:34

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 4/7] nfc: llcp: use centralized exiting of bind on errors

Coding style encourages centralized exiting of functions, so rewrite
llcp_sock_bind() error paths to use such pattern. This reduces the
duplicated cleanup code, make success path visually shorter and also
cleans up the errors in proper order (in reversed way from
initialization).

No functional impact expected.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_sock.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index fdf0856182c6..c9d5c427f035 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -108,21 +108,13 @@ static int llcp_sock_bind(struct socket *sock, struct sockaddr *addr, int alen)
llcp_sock->service_name_len,
GFP_KERNEL);
if (!llcp_sock->service_name) {
- nfc_llcp_local_put(llcp_sock->local);
- llcp_sock->local = NULL;
- llcp_sock->dev = NULL;
ret = -ENOMEM;
- goto put_dev;
+ goto sock_llcp_put_local;
}
llcp_sock->ssap = nfc_llcp_get_sdp_ssap(local, llcp_sock);
if (llcp_sock->ssap == LLCP_SAP_MAX) {
- nfc_llcp_local_put(llcp_sock->local);
- llcp_sock->local = NULL;
- kfree(llcp_sock->service_name);
- llcp_sock->service_name = NULL;
- llcp_sock->dev = NULL;
ret = -EADDRINUSE;
- goto put_dev;
+ goto free_service_name;
}

llcp_sock->reserved_ssap = llcp_sock->ssap;
@@ -132,6 +124,19 @@ static int llcp_sock_bind(struct socket *sock, struct sockaddr *addr, int alen)
pr_debug("Socket bound to SAP %d\n", llcp_sock->ssap);

sk->sk_state = LLCP_BOUND;
+ nfc_put_device(dev);
+ release_sock(sk);
+
+ return 0;
+
+free_service_name:
+ kfree(llcp_sock->service_name);
+ llcp_sock->service_name = NULL;
+
+sock_llcp_put_local:
+ nfc_llcp_local_put(llcp_sock->local);
+ llcp_sock->local = NULL;
+ llcp_sock->dev = NULL;

put_dev:
nfc_put_device(dev);
--
2.32.0

2022-01-16 06:45:34

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 3/7] nfc: llcp: simplify llcp_sock_connect() error paths

The llcp_sock_connect() error paths were using a mixed way of central
exit (goto) and cleanup

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_sock.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index e92440c0c4c7..fdf0856182c6 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -712,10 +712,8 @@ static int llcp_sock_connect(struct socket *sock, struct sockaddr *_addr,
llcp_sock->local = nfc_llcp_local_get(local);
llcp_sock->ssap = nfc_llcp_get_local_ssap(local);
if (llcp_sock->ssap == LLCP_SAP_MAX) {
- nfc_llcp_local_put(llcp_sock->local);
- llcp_sock->local = NULL;
ret = -ENOMEM;
- goto put_dev;
+ goto sock_llcp_put_local;
}

llcp_sock->reserved_ssap = llcp_sock->ssap;
@@ -760,11 +758,13 @@ static int llcp_sock_connect(struct socket *sock, struct sockaddr *_addr,

sock_llcp_release:
nfc_llcp_put_ssap(local, llcp_sock->ssap);
+
+sock_llcp_put_local:
nfc_llcp_local_put(llcp_sock->local);
llcp_sock->local = NULL;
+ llcp_sock->dev = NULL;

put_dev:
- llcp_sock->dev = NULL;
nfc_put_device(dev);

error:
--
2.32.0

2022-01-16 06:45:41

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH 1/7] nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()

On 15/01/2022 13:26, Krzysztof Kozlowski wrote:
> Syzbot detected a NULL pointer dereference of nfc_llcp_sock->dev pointer
> (which is a 'struct nfc_dev *') with calls to llcp_sock_sendmsg() after
> a failed llcp_sock_bind(). The message being sent is a SOCK_DGRAM.
>
> KASAN report:
>
> BUG: KASAN: null-ptr-deref in nfc_alloc_send_skb+0x2d/0xc0
> Read of size 4 at addr 00000000000005c8 by task llcp_sock_nfc_a/899
>
> CPU: 5 PID: 899 Comm: llcp_sock_nfc_a Not tainted 5.16.0-rc6-next-20211224-00001-gc6437fbf18b0 #125
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x45/0x59
> ? nfc_alloc_send_skb+0x2d/0xc0
> __kasan_report.cold+0x117/0x11c
> ? mark_lock+0x480/0x4f0
> ? nfc_alloc_send_skb+0x2d/0xc0
> kasan_report+0x38/0x50
> nfc_alloc_send_skb+0x2d/0xc0
> nfc_llcp_send_ui_frame+0x18c/0x2a0
> ? nfc_llcp_send_i_frame+0x230/0x230
> ? __local_bh_enable_ip+0x86/0xe0
> ? llcp_sock_connect+0x470/0x470
> ? llcp_sock_connect+0x470/0x470
> sock_sendmsg+0x8e/0xa0
> ____sys_sendmsg+0x253/0x3f0
> ...
>
> The issue was visible only with multiple simultaneous calls to bind() and
> sendmsg(), which resulted in most of the bind() calls to fail. The
> bind() was failing on checking if there is available WKS/SDP/SAP
> (respective bit in 'struct nfc_llcp_local' fields). When there was no
> available WKS/SDP/SAP, the bind returned error but the sendmsg() to such
> socket was able to trigger mentioned NULL pointer dereference of
> nfc_llcp_sock->dev.
>
> The code looks simply racy and currently it protects several paths
> against race with checks for (!nfc_llcp_sock->local) which is NULL-ified
> in error paths of bind(). The llcp_sock_sendmsg() did not have such
> check but called function nfc_llcp_send_ui_frame() had, although not
> protected with lock_sock().
>
> Therefore the race could look like (same socket is used all the time):
> CPU0 CPU1
> ==== ====
> llcp_sock_bind()
> - lock_sock()
> - success
> - release_sock()
> - return 0
> llcp_sock_sendmsg()
> - lock_sock()
> - release_sock()
> llcp_sock_bind(), same socket
> - lock_sock()
> - error
> - nfc_llcp_send_ui_frame()
> - if (!llcp_sock->local)
> - llcp_sock->local = NULL
> - nfc_put_device(dev)
> - dereference llcp_sock->dev
> - release_sock()
> - return -ERRNO
>
> The nfc_llcp_send_ui_frame() checked llcp_sock->local outside of the
> lock, which is racy and ineffective check. Instead, its caller
> llcp_sock_sendmsg(), should perform the check inside lock_sock().
>
> Reported-by: [email protected]

Syzbot confirmed fix, so this could be replaced with:

Reported-and-tested-by: [email protected]


Best regards,
Krzysztof

2022-01-16 06:46:12

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 5/7] nfc: llcp: use test_bit()

Use test_bit() instead of open-coding it.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index 5ad5157aa9c5..b70d5042bf74 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -383,7 +383,7 @@ u8 nfc_llcp_get_sdp_ssap(struct nfc_llcp_local *local,
pr_debug("WKS %d\n", ssap);

/* This is a WKS, let's check if it's free */
- if (local->local_wks & BIT(ssap)) {
+ if (test_bit(ssap, &local->local_wks)) {
mutex_unlock(&local->sdp_lock);

return LLCP_SAP_MAX;
--
2.32.0

2022-01-16 06:46:13

by Krzysztof Kozlowski

[permalink] [raw]
Subject: [PATCH 7/7] nfc: llcp: Revert "NFC: Keep socket alive until the DISC PDU is actually sent"

This reverts commit 17f7ae16aef1f58bc4af4c7a16b8778a91a30255.

The commit brought a new socket state LLCP_DISCONNECTING, which was
never set, only read, so socket could never set to such state.

Remove the dead code.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
---
net/nfc/llcp.h | 1 -
net/nfc/llcp_core.c | 7 -------
net/nfc/llcp_sock.c | 7 -------
3 files changed, 15 deletions(-)

diff --git a/net/nfc/llcp.h b/net/nfc/llcp.h
index d49d4bf2e37c..c1d9be636933 100644
--- a/net/nfc/llcp.h
+++ b/net/nfc/llcp.h
@@ -6,7 +6,6 @@
enum llcp_state {
LLCP_CONNECTED = 1, /* wait_for_packet() wants that */
LLCP_CONNECTING,
- LLCP_DISCONNECTING,
LLCP_CLOSED,
LLCP_BOUND,
LLCP_LISTEN,
diff --git a/net/nfc/llcp_core.c b/net/nfc/llcp_core.c
index b70d5042bf74..3364caabef8b 100644
--- a/net/nfc/llcp_core.c
+++ b/net/nfc/llcp_core.c
@@ -737,13 +737,6 @@ static void nfc_llcp_tx_work(struct work_struct *work)
print_hex_dump_debug("LLCP Tx: ", DUMP_PREFIX_OFFSET,
16, 1, skb->data, skb->len, true);

- if (ptype == LLCP_PDU_DISC && sk != NULL &&
- sk->sk_state == LLCP_DISCONNECTING) {
- nfc_llcp_sock_unlink(&local->sockets, sk);
- sock_orphan(sk);
- sock_put(sk);
- }
-
if (ptype == LLCP_PDU_I)
copy_skb = skb_copy(skb, GFP_ATOMIC);

diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 5c5705f5028b..4ca35791c93b 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -641,13 +641,6 @@ static int llcp_sock_release(struct socket *sock)

release_sock(sk);

- /* Keep this sock alive and therefore do not remove it from the sockets
- * list until the DISC PDU has been actually sent. Otherwise we would
- * reply with DM PDUs before sending the DISC one.
- */
- if (sk->sk_state == LLCP_DISCONNECTING)
- return err;
-
out:
sock_orphan(sk);
sock_put(sk);
--
2.32.0

2022-01-17 06:56:26

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/7] nfc: llcp: fix and improvements


Please don't mix cleanups and bug fixes.

Thank you.

2022-01-17 08:50:27

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH 1/7] nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()

On 16/01/2022 14:41, Hillf Danton wrote:
> Hey Krzysztof
>
> On Sat, 15 Jan 2022 13:26:44 +0100 Krzysztof Kozlowski wrote:
>> +++ b/net/nfc/llcp_sock.c
>> @@ -789,6 +789,11 @@ static int llcp_sock_sendmsg(struct socket *sock, struct msghdr *msg,
>>
>> lock_sock(sk);
>>
>> + if (!llcp_sock->local) {
>> + release_sock(sk);
>> + return -ENODEV;
>> + }
>> +
>> if (sk->sk_type == SOCK_DGRAM) {
>> DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, addr,
>> msg->msg_name);
>> --
>> 2.32.0
>
> Given the same check for llcp local in nfc_llcp_send_ui_frame(), adding
> another check does not help.

Helps, because other is not protected with lock. The other could be
removed, because it is simply wrong, but I did not check it.

The patch fixes the report and reproducible race, but maybe does not
necessarily fix entirely the race (which maybe this is what you meant by
"does not help"?).


Best regards,
Krzysztof

2022-01-17 09:28:36

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: [PATCH 0/7] nfc: llcp: fix and improvements

On 16/01/2022 13:32, David Miller wrote:
>
> Please don't mix cleanups and bug fixes.

The fix is the first patch, so it is easy to apply. Do you wish me to
resend it?


Best regards,
Krzysztof

2022-01-21 07:24:10

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH 0/7] nfc: llcp: fix and improvements

On Sun, 16 Jan 2022 17:58:28 +0100 Krzysztof Kozlowski wrote:
> On 16/01/2022 13:32, David Miller wrote:
> >
> > Please don't mix cleanups and bug fixes.
>
> The fix is the first patch, so it is easy to apply. Do you wish me to
> resend it?

Yes, please. 99% sure Dave is expecting you to do so.

FWIW the scripts I use for tag normalization and adding Links won't
work when picking one patch out of entire series so repost is best.
Thanks!