There are two scenarios that might cause refcount leak
issues of ax25_dev.
Scenario one:
The refcount of ax25_dev potentially increase more
than once in ax25_addr_ax25dev(), which will cause
memory leak.
In order to fix the above issue, only increase the
refcount of ax25_dev once, when the res is not null.
Scenario two:
The original code sets the refcount of ax25_dev to 1
in the initial stage and then increase the refcount
when the ax25_dev is added to the ax25_dev_list. As a
result, the refcount of ax25_dev is 2. But when the
device is shutting down. The ax25_dev_device_down()
drops the refcount once or twice depending on if we
goto unlock_put or not, which will cause memory leak.
In order to mitigate the above issues, only increase
the refcount of ax25_dev when the ax25_dev is added
to the ax25_dev_list and decrease the refcount of
ax25_dev after it is removed from the ax25_dev_list.
What's more, the ax25_dev should not be deallocated
directly by kfree() in ax25_dev_free(), replace it
with ax25_dev_put() instead.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Reported by: Dan Carpenter <[email protected]>
Signed-off-by: Duoming Zhou <[email protected]>
---
net/ax25/ax25_dev.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index 282ec581c07..0e6dd98d3fa 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -37,8 +37,9 @@ ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
for (ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
res = ax25_dev;
- ax25_dev_hold(ax25_dev);
}
+ if (res)
+ ax25_dev_hold(res);
spin_unlock_bh(&ax25_dev_lock);
return res;
@@ -58,7 +59,6 @@ void ax25_dev_device_up(struct net_device *dev)
return;
}
- refcount_set(&ax25_dev->refcount, 1);
dev->ax25_ptr = ax25_dev;
ax25_dev->dev = dev;
netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL);
@@ -88,7 +88,7 @@ void ax25_dev_device_up(struct net_device *dev)
ax25_dev->next = ax25_dev_list;
ax25_dev_list = ax25_dev;
spin_unlock_bh(&ax25_dev_lock);
- ax25_dev_hold(ax25_dev);
+ refcount_set(&ax25_dev->refcount, 1);
ax25_register_dev_sysctl(ax25_dev);
}
@@ -135,7 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
unlock_put:
spin_unlock_bh(&ax25_dev_lock);
- ax25_dev_put(ax25_dev);
dev->ax25_ptr = NULL;
netdev_put(dev, &ax25_dev->dev_tracker);
ax25_dev_put(ax25_dev);
@@ -208,7 +207,7 @@ void __exit ax25_dev_free(void)
s = ax25_dev;
netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker);
ax25_dev = ax25_dev->next;
- kfree(s);
+ ax25_dev_put(s);
}
ax25_dev_list = NULL;
spin_unlock_bh(&ax25_dev_lock);
--
2.17.1
…
> In order to mitigate the above issues, only increase
> the refcount of ax25_dev when the ax25_dev is added
> to the ax25_dev_list and decrease the refcount of
> ax25_dev after it is removed from the ax25_dev_list.
…
* I suggest to use more than 53 characters in lines of such a change description.
* Can it be nicer to mention also the term “reference counting” for
an improved commit message?
Regards,
Markus
I'm always happy to take credit for stuff but the Reported by should go
to Lars and Miroslav.
Reported-by: Lars Kellogg-Stedman <[email protected]>
Reported-by: Miroslav Skoric <[email protected]>
Lars, could you test this please and let us know if it helps?
regards,
dan carpenter
On Wed, May 01, 2024 at 02:02:18PM +0800, Duoming Zhou wrote:
> There are two scenarios that might cause refcount leak
> issues of ax25_dev.
This patch doesn't address the refcount leaks I reported earlier and
resolved in the patch I posted [1] last week.
Assume we have the following two interfaces configured on a system:
$ cat /etc/ax25/axports
udp0 test0-0 9600 255 2 axudp0
udp1 test0-1 9600 255 2 axudp1
And we have ax25d listening on both interfaces:
[udp0]
default * * * * * * - root /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
[udp1]
default * * * * * * - root /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
Using the 'ax-devs' and 'ax-sockets' gdb commands shown at the end of
this message, we start with:
(gdb) ax-devs
ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
(gdb) ax-sockets
0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
We initiate a connection from ax0 to ax1:
call -r udp0 test0-1
When we first enter ax25_rcv, we have:
(gdb) ax-devs
ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
(gdb) ax-sockets
0xffff888101ac8000 if:ax0 state:1 refcnt:2 dev_tracker:0xffff888100dedb80
0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
After we reach line 413 (in net/ax25/ax25_in.c) and add a new control
block:
ax25_cb_add(ax25)
We have:
(gdb) ax-devs
ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
(gdb) ax-sockets
0xffff88810245ac00 if:ax1 state:3 refcnt:2 dev_tracker:0x0 <fixed_percpu_data>
0xffff88810245ba00 if:ax0 state:1 refcnt:2 dev_tracker:0xffff88810136c800
0xffff888100c79e00 if:ax1 state:0 refcnt:2 dev_tracker:0xffff88810136c6e0
0xffff8881018e9800 if:ax0 state:0 refcnt:2 dev_tracker:0xffff88810170c860
Note that (a) ax25->dev_tracker is NULL, and (b) we have incremeted the
refcount on ax0 (the source interface), but not on ax1 (the destination
interface). When we call ax25_release for this control block, we get to:
netdev_put(ax25_dev->dev, &ax25->dev_tracker);
ax25_dev_put(ax25_dev);
With:
(gdb) ax-devs
ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
After the calls to netdev_put() and ax25_dev_put(), we have:
(gdb) ax-devs
ax1 ax_refcnt:1 dev_refcnt:8 dev_untracked:-1073741824 dev_notrack:1
ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
You can see that (a) ax25_dev->dev->refcnt_tracker->untracked is now
invalid, and ax25_dev->dev->dev_refcnt is in trouble: it decrements by
one for each closed connection, even though it was never incremented
when we accepted the connection. The underflow in
..refcnt_tracker->untracked yields the traceback with:
refcount_t: decrement hit 0; leaking memory.
Additional connections will eventually trigger more problems; we will
ultimately underflow ax25_dev->dev->dev_refcnt, but we may also run into
memory corruption because of the invalid tracker data, resulting in:
BUG: unable to handle page fault for address: 00000010000003b0
The patch I submitted last week resolves all of the above issues and has
no refcount leaks for this particular code path. In order to avoid the
refcount leaks, those _put() calls in ax25_release need to be balanced
by _hold() calls when accepting a new connection (or we need to wrap
them in a conditional so that they're not called when ax25->dev_tracker
is NULL).
GDB commands:
define ax-devs
set $x = ax25_dev_list
while ($x != 0)
printf "%s ax_refcnt:%d dev_refcnt:%d dev_untracked:%d dev_notrack:%d\n", $x->dev->name, \
$x->refcount->refs->counter, \
$x->dev->dev_refcnt->refs->counter, \
$x->dev->refcnt_tracker->untracked->refs->counter, \
$x->dev->refcnt_tracker->no_tracker->refs->counter
set $x = $x->next
end
end
define ax-sockets
set $x = ax25_list->first
while ($x != 0)
set $cb = (ax25_cb *)($x)
printf "%s if:%s state:%d refcnt:%d dev_tracker:%s\n", \
$_as_string($cb), \
$cb->ax25_dev->dev->name, \
$cb->state, \
$cb->refcount->refs->counter, \
$_as_string($cb->dev_tracker)
set $x = $x->next
end
end
[1]: https://marc.info/?l=linux-hams&m=171447153903965&w=2
--
Lars Kellogg-Stedman <[email protected]> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS
On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> I'm always happy to take credit for stuff but the Reported by should go
> to Lars and Miroslav.
>
> Reported-by: Lars Kellogg-Stedman <[email protected]>
> Reported-by: Miroslav Skoric <[email protected]>
This patch is not related with the problem raised by Lars Kellogg-Stedman
and Miroslav Skoric, it only solves the reference counting leak issues of
ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
there is no need to change the "Reported by" label.
Best regards,
Duoming Zhou
On Thu, May 02, 2024 at 12:35:44PM +0800, [email protected] wrote:
> On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> > I'm always happy to take credit for stuff but the Reported by should go
> > to Lars and Miroslav.
> >
> > Reported-by: Lars Kellogg-Stedman <[email protected]>
> > Reported-by: Miroslav Skoric <[email protected]>
>
> This patch is not related with the problem raised by Lars Kellogg-Stedman
> and Miroslav Skoric, it only solves the reference counting leak issues of
> ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
> there is no need to change the "Reported by" label.
>
Ah... I was really hoping it was related to the other bugs.
Okay, what about we separate this into different patches one for each
bug? The changes to ax25_addr_ax25dev() and ax25_dev_free() are
obvious and could go in as-is but as two separate patches.
The changes to ax25_dev_device_up/down() are more subtle.
The ax25_dev_list stuff is frustrating. It would be so much easier if
it were a normal list and you could just do:
/*
* Remove any packet forwarding that points to this device.
*/
list_for_each_entry(s, ax25_dev_list, list) {
if (s->forward == dev)
s->forward = NULL;
}
list_for_each_entry(s, ax25_dev_list, list) {
if (s == ax25_dev) {
list_del(s);
free_net = true;
break;
}
}
spin_unlock_bh(&ax25_dev_lock);
dev->ax25_ptr = NULL;
if (free_net)
netdev_put(dev, &ax25_dev->dev_tracker);
ax25_dev_put(ax25_dev);
}
Why do we call netdev_put() on that one path? Btw, here is an untested
conversion to lists...
regards,
dan carpenter
diff --git a/include/net/ax25.h b/include/net/ax25.h
index 0d939e5aee4e..c2a85fd3f5ea 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -216,7 +216,7 @@ typedef struct {
struct ctl_table;
typedef struct ax25_dev {
- struct ax25_dev *next;
+ struct list_head list;
struct net_device *dev;
netdevice_tracker dev_tracker;
@@ -330,7 +330,6 @@ int ax25_addr_size(const ax25_digi *);
void ax25_digi_invert(const ax25_digi *, ax25_digi *);
/* ax25_dev.c */
-extern ax25_dev *ax25_dev_list;
extern spinlock_t ax25_dev_lock;
#if IS_ENABLED(CONFIG_AX25)
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index 282ec581c072..b632af38f1e1 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -22,11 +22,12 @@
#include <net/sock.h>
#include <linux/uaccess.h>
#include <linux/fcntl.h>
+#include <linux/list.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/init.h>
-ax25_dev *ax25_dev_list;
+static struct list_head ax25_dev_list;
DEFINE_SPINLOCK(ax25_dev_lock);
ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
@@ -34,11 +35,12 @@ ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
ax25_dev *ax25_dev, *res = NULL;
spin_lock_bh(&ax25_dev_lock);
- for (ax25_dev = ax25_dev_list; ax25_dev != NULL; ax25_dev = ax25_dev->next)
+ list_for_each_entry(ax25_dev, &ax25_dev_list, list) {
if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
res = ax25_dev;
ax25_dev_hold(ax25_dev);
}
+ }
spin_unlock_bh(&ax25_dev_lock);
return res;
@@ -52,6 +54,10 @@ void ax25_dev_device_up(struct net_device *dev)
{
ax25_dev *ax25_dev;
+ // FIXME: do call this in probe or something
+ if (!ax25_dev_list.next)
+ INIT_LIST_HEAD(&ax25_dev_list);
+
ax25_dev = kzalloc(sizeof(*ax25_dev), GFP_KERNEL);
if (!ax25_dev) {
printk(KERN_ERR "AX.25: ax25_dev_device_up - out of memory\n");
@@ -85,8 +91,7 @@ void ax25_dev_device_up(struct net_device *dev)
#endif
spin_lock_bh(&ax25_dev_lock);
- ax25_dev->next = ax25_dev_list;
- ax25_dev_list = ax25_dev;
+ list_add(&ax25_dev->list, &ax25_dev_list);
spin_unlock_bh(&ax25_dev_lock);
ax25_dev_hold(ax25_dev);
@@ -111,23 +116,18 @@ void ax25_dev_device_down(struct net_device *dev)
/*
* Remove any packet forwarding that points to this device.
*/
- for (s = ax25_dev_list; s != NULL; s = s->next)
+ list_for_each_entry(s, &ax25_dev_list, list) {
if (s->forward == dev)
s->forward = NULL;
-
- if ((s = ax25_dev_list) == ax25_dev) {
- ax25_dev_list = s->next;
- goto unlock_put;
}
- while (s != NULL && s->next != NULL) {
- if (s->next == ax25_dev) {
- s->next = ax25_dev->next;
+ list_for_each_entry(s, &ax25_dev_list, list) {
+ if (s == ax25_dev) {
+ list_del(&s->list);
goto unlock_put;
}
-
- s = s->next;
}
+
spin_unlock_bh(&ax25_dev_lock);
dev->ax25_ptr = NULL;
ax25_dev_put(ax25_dev);
@@ -200,16 +200,13 @@ struct net_device *ax25_fwd_dev(struct net_device *dev)
*/
void __exit ax25_dev_free(void)
{
- ax25_dev *s, *ax25_dev;
+ ax25_dev *s, *n;
spin_lock_bh(&ax25_dev_lock);
- ax25_dev = ax25_dev_list;
- while (ax25_dev != NULL) {
- s = ax25_dev;
- netdev_put(ax25_dev->dev, &ax25_dev->dev_tracker);
- ax25_dev = ax25_dev->next;
+ list_for_each_entry_safe(s, n, &ax25_dev_list, list) {
+ netdev_put(s->dev, &s->dev_tracker);
+ list_del(&s->list);
kfree(s);
}
- ax25_dev_list = NULL;
spin_unlock_bh(&ax25_dev_lock);
}
On Thu, 2024-05-02 at 10:56 +0300, Dan Carpenter wrote:
> On Thu, May 02, 2024 at 12:35:44PM +0800, [email protected] wrote:
> > On Wed, 1 May 2024 20:43:37 +0300 Dan Carpenter wrote:
> > > I'm always happy to take credit for stuff but the Reported by should go
> > > to Lars and Miroslav.
> > >
> > > Reported-by: Lars Kellogg-Stedman <[email protected]>
> > > Reported-by: Miroslav Skoric <[email protected]>
> >
> > This patch is not related with the problem raised by Lars Kellogg-Stedman
> > and Miroslav Skoric, it only solves the reference counting leak issues of
> > ax25_dev in ax25_addr_ax25dev() and ax25_dev_device_down(). So I think
> > there is no need to change the "Reported by" label.
> >
>
> Ah... I was really hoping it was related to the other bugs.
>
> Okay, what about we separate this into different patches one for each
> bug? The changes to ax25_addr_ax25dev() and ax25_dev_free() are
> obvious and could go in as-is but as two separate patches.
I agree it would be better to split this up, the changelog itself hints
at 2 separated issues and fixes.
Thanks,
Paolo
Could you test this diff?
regards,
dan carpenter
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 558e158c98d0..a7f96a4ceff4 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1129,8 +1129,10 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
/*
* User already set interface with SO_BINDTODEVICE
*/
- if (ax25->ax25_dev != NULL)
+ if (ax25->ax25_dev != NULL) {
+ ax25_dev_hold(ax25->ax25_dev);
goto done;
+ }
if (addr_len > sizeof(struct sockaddr_ax25) && addr->fsa_ax25.sax25_ndigis == 1) {
if (ax25cmp(&addr->fsa_digipeater[0], &null_ax25_address) != 0 &&
On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> Could you test this diff?
With that diff applied, there is no kernel panic, but I see the same
refcount errors that I saw before the latest series of patches from
Duoming:
refcount_t: decrement hit 0; leaking memory.
refcount_t: underflow; use-after-free.
--
Lars Kellogg-Stedman <[email protected]> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS
On Wed, May 01, 2024 at 02:02:18PM +0800, Duoming Zhou wrote:
> @@ -58,7 +59,6 @@ void ax25_dev_device_up(struct net_device *dev)
> return;
> }
>
> - refcount_set(&ax25_dev->refcount, 1);
Let's keep this here, and just delete the ax25_dev_hold(). It makes
the diff smaller and I like setting the refcount earlier anyway.
> dev->ax25_ptr = ax25_dev;
Let's move this assignment under the spinlock where ax25_dev_hold() was.
> ax25_dev->dev = dev;
> netdev_hold(dev, &ax25_dev->dev_tracker, GFP_KERNEL);
> @@ -88,7 +88,7 @@ void ax25_dev_device_up(struct net_device *dev)
> ax25_dev->next = ax25_dev_list;
> ax25_dev_list = ax25_dev;
> spin_unlock_bh(&ax25_dev_lock);
> - ax25_dev_hold(ax25_dev);
> + refcount_set(&ax25_dev->refcount, 1);
>
> ax25_register_dev_sysctl(ax25_dev);
> }
> @@ -135,7 +135,6 @@ void ax25_dev_device_down(struct net_device *dev)
>
> unlock_put:
> spin_unlock_bh(&ax25_dev_lock);
> - ax25_dev_put(ax25_dev);
> dev->ax25_ptr = NULL;
> netdev_put(dev, &ax25_dev->dev_tracker);
> ax25_dev_put(ax25_dev);
So far as I can see, the ax25_dev should be on the list. Also, I think
the dev->ax25_ptr = NULL; assignment should be under the lock. So this
code should just look like:
list_for_each_entry(s, &ax25_dev_list, list) {
if (s->forward == dev)
s->forward = NULL;
}
list_for_each_entry(s, &ax25_dev_list, list) {
if (s == ax25_dev) {
list_del(&s->list);
break;
}
}
dev->ax25_ptr = NULL;
spin_unlock_bh(&ax25_dev_lock);
netdev_put(dev, &ax25_dev->dev_tracker);
ax25_dev_put(ax25_dev);
}
Also it should just be on the list once... In fact, it's impossible for
one pointer to be on a list twice. So it would be nice to add a break;
in ax25_addr_ax25dev(). It doesn't change the code, it just makes it
more obvious.
ax25_dev *ax25_addr_ax25dev(ax25_address *addr)
{
ax25_dev *ax25_dev, *res = NULL;
spin_lock_bh(&ax25_dev_lock);
list_for_each_entry(ax25_dev, &ax25_dev_list, list) {
if (ax25cmp(addr, (const ax25_address *)ax25_dev->dev->dev_addr) == 0) {
res = ax25_dev;
ax25_dev_hold(res);
break;
}
}
spin_unlock_bh(&ax25_dev_lock);
return res;
}
regards,
dan carpenter
On Fri, May 03, 2024 at 07:40:32PM -0400, Lars Kellogg-Stedman wrote:
> On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> > Could you test this diff?
>
> With that diff applied, there is no kernel panic, but I see the same
> refcount errors that I saw before the latest series of patches from
> Duoming:
Wait, which panic is this? The NULL dereference introduce by the
"ax25_dev" vs "res" typo?
>
> refcount_t: decrement hit 0; leaking memory.
> refcount_t: underflow; use-after-free.
Hm... Is there a missing netdev_hold() in ax25_bind() on the
"User already set interface with SO_BINDTODEVICE" path? That would
fit with the commit 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by
ax25_cb_del()") which introduced the bug.
I'm not really sure I understand how netdev_hold() works.
(My patch here is correct, but apparently that's not the bug).
regards,
dan carpenter
On Sat, May 04, 2024 at 03:16:55PM +0300, Dan Carpenter wrote:
> Wait, which panic is this? The NULL dereference introduce by the
> "ax25_dev" vs "res" typo?
Right, that one. Your diff was on top of Duoming's patches, which had
earlier introduced that kernel panic. Just confirming that things are
working (for some value of "working") with your latest change.
> > refcount_t: decrement hit 0; leaking memory.
> > refcount_t: underflow; use-after-free.
>
> Hm... Is there a missing netdev_hold() in ax25_bind() on the
> "User already set interface with SO_BINDTODEVICE" path?
There's a missing netdev_hold() in the path for *inbound* packets
(ax25_rcv -> ax25_accept).
I added some tracepoints [1] so that we can see calls to
netdev_{put,hold} in a function graph. Without my patches, the graph for
an incoming connections looks like [2]:
# tracer: function_graph
#
# CPU TASK/PID DURATION FUNCTION CALLS
# | | | | | | | | |
------------------------------------------
0) <idle>-0 => kworker-29
------------------------------------------
0) kworker-29 | | ax25_kiss_rcv() {
0) kworker-29 | | ax25_rcv.isra.0() {
0) kworker-29 | 0.168 us | ax25_addr_parse();
0) kworker-29 | 0.094 us | ax25_addr_size();
0) kworker-29 | 0.136 us | ax25cmp();
0) kworker-29 | 0.137 us | ax25_digi_invert();
0) kworker-29 | | ax25_find_cb() {
0) kworker-29 | 0.223 us | ax25cmp();
0) kworker-29 | 0.194 us | ax25cmp();
0) kworker-29 | 0.087 us | ax25cmp();
0) kworker-29 | 0.975 us | }
0) kworker-29 | | ax25_find_listener() {
0) kworker-29 | 0.092 us | ax25cmp();
0) kworker-29 | 0.084 us | ax25cmp();
0) kworker-29 | 0.526 us | }
0) kworker-29 | | ax25_make_new() {
0) kworker-29 | | ax25_create_cb() {
0) kworker-29 | 0.117 us | ax25_setup_timers();
0) kworker-29 | 0.478 us | }
0) kworker-29 | 1.813 us | }
0) kworker-29 | | ax25_send_control() {
0) kworker-29 | | ax25_transmit_buffer() {
0) kworker-29 | 0.086 us | ax25_addr_size();
0) kworker-29 | 0.094 us | ax25_addr_build();
0) kworker-29 | 0.081 us | ax25_fwd_dev();
0) kworker-29 | 4.854 us | }
0) kworker-29 | 5.604 us | }
0) kworker-29 | 0.108 us | ax25_cb_add();
0) kworker-29 | 0.410 us | ax25_start_heartbeat();
0) kworker-29 | 0.218 us | ax25_start_t3timer();
0) kworker-29 | 0.099 us | ax25_start_idletimer();
0) kworker-29 | + 14.957 us | }
0) kworker-29 | + 16.116 us | }
------------------------------------------
1) ax25ipd-63 => ax25d-77
------------------------------------------
1) ax25d-77 | 1.580 us | ax25_accept();
1) ax25d-77 | 0.211 us | ax25_getname();
1) ax25d-77 | 0.275 us | ax25_getname();
------------------------------------------
0) kworker-29 => axwrapp-83
------------------------------------------
0) axwrapp-83 | | ax25_sendmsg() {
0) axwrapp-83 | | ax25_output() {
0) axwrapp-83 | | ax25_kick.part.0() {
0) axwrapp-83 | | ax25_send_iframe() {
0) axwrapp-83 | 0.679 us | ax25_start_idletimer();
0) axwrapp-83 | | ax25_transmit_buffer() {
0) axwrapp-83 | 0.093 us | ax25_addr_size();
0) axwrapp-83 | 0.102 us | ax25_addr_build();
0) axwrapp-83 | 0.145 us | ax25_fwd_dev();
0) axwrapp-83 | 9.311 us | }
0) axwrapp-83 | + 10.464 us | }
0) axwrapp-83 | 0.096 us | ax25_t1timer_running();
0) axwrapp-83 | 0.332 us | ax25_stop_t3timer();
0) axwrapp-83 | 0.093 us | ax25_calculate_t1();
0) axwrapp-83 | 0.439 us | ax25_start_t1timer();
0) axwrapp-83 | + 18.839 us | }
0) axwrapp-83 | + 19.479 us | }
0) axwrapp-83 | + 22.647 us | }
------------------------------------------
0) ax25ipd-63 => axwrapp-83
------------------------------------------
0) axwrapp-83 | | ax25_sendmsg() {
0) axwrapp-83 | | ax25_output() {
0) axwrapp-83 | | ax25_kick.part.0() {
0) axwrapp-83 | | ax25_send_iframe() {
0) axwrapp-83 | 0.339 us | ax25_start_idletimer();
0) axwrapp-83 | | ax25_transmit_buffer() {
0) axwrapp-83 | 0.092 us | ax25_addr_size();
0) axwrapp-83 | 0.097 us | ax25_addr_build();
0) axwrapp-83 | 0.136 us | ax25_fwd_dev();
0) axwrapp-83 | 9.116 us | }
0) axwrapp-83 | 9.908 us | }
0) axwrapp-83 | 0.091 us | ax25_t1timer_running();
0) axwrapp-83 | + 10.810 us | }
0) axwrapp-83 | + 11.168 us | }
0) axwrapp-83 | + 14.362 us | }
------------------------------------------
0) axwrapp-83 => kworker-10
------------------------------------------
0) kworker-10 | | ax25_kiss_rcv() {
0) kworker-10 | | ax25_rcv.isra.0() {
0) kworker-10 | 0.116 us | ax25_addr_parse();
0) kworker-10 | 0.090 us | ax25_addr_size();
0) kworker-10 | 0.144 us | ax25cmp();
0) kworker-10 | 0.091 us | ax25_digi_invert();
0) kworker-10 | | ax25_find_cb() {
0) kworker-10 | 0.091 us | ax25cmp();
0) kworker-10 | 0.087 us | ax25cmp();
0) kworker-10 | 0.550 us | }
0) kworker-10 | | ax25_std_frame_in() {
0) kworker-10 | 0.105 us | ax25_decode();
0) kworker-10 | 0.117 us | ax25_validate_nr();
0) kworker-10 | | ax25_check_iframes_acked() {
0) kworker-10 | 0.670 us | ax25_frames_acked();
0) kworker-10 | | ax25_calculate_rtt() {
0) kworker-10 | 0.086 us | ax25_t1timer_running();
0) kworker-10 | 0.085 us | ax25_display_timer();
0) kworker-10 | 0.465 us | }
0) kworker-10 | 0.185 us | ax25_stop_t1timer();
0) kworker-10 | 0.358 us | ax25_start_t3timer();
0) kworker-10 | 2.186 us | }
0) kworker-10 | | ax25_kick() {
0) kworker-10 | 0.093 us | ax25_kick.part.0();
0) kworker-10 | 0.277 us | }
0) kworker-10 | 3.320 us | }
0) kworker-10 | 5.927 us | }
0) kworker-10 | 6.563 us | }
------------------------------------------
0) kworker-10 => axwrapp-83
------------------------------------------
0) axwrapp-83 | | ax25_release() {
0) axwrapp-83 | 0.158 us | ax25_clear_queues();
0) axwrapp-83 | | ax25_send_control() {
0) axwrapp-83 | | ax25_transmit_buffer() {
0) axwrapp-83 | 0.086 us | ax25_addr_size();
0) axwrapp-83 | 0.086 us | ax25_addr_build();
0) axwrapp-83 | 0.085 us | ax25_fwd_dev();
0) axwrapp-83 | 3.972 us | }
0) axwrapp-83 | 4.356 us | }
0) axwrapp-83 | 0.142 us | ax25_stop_t2timer();
0) axwrapp-83 | 0.145 us | ax25_stop_t3timer();
0) axwrapp-83 | 0.094 us | ax25_stop_idletimer();
0) axwrapp-83 | 0.093 us | ax25_calculate_t1();
0) axwrapp-83 | 0.227 us | ax25_start_t1timer();
0) axwrapp-83 | | /* netdev_put: dev=ax0 */
------------------------------------------
1) ax25d-77 => kworker-10
------------------------------------------
1) kworker-10 | | ax25_kiss_rcv() {
1) kworker-10 | | ax25_rcv.isra.0() {
1) kworker-10 | 0.188 us | ax25_addr_parse();
1) kworker-10 | 0.089 us | ax25_addr_size();
1) kworker-10 | 0.177 us | ax25cmp();
1) kworker-10 | 0.101 us | ax25_digi_invert();
1) kworker-10 | | ax25_find_cb() {
1) kworker-10 | 0.092 us | ax25cmp();
1) kworker-10 | 0.085 us | ax25cmp();
1) kworker-10 | 0.731 us | }
1) kworker-10 | | ax25_std_frame_in() {
1) kworker-10 | 0.098 us | ax25_decode();
1) kworker-10 | | ax25_disconnect() {
1) kworker-10 | 0.311 us | ax25_stop_t1timer();
1) kworker-10 | 0.093 us | ax25_stop_t2timer();
1) kworker-10 | 0.093 us | ax25_stop_t3timer();
1) kworker-10 | 0.086 us | ax25_stop_idletimer();
1) kworker-10 | 0.415 us | ax25_link_failed();
1) kworker-10 | 1.778 us | }
1) kworker-10 | 0.090 us | ax25_kick();
1) kworker-10 | 2.523 us | }
1) kworker-10 | + 10.078 us | }
1) kworker-10 | + 12.054 us | }
0) axwrapp-83 | * 13227.13 us | }
------------------------------------------
0) axwrapp-83 => <idle>-0
------------------------------------------
0) <idle>-0 | | ax25_heartbeat_expiry() {
0) <idle>-0 | | ax25_std_heartbeat_expiry() {
0) <idle>-0 | | ax25_destroy_socket() {
0) <idle>-0 | 0.249 us | ax25_cb_del();
0) <idle>-0 | 0.107 us | ax25_stop_heartbeat();
0) <idle>-0 | 0.140 us | ax25_stop_t1timer();
0) <idle>-0 | 0.090 us | ax25_stop_t2timer();
0) <idle>-0 | 0.146 us | ax25_stop_t3timer();
0) <idle>-0 | 0.089 us | ax25_stop_idletimer();
0) <idle>-0 | 0.740 us | ax25_clear_queues();
0) <idle>-0 | 2.703 us | }
0) <idle>-0 | 0.901 us | ax25_free_sock();
0) <idle>-0 | 5.219 us | }
0) <idle>-0 | 7.220 us | }
Note the call to netdev_put() in ax25_release(), and note that there is
never a corresponding call to netdev_hold(), which is why we end up with
a refcount problem.
My original patch corrected this by adding the call to netdev_hold()
right next to the ax25_cb_add() in ax25_rcv(), which solves this
problem. If it seems weird to have this login in ax25_rcv, we could move
it to ax25_accept, right around line 1430 [3]; that would look
something like:
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 8df2b00526e..e1ce1eeed7d 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1383,6 +1383,8 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags,
DEFINE_WAIT(wait);
struct sock *sk;
int err = 0;
+ ax25_cb *ax25;
+ ax25_dev *ax25_dev;
if (sock->state != SS_UNCONNECTED)
return -EINVAL;
@@ -1436,6 +1438,10 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags,
kfree_skb(skb);
sk_acceptq_removed(sk);
newsock->state = SS_CONNECTED;
+ ax25 = sk_to_ax25(newsk);
+ ax25_dev = ax25->ax25_dev;
+ netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC);
+ ax25_dev_hold(ax25_dev);
out:
release_sock(sk);
This has the advantage that now ax25_accept() mirrors the behavior of
ax25_bind() in terms of declaring a reference on the ax25 device, which
makes a certain amount of sense.
This seems to work out just as well as the previous patch; both
eliminiate the refcount imbalance. With the above diff in place, the
function graph looks like [4], with the call to netdev_hold() in
ax25_accept:
# tracer: function_graph
#
# CPU TASK/PID DURATION FUNCTION CALLS
# | | | | | | | | |
------------------------------------------
1) <idle>-0 => kworker-29
------------------------------------------
1) kworker-29 | | ax25_kiss_rcv() {
1) kworker-29 | | ax25_rcv.isra.0() {
1) kworker-29 | 0.152 us | ax25_addr_parse();
1) kworker-29 | 0.111 us | ax25_addr_size();
1) kworker-29 | 0.149 us | ax25cmp();
1) kworker-29 | 0.139 us | ax25_digi_invert();
1) kworker-29 | | ax25_find_cb() {
1) kworker-29 | 0.229 us | ax25cmp();
1) kworker-29 | 0.243 us | ax25cmp();
1) kworker-29 | 0.088 us | ax25cmp();
1) kworker-29 | 1.076 us | }
1) kworker-29 | | ax25_find_listener() {
1) kworker-29 | 0.092 us | ax25cmp();
1) kworker-29 | 0.085 us | ax25cmp();
1) kworker-29 | 0.632 us | }
1) kworker-29 | | ax25_make_new() {
1) kworker-29 | | ax25_create_cb() {
1) kworker-29 | 0.132 us | ax25_setup_timers();
1) kworker-29 | 0.547 us | }
1) kworker-29 | 1.965 us | }
1) kworker-29 | | ax25_send_control() {
1) kworker-29 | | ax25_transmit_buffer() {
1) kworker-29 | 0.090 us | ax25_addr_size();
1) kworker-29 | 0.089 us | ax25_addr_build();
1) kworker-29 | 0.084 us | ax25_fwd_dev();
1) kworker-29 | 5.682 us | }
1) kworker-29 | 6.037 us | }
1) kworker-29 | 0.100 us | ax25_cb_add();
1) kworker-29 | 0.449 us | ax25_start_heartbeat();
1) kworker-29 | 0.200 us | ax25_start_t3timer();
1) kworker-29 | 0.104 us | ax25_start_idletimer();
1) kworker-29 | + 16.757 us | }
1) kworker-29 | + 18.240 us | }
------------------------------------------
0) ax25ipd-63 => ax25d-77
------------------------------------------
0) ax25d-77 | | ax25_accept() {
0) ax25d-77 | | /* netdev_hold: dev=ax0 */
0) ax25d-77 | 2.067 us | }
0) ax25d-77 | 0.140 us | ax25_getname();
0) ax25d-77 | 0.189 us | ax25_getname();
------------------------------------------
1) kworker-29 => axwrapp-82
------------------------------------------
1) axwrapp-82 | | ax25_sendmsg() {
1) axwrapp-82 | | ax25_output() {
1) axwrapp-82 | | ax25_kick.part.0() {
1) axwrapp-82 | | ax25_send_iframe() {
1) axwrapp-82 | 0.343 us | ax25_start_idletimer();
1) axwrapp-82 | | ax25_transmit_buffer() {
1) axwrapp-82 | 0.119 us | ax25_addr_size();
1) axwrapp-82 | 0.118 us | ax25_addr_build();
1) axwrapp-82 | 0.156 us | ax25_fwd_dev();
1) axwrapp-82 | 9.965 us | }
1) axwrapp-82 | + 10.844 us | }
1) axwrapp-82 | 0.107 us | ax25_t1timer_running();
1) axwrapp-82 | 0.345 us | ax25_stop_t3timer();
1) axwrapp-82 | 0.112 us | ax25_calculate_t1();
1) axwrapp-82 | 0.499 us | ax25_start_t1timer();
1) axwrapp-82 | + 21.187 us | }
1) axwrapp-82 | + 21.620 us | }
1) axwrapp-82 | + 25.665 us | }
------------------------------------------
1) ax25ipd-63 => axwrapp-82
------------------------------------------
1) axwrapp-82 | | ax25_sendmsg() {
1) axwrapp-82 | | ax25_output() {
1) axwrapp-82 | | ax25_kick.part.0() {
1) axwrapp-82 | | ax25_send_iframe() {
1) axwrapp-82 | 1.027 us | ax25_start_idletimer();
1) axwrapp-82 | | ax25_transmit_buffer() {
1) axwrapp-82 | 0.505 us | ax25_addr_size();
1) axwrapp-82 | 0.522 us | ax25_addr_build();
1) axwrapp-82 | 0.550 us | ax25_fwd_dev();
1) axwrapp-82 | + 32.110 us | }
1) axwrapp-82 | + 35.118 us | }
1) axwrapp-82 | 0.526 us | ax25_t1timer_running();
1) axwrapp-82 | + 38.770 us | }
1) axwrapp-82 | + 40.359 us | }
1) axwrapp-82 | + 50.019 us | }
------------------------------------------
0) ax25d-77 => kworker-10
------------------------------------------
0) kworker-10 | | ax25_kiss_rcv() {
0) kworker-10 | | ax25_rcv.isra.0() {
0) kworker-10 | 0.632 us | ax25_addr_parse();
0) kworker-10 | 0.467 us | ax25_addr_size();
0) kworker-10 | 0.600 us | ax25cmp();
0) kworker-10 | 0.506 us | ax25_digi_invert();
0) kworker-10 | | ax25_find_cb() {
0) kworker-10 | 0.572 us | ax25cmp();
0) kworker-10 | 0.484 us | ax25cmp();
0) kworker-10 | 2.737 us | }
0) kworker-10 | | ax25_std_frame_in() {
0) kworker-10 | 0.569 us | ax25_decode();
0) kworker-10 | 0.607 us | ax25_validate_nr();
0) kworker-10 | | ax25_check_iframes_acked() {
0) kworker-10 | 4.171 us | ax25_frames_acked();
0) kworker-10 | | ax25_calculate_rtt() {
0) kworker-10 | 0.442 us | ax25_t1timer_running();
0) kworker-10 | 0.500 us | ax25_display_timer();
0) kworker-10 | 2.463 us | }
0) kworker-10 | 0.889 us | ax25_stop_t1timer();
0) kworker-10 | 1.559 us | ax25_start_t3timer();
0) kworker-10 | + 11.534 us | }
0) kworker-10 | | ax25_kick() {
0) kworker-10 | 0.480 us | ax25_kick.part.0();
0) kworker-10 | 1.467 us | }
0) kworker-10 | + 16.917 us | }
0) kworker-10 | + 27.062 us | }
0) kworker-10 | + 30.583 us | }
1) axwrapp-82 | | ax25_release() {
1) axwrapp-82 | 0.932 us | ax25_clear_queues();
1) axwrapp-82 | | ax25_send_control() {
1) axwrapp-82 | | ax25_transmit_buffer() {
1) axwrapp-82 | 0.516 us | ax25_addr_size();
1) axwrapp-82 | 0.484 us | ax25_addr_build();
1) axwrapp-82 | 0.451 us | ax25_fwd_dev();
1) axwrapp-82 | + 91.128 us | }
1) axwrapp-82 | + 94.596 us | }
1) axwrapp-82 | 1.554 us | ax25_stop_t2timer();
1) axwrapp-82 | 1.151 us | ax25_stop_t3timer();
1) axwrapp-82 | 0.758 us | ax25_stop_idletimer();
1) axwrapp-82 | 0.676 us | ax25_calculate_t1();
1) axwrapp-82 | 1.642 us | ax25_start_t1timer();
1) axwrapp-82 | | /* netdev_put: dev=ax0 */
1) axwrapp-82 | ! 123.696 us | }
------------------------------------------
1) axwrapp-82 => kworker-10
------------------------------------------
1) kworker-10 | | ax25_kiss_rcv() {
1) kworker-10 | | ax25_rcv.isra.0() {
1) kworker-10 | 0.568 us | ax25_addr_parse();
1) kworker-10 | 0.428 us | ax25_addr_size();
1) kworker-10 | 0.504 us | ax25cmp();
1) kworker-10 | 0.453 us | ax25_digi_invert();
1) kworker-10 | | ax25_find_cb() {
1) kworker-10 | 0.425 us | ax25cmp();
1) kworker-10 | 0.429 us | ax25cmp();
1) kworker-10 | 2.511 us | }
1) kworker-10 | | ax25_std_frame_in() {
1) kworker-10 | 0.518 us | ax25_decode();
1) kworker-10 | | ax25_disconnect() {
1) kworker-10 | 0.840 us | ax25_stop_t1timer();
1) kworker-10 | 0.459 us | ax25_stop_t2timer();
1) kworker-10 | 0.444 us | ax25_stop_t3timer();
1) kworker-10 | 0.413 us | ax25_stop_idletimer();
1) kworker-10 | 1.123 us | ax25_link_failed();
1) kworker-10 | 6.202 us | }
1) kworker-10 | 0.428 us | ax25_kick();
1) kworker-10 | 9.214 us | }
1) kworker-10 | + 18.606 us | }
1) kworker-10 | + 21.675 us | }
------------------------------------------
0) kworker-10 => <idle>-0
------------------------------------------
0) <idle>-0 | | ax25_heartbeat_expiry() {
0) <idle>-0 | | ax25_std_heartbeat_expiry() {
0) <idle>-0 | | ax25_destroy_socket() {
0) <idle>-0 | 0.202 us | ax25_cb_del();
0) <idle>-0 | 0.116 us | ax25_stop_heartbeat();
0) <idle>-0 | 0.100 us | ax25_stop_t1timer();
0) <idle>-0 | 0.101 us | ax25_stop_t2timer();
0) <idle>-0 | 0.158 us | ax25_stop_t3timer();
0) <idle>-0 | 0.101 us | ax25_stop_idletimer();
0) <idle>-0 | 0.321 us | ax25_clear_queues();
0) <idle>-0 | 2.483 us | }
0) <idle>-0 | 0.531 us | ax25_free_sock();
0) <idle>-0 | 4.912 us | }
0) <idle>-0 | 6.545 us | }
[1]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-0001-trace-netdev_hold-and-netdev_put-patch
[2]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-trace1-inbound-txt
[3]: https://github.com/torvalds/linux/blob/7367539ad4b0f8f9b396baf02110962333719a48/net/ax25/af_ax25.c#L1430
[4]: https://gist.github.com/larsks/b658c0bd766648b16c31c8ed0fc1dc1f#file-trace2-inbound-txt
--
Lars Kellogg-Stedman <[email protected]> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS
On Sat, May 04, 2024 at 06:16:14PM GMT, Lars Kellogg-Stedman wrote:
> My original patch corrected this by adding the call to netdev_hold()
> right next to the ax25_cb_add() in ax25_rcv(), which solves this
> problem. If it seems weird to have this login in ax25_rcv, we could move
> it to ax25_accept, right around line 1430 [3]; that would look
> something like:
The same patch applies cleanly against the Raspberry Pi 6.6.30 kernel,
and clears up the frequeny crashes I was experiencing in that
environment as well.
--
Lars Kellogg-Stedman <[email protected]> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS
On Fri, May 03, 2024 at 11:36:37PM +0300, Dan Carpenter wrote:
> Could you test this diff?
>
> regards,
> dan carpenter
>
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 558e158c98d0..a7f96a4ceff4 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -1129,8 +1129,10 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> /*
> * User already set interface with SO_BINDTODEVICE
> */
> - if (ax25->ax25_dev != NULL)
> + if (ax25->ax25_dev != NULL) {
> + ax25_dev_hold(ax25->ax25_dev);
> goto done;
> + }
>
> if (addr_len > sizeof(struct sockaddr_ax25) && addr->fsa_ax25.sax25_ndigis == 1) {
> if (ax25cmp(&addr->fsa_digipeater[0], &null_ax25_address) != 0 &&
This commit is wrong.
regards,
dan carpenter
On Mon, May 06, 2024 at 11:18:06PM -0400, Lars Kellogg-Stedman wrote:
> On Sat, May 04, 2024 at 06:16:14PM GMT, Lars Kellogg-Stedman wrote:
> > My original patch corrected this by adding the call to netdev_hold()
> > right next to the ax25_cb_add() in ax25_rcv(), which solves this
> > problem. If it seems weird to have this login in ax25_rcv, we could move
> > it to ax25_accept, right around line 1430 [3]; that would look
> > something like:
>
> The same patch applies cleanly against the Raspberry Pi 6.6.30 kernel,
> and clears up the frequeny crashes I was experiencing in that
> environment as well.
I have reviewed this code some more. My theory is:
ax25_dev_device_up() <- sets refcount to 1
ax25_dev_device_down() <- set refcount to 0 and frees it
If the refcount is not 1 at ax25_dev_device_down() then something is
screwed up. So why do we even need more refcounting than that? But
apparently we do. I don't really understand networking that well so
maybe we can have lingering connections after the device is down.
So the next rule is if we set ax25->ax25_dev from NULL to non-NULL then
bump the refcount and decrement it if we set it back to NULL or we free
ax25. Right now that's happening in ax25_bind() and ax25_release(). And
also in ax25_kill_by_device() but not consistently.
But it needs to happen *everywhere* we set ax25->ax25_dev and we need to
decrement it when ax25 is freed in ax25_cb_put(). My patch is similar
to yours in that every ax25_rcv() will now bump the reference through
calling ax25_fillin_cb() or ax25_make_new(). The send path also bumps
the reference.
There are a few questions I don't know how to answer. I added two
-EBUSY paths to this patch. I'm not sure if this is correct.
Second, I don't understand the netdev_put(ax25_dev->dev, &s->dev_tracker);
stuff. Maybe that should be done in ax25_dev_hold/put().
This patch might not work because of the netdev_hold/put() thing...
I used the Smatch cross function database to show where ax25->ax25_dev
is set to NULL/non-NULL.
$ smdb.py where ax25_cb ax25_dev | grep -v "min-max"
net/ax25/ax25_route.c | ax25_rt_autobind | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c | ax25_kill_by_device | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c | ax25_fillin_cb | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c | ax25_setsockopt | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c | ax25_make_new | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/af_ax25.c | ax25_bind | (struct ax25_cb)->ax25_dev | 4096-ptr_max
net/ax25/ax25_in.c | ax25_rcv | (struct ax25_cb)->ax25_dev | 0-u64max
net/ax25/ax25_out.c | ax25_send_frame | (struct ax25_cb)->ax25_dev | 0-u64max
regards,
dan carpenter
diff --git a/include/net/ax25.h b/include/net/ax25.h
index eb9cee8252c8..2cc94352b13d 100644
--- a/include/net/ax25.h
+++ b/include/net/ax25.h
@@ -275,25 +275,30 @@ static inline struct ax25_cb *sk_to_ax25(const struct sock *sk)
#define ax25_cb_hold(__ax25) \
refcount_inc(&((__ax25)->refcount))
-static __inline__ void ax25_cb_put(ax25_cb *ax25)
+static inline ax25_dev *ax25_dev_hold(ax25_dev *ax25_dev)
{
- if (refcount_dec_and_test(&ax25->refcount)) {
- kfree(ax25->digipeat);
- kfree(ax25);
- }
+ if (ax25_dev)
+ refcount_inc(&ax25_dev->refcount);
+ return ax25_dev;
}
-static inline void ax25_dev_hold(ax25_dev *ax25_dev)
+static inline void ax25_dev_put(ax25_dev *ax25_dev)
{
- refcount_inc(&ax25_dev->refcount);
+ if (ax25_dev && refcount_dec_and_test(&ax25_dev->refcount)) {
+ kfree(ax25_dev);
+ }
}
-static inline void ax25_dev_put(ax25_dev *ax25_dev)
+static __inline__ void ax25_cb_put(ax25_cb *ax25)
{
- if (refcount_dec_and_test(&ax25_dev->refcount)) {
- kfree(ax25_dev);
+ if (refcount_dec_and_test(&ax25->refcount)) {
+ if (ax25->ax25_dev)
+ ax25_dev_put(ax25->ax25_dev);
+ kfree(ax25->digipeat);
+ kfree(ax25);
}
}
+
static inline __be16 ax25_type_trans(struct sk_buff *skb, struct net_device *dev)
{
skb->dev = dev;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 9169efb2f43a..4d1ab296d52c 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -92,6 +92,7 @@ static void ax25_kill_by_device(struct net_device *dev)
spin_unlock_bh(&ax25_list_lock);
ax25_disconnect(s, ENETUNREACH);
s->ax25_dev = NULL;
+ ax25_dev_put(ax25_dev);
ax25_cb_del(s);
spin_lock_bh(&ax25_list_lock);
goto again;
@@ -101,11 +102,8 @@ static void ax25_kill_by_device(struct net_device *dev)
lock_sock(sk);
ax25_disconnect(s, ENETUNREACH);
s->ax25_dev = NULL;
- if (sk->sk_socket) {
- netdev_put(ax25_dev->dev,
- &s->dev_tracker);
- ax25_dev_put(ax25_dev);
- }
+ netdev_put(ax25_dev->dev, &s->dev_tracker);
+ ax25_dev_put(ax25_dev);
ax25_cb_del(s);
release_sock(sk);
spin_lock_bh(&ax25_list_lock);
@@ -496,6 +494,7 @@ void ax25_fillin_cb(ax25_cb *ax25, ax25_dev *ax25_dev)
ax25->ax25_dev = ax25_dev;
if (ax25->ax25_dev != NULL) {
+ ax25_dev_hold(ax25->ax25_dev);
ax25_fillin_cb_from_dev(ax25, ax25_dev);
return;
}
@@ -685,6 +684,11 @@ static int ax25_setsockopt(struct socket *sock, int level, int optname,
break;
}
+ if (ax25->ax25_dev) {
+ rtnl_unlock();
+ res = -EBUSY;
+ break;
+ }
ax25->ax25_dev = ax25_dev_ax25dev(dev);
if (!ax25->ax25_dev) {
rtnl_unlock();
@@ -961,7 +965,7 @@ struct sock *ax25_make_new(struct sock *osk, struct ax25_dev *ax25_dev)
ax25->paclen = oax25->paclen;
ax25->window = oax25->window;
- ax25->ax25_dev = ax25_dev;
+ ax25->ax25_dev = ax25_dev_hold(ax25_dev);
ax25->source_addr = oax25->source_addr;
if (oax25->digipeat != NULL) {
@@ -995,6 +999,11 @@ static int ax25_release(struct socket *sock)
sock_orphan(sk);
ax25 = sk_to_ax25(sk);
ax25_dev = ax25->ax25_dev;
+ /*
+ * The ax25_destroy_socket() function decrements the reference but we
+ * need to keep a reference until the end of the function.
+ */
+ ax25_dev_hold(ax25_dev);
if (sk->sk_type == SOCK_SEQPACKET) {
switch (ax25->state) {
@@ -1147,6 +1156,12 @@ static int ax25_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (ax25_dev) {
ax25_fillin_cb(ax25, ax25_dev);
+ /*
+ * both ax25_addr_ax25dev() and ax25_fillin_cb() take a
+ * reference but we only want to take one reference so drop
+ * the extra reference.
+ */
+ ax25_dev_put(ax25_dev);
netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC);
}
diff --git a/net/ax25/ax25_route.c b/net/ax25/ax25_route.c
index b7c4d656a94b..d7f6d9f4f20c 100644
--- a/net/ax25/ax25_route.c
+++ b/net/ax25/ax25_route.c
@@ -406,6 +406,10 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25_address *addr)
ax25_route_lock_unuse();
return -EHOSTUNREACH;
}
+ if (ax25->ax25_dev) {
+ err = -EBUSY;
+ goto put;
+ }
if ((ax25->ax25_dev = ax25_dev_ax25dev(ax25_rt->dev)) == NULL) {
err = -EHOSTUNREACH;
goto put;
On Tue, 7 May 2024 11:08:14 +0300 Dan Carpenter wrote:
> I have reviewed this code some more. My theory is:
>
> ax25_dev_device_up() <- sets refcount to 1
> ax25_dev_device_down() <- set refcount to 0 and frees it
>
> If the refcount is not 1 at ax25_dev_device_down() then something is
> screwed up. So why do we even need more refcounting than that? But
> apparently we do. I don't really understand networking that well so
> maybe we can have lingering connections after the device is down.
We do need more reference count. Because there is a race condition
between ax25_bind() and the cleanup routine.
The cleanup routine is consisted of three parts: ax25_kill_by_device(),
ax25_rt_device_down() and ax25_dev_device_down(). The ax25_kill_by_device()
is used to cleanup the connections and the ax25_dev_device_down() is used
to cleanup the device. If we call ax25_bind() and ax25_connect() between
the window of ax25_kill_by_device() and ax25_dev_device_down(), the ax25_dev
is freed in ax25_dev_device_down(). When we call ax25_release() to release
the connections, the UAF bugs will happen.
Best regards,
Duoming Zhou
On Tue, 7 May 2024 11:08:14 +0300 Dan Carpenter wrote:
> diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> index 9169efb2f43a..4d1ab296d52c 100644
> --- a/net/ax25/af_ax25.c
> +++ b/net/ax25/af_ax25.c
> @@ -92,6 +92,7 @@ static void ax25_kill_by_device(struct net_device *dev)
> spin_unlock_bh(&ax25_list_lock);
> ax25_disconnect(s, ENETUNREACH);
> s->ax25_dev = NULL;
> + ax25_dev_put(ax25_dev);
> ax25_cb_del(s);
> spin_lock_bh(&ax25_list_lock);
> goto again;
> @@ -101,11 +102,8 @@ static void ax25_kill_by_device(struct net_device *dev)
> lock_sock(sk);
> ax25_disconnect(s, ENETUNREACH);
> s->ax25_dev = NULL;
> - if (sk->sk_socket) {
> - netdev_put(ax25_dev->dev,
> - &s->dev_tracker);
> - ax25_dev_put(ax25_dev);
> - }
> + netdev_put(ax25_dev->dev, &s->dev_tracker);
> + ax25_dev_put(ax25_dev);
We should not decrease the refcount without checking "if (sk->sk_socket)", because
there is a race condition between ax25_kill_by_device() and ax25_release(), if we
decrease the refcount in ax25_release(), we should not decrease it here, otherwise
the refcount underflow will happen.
Best regards,
Duoming Zhou
On Wed, 1 May 2024 21:29:16 -0400 Lars Kellogg-Stedman wrote:
> Assume we have the following two interfaces configured on a system:
>
> $ cat /etc/ax25/axports
> udp0 test0-0 9600 255 2 axudp0
> udp1 test0-1 9600 255 2 axudp1
>
> And we have ax25d listening on both interfaces:
>
> [udp0]
> default * * * * * * - root /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
> [udp1]
> default * * * * * * - root /usr/sbin/axwrapper axwrapper -- /bin/sh sh /etc/ax25/example-output.sh
>
> Using the 'ax-devs' and 'ax-sockets' gdb commands shown at the end of
> this message, we start with:
>
> (gdb) ax-devs
> ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> (gdb) ax-sockets
> 0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
> 0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
>
> We initiate a connection from ax0 to ax1:
>
> call -r udp0 test0-1
>
> When we first enter ax25_rcv, we have:
>
> (gdb) ax-devs
> ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
> (gdb) ax-sockets
> 0xffff888101ac8000 if:ax0 state:1 refcnt:2 dev_tracker:0xffff888100dedb80
> 0xffff8881002b6800 if:ax1 state:0 refcnt:2 dev_tracker:0xffff888100ded200
> 0xffff888101ac4e00 if:ax0 state:0 refcnt:2 dev_tracker:0xffff888100dec4c0
>
> After we reach line 413 (in net/ax25/ax25_in.c) and add a new control
> block:
>
> ax25_cb_add(ax25)
>
> We have:
>
> (gdb) ax-devs
> ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
> (gdb) ax-sockets
> 0xffff88810245ac00 if:ax1 state:3 refcnt:2 dev_tracker:0x0 <fixed_percpu_data>
> 0xffff88810245ba00 if:ax0 state:1 refcnt:2 dev_tracker:0xffff88810136c800
> 0xffff888100c79e00 if:ax1 state:0 refcnt:2 dev_tracker:0xffff88810136c6e0
> 0xffff8881018e9800 if:ax0 state:0 refcnt:2 dev_tracker:0xffff88810170c860
>
> Note that (a) ax25->dev_tracker is NULL, and (b) we have incremeted the
> refcount on ax0 (the source interface), but not on ax1 (the destination
> interface). When we call ax25_release for this control block, we get to:
>
> netdev_put(ax25_dev->dev, &ax25->dev_tracker);
> ax25_dev_put(ax25_dev);
>
> With:
>
> (gdb) ax-devs
> ax1 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
> ax0 ax_refcnt:3 dev_refcnt:10 dev_untracked:1 dev_notrack:1
>
> After the calls to netdev_put() and ax25_dev_put(), we have:
>
> (gdb) ax-devs
> ax1 ax_refcnt:1 dev_refcnt:8 dev_untracked:-1073741824 dev_notrack:1
> ax0 ax_refcnt:2 dev_refcnt:9 dev_untracked:1 dev_notrack:1
>
> You can see that (a) ax25_dev->dev->refcnt_tracker->untracked is now
> invalid, and ax25_dev->dev->dev_refcnt is in trouble: it decrements by
> one for each closed connection, even though it was never incremented
> when we accepted the connection. The underflow in
> ...refcnt_tracker->untracked yields the traceback with:
>
> refcount_t: decrement hit 0; leaking memory.
>
> Additional connections will eventually trigger more problems; we will
> ultimately underflow ax25_dev->dev->dev_refcnt, but we may also run into
> memory corruption because of the invalid tracker data, resulting in:
>
> BUG: unable to handle page fault for address: 00000010000003b0
Do you know how to trigger this bug? Could you share the POC?
Best regards,
Duoming Zhou
Dan,
Apologies, I missed the patch when you first posted it. Thanks for the
pointer.
On Tue, May 07, 2024 at 11:08:14AM GMT, Dan Carpenter wrote:
> This patch might not work because of the netdev_hold/put() thing...
I think that's the case. It's the netdev_hold/netdev_put imbalance that
is causing the kernel issues; with your patch applied, we still see a
failure in ax25_release:
refcount_t: decrement hit 0; leaking memory.
The patch I've posted resolves this issue and runs without any errors.
The complete trace is:
------------[ cut here ]------------
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 1 PID: 88 at lib/refcount.c:31 refcount_warn_saturate+0x109/0x120
CPU: 1 PID: 88 Comm: axwrapper Not tainted 6.9.0-ax25-09869-g6d35085a1f38 #140
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:refcount_warn_saturate+0x109/0x120
Code: f2 33 82 c6 05 34 62 f2 00 01 e8 22 14 9d ff 0f 0b 5d c3 cc cc cc cc 48 c7 c7 58 f2 33 82 c6 05 17 62 f2 00 01 e8 07 14 9d ff <0f> 0b 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
RSP: 0018:ffffc90000447d00 EFLAGS: 00010292
RAX: 000000000000002c RBX: ffff888101142510 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffc90000447b88 RDI: 00000000ffffefff
The system is gRBP: ffffc90000447d00 R08: 00000000ffffefff R09: ffffffff824a4b88
R10: ffffffff8244cbe0 R11: ffffc90000447ad8 R12: 0000000000000000
oing down NOW!R13: ffffc90000447d18 R14: ffff888101142000 R15: ffff88810222a0c0
FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sent SIGTERM toCR2: 000055d73c5f5040 CR3: 000000000242c000 CR4: 00000000000006b0
Call Trace:
all processes <TASK>
? show_regs.part.0+0x22/0x30
? show_regs.cold+0x8/0xd
? refcount_warn_saturate+0x109/0x120
? __warn.cold+0x97/0xd5
? refcount_warn_saturate+0x109/0x120
? report_bug+0x114/0x160
? console_unlock+0x55/0xd0
? handle_bug+0x42/0x80
? exc_invalid_op+0x1c/0x70
? asm_exc_invalid_op+0x1f/0x30
? refcount_warn_saturate+0x109/0x120
ref_tracker_free+0x163/0x170
ax25_release+0x129/0x3c0
sock_close+0x45/0xb0
__fput+0x94/0x2a0
____fput+0x12/0x20
task_work_run+0x61/0x90
do_exit+0x2f5/0x9f0
? handle_mm_fault+0x197/0x300
do_group_exit+0x38/0x90
__x64_sys_exit_group+0x1c/0x20
x64_sys_call+0x1269/0x1d00
do_syscall_64+0x55/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f8e7711abce
Code: Unable to access opcode bytes at 0x7f8e7711aba4.
RSP: 002b:00007ffed9c905d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8e7711abce
RDX: 00007f8e7711ae66 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007ffed9c90628 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000055d73c5f2030
R13: 00007ffed9c90658 R14: 0000000000000000 R15: 00007ffed9c90620
</TASK>
---[ end trace 0000000000000000 ]---
--
Lars Kellogg-Stedman <[email protected]> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/ | N1LKS