Hello,
memory leak in gsmld_ioctl was reported by syzbot, and this issue seems to be
raised from race condition involving gsm->dlci[addr].
https://syzkaller.appspot.com/bug?extid=6e3e8f30f269f5028e5d
When dlci object is checked and initialized in gsmld_ioctl(), it may be
assigned multiple times because there is no lock which guards entering
the critical section in gsm_dlci_alloc() from multiple threads.
For example, when multiple ioctl() are called concurrently, the following
scenario is possible:
Thread 0 Thread 1
ioctl(GSMIOC_SETCONF_DLCI) ioctl(GSMIOC_GETCONF_DLCI)
gsmld_ioctl(): gsmld_ioctl():
dlci = gsm->dlci[addr];
dlci = gsm->dlci[addr];
if (!dlci) {
gsm_dlci_alloc(gsm, addr):
...
gsm->dlci[addr] = dlci;
if (!dlci) {
gsm_dlci_alloc(gsm, addr):
...
// overwritten & memory leak
gsm->dlci[addr] = dlci;
We think either (1) gsm_dlci_alloc() should hold a lock(mutex) and do
internal check about whether gsm->dlci[addr] is NUll or not, OR
(2) all callers of gsm_dlci_alloc() should hold gsm->mutex and check
whether gsm->dlci[addr] is NUll or not (like gsmtty_install()).
Could you check this? If it makes sense, we will write a patch following
one of the suggestions.
Best Regards,
Yewon Choi
Hello & thanks.
To: Daniel
On 16. 04. 24, 12:57, Yewon Choi wrote:
> Hello,
>
> memory leak in gsmld_ioctl was reported by syzbot, and this issue seems to be
> raised from race condition involving gsm->dlci[addr].
>
> https://syzkaller.appspot.com/bug?extid=6e3e8f30f269f5028e5d
>
> When dlci object is checked and initialized in gsmld_ioctl(), it may be
> assigned multiple times because there is no lock which guards entering
> the critical section in gsm_dlci_alloc() from multiple threads.
>
> For example, when multiple ioctl() are called concurrently, the following
> scenario is possible:
>
> Thread 0 Thread 1
> ioctl(GSMIOC_SETCONF_DLCI) ioctl(GSMIOC_GETCONF_DLCI)
>
> gsmld_ioctl(): gsmld_ioctl():
> dlci = gsm->dlci[addr];
> dlci = gsm->dlci[addr];
> if (!dlci) {
> gsm_dlci_alloc(gsm, addr):
> ...
> gsm->dlci[addr] = dlci;
> if (!dlci) {
> gsm_dlci_alloc(gsm, addr):
> ...
> // overwritten & memory leak
> gsm->dlci[addr] = dlci;
>
>
> We think either (1) gsm_dlci_alloc() should hold a lock(mutex) and do
> internal check about whether gsm->dlci[addr] is NUll or not, OR
> (2) all callers of gsm_dlci_alloc() should hold gsm->mutex and check
> whether gsm->dlci[addr] is NUll or not (like gsmtty_install()).
>
> Could you check this? If it makes sense, we will write a patch following
> one of the suggestions.
>
> Best Regards,
> Yewon Choi
--
js
> We think either (1) gsm_dlci_alloc() should hold a lock(mutex) and do
> internal check about whether gsm->dlci[addr] is NUll or not, OR
> (2) all callers of gsm_dlci_alloc() should hold gsm->mutex and check
> whether gsm->dlci[addr] is NUll or not (like gsmtty_install()).
>
> Could you check this? If it makes sense, we will write a patch
> following one of the suggestions.
We are currently dealing with multiple race conditions in the n_gsm. Not
only for the syzkaller reports but in recent exploits for example, too.
I believe we need an overall concept/solution for these.
Currently, the following actors can race:
- ioctl (e.g. resetting the mux or one of its DLCIs)
- ldisc callbacks (e.g. receiving SABM or DISC in gsm_queue())
- tty callbacks (e.g. gsmtty_hangup())
- internal write task (gsmld_write_task())
- internal timers (e.g. gsm_control_keep_alive())
- driver removal
Each with another and ioctl's from multiple threads.
Guarding these is not trivial for the following reasons:
- execution context may not allow sleep (timers, write task, tty callbacks?)
- creating an atomic section may conflict inner sleeps (e.g. ioctl)
- dead-locking needs to be considered
- locking may introduce high performance bottlenecks
Still, not guarding one of the racing actors does not appears to be
possible as I see it.
Unfortunately, for the same reason the sleep while atomic issue when using
a console on a virtual tty has not been fixed yet, I have no solution at
hand. We are dealing with a quite complex situation here.
Nevertheless, creating many small patches here and there should not be our
solution for obvious reasons. This does not give a complete solution and
makes it harder to find the remaining corner cases.
I can assist to find a solution here, but I have not enough time to do this
alone at the moment.
Best regards,
Daniel Starke
On Tue, Apr 16, 2024 at 12:26:30PM +0000, Starke, Daniel wrote:
> > We think either (1) gsm_dlci_alloc() should hold a lock(mutex) and do
> > internal check about whether gsm->dlci[addr] is NUll or not, OR
> > (2) all callers of gsm_dlci_alloc() should hold gsm->mutex and check
> > whether gsm->dlci[addr] is NUll or not (like gsmtty_install()).
> >
> > Could you check this? If it makes sense, we will write a patch
> > following one of the suggestions.
>
> We are currently dealing with multiple race conditions in the n_gsm. Not
> only for the syzkaller reports but in recent exploits for example, too.
> I believe we need an overall concept/solution for these.
>
> Currently, the following actors can race:
> - ioctl (e.g. resetting the mux or one of its DLCIs)
> - ldisc callbacks (e.g. receiving SABM or DISC in gsm_queue())
> - tty callbacks (e.g. gsmtty_hangup())
> - internal write task (gsmld_write_task())
> - internal timers (e.g. gsm_control_keep_alive())
> - driver removal
>
> Each with another and ioctl's from multiple threads.
>
> Guarding these is not trivial for the following reasons:
> - execution context may not allow sleep (timers, write task, tty callbacks?)
> - creating an atomic section may conflict inner sleeps (e.g. ioctl)
> - dead-locking needs to be considered
> - locking may introduce high performance bottlenecks
>
> Still, not guarding one of the racing actors does not appears to be
> possible as I see it.
>
> Unfortunately, for the same reason the sleep while atomic issue when using
> a console on a virtual tty has not been fixed yet, I have no solution at
> hand. We are dealing with a quite complex situation here.
>
> Nevertheless, creating many small patches here and there should not be our
> solution for obvious reasons. This does not give a complete solution and
> makes it harder to find the remaining corner cases.
>
> I can assist to find a solution here, but I have not enough time to do this
> alone at the moment.
>
Thank you for your comprehensive explanation.
As you described, there are many concurrency issues entangled each other and
fixing single one will be rather confusing.
I hope this mail is supportive on solving the problems.
Sincerely,
Yewon Choi
> Best regards,
> Daniel Starke