2023-12-28 03:07:58

by Yang, Chenyuan

Subject: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hello,

We encountered 5 different crashes in the cec device by using our generated syscall specification for it. Descriptions of the 5 crashes follow, and the related files are attached:

1. KASAN: slab-use-after-free Read in cec_queue_msg_fh (Reproducible)
2. WARNING: ODEBUG bug in cec_transmit_msg_fh
3. WARNING in cec_data_cancel
4. INFO: task hung in cec_claim_log_addrs (Reproducible)
5. general protection fault in cec_transmit_done_ts

For “KASAN: slab-use-after-free Read in cec_queue_msg_fh”, we attached a syzkaller program to reproduce it. The crash is triggered by `list_add_tail(&entry->list, &fh->msgs);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L224), which reads memory that was already freed by `kfree(fh);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-api.c#L684). The reproducer is a syzkaller program, which can be executed by following this document: https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md.

For “WARNING: ODEBUG bug in cec_transmit_msg_fh”, unfortunately we could not produce a reproducer, although we trigger this crash almost every time we fuzz only the cec device. We attached the report and log for this bug. It frees an object that is still active via `kfree(data);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L930).

For “WARNING in cec_data_cancel”, this is an internal warning in cec_data_cancel (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L365), which checks whether the transmit is the current one or a pending one. Unfortunately, we do not have a reproducer for this bug either, but we attached the report and log.

For “INFO: task hung in cec_claim_log_addrs”, the kernel hangs when the cec device calls `wait_for_completion(&adap->config_completion);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L1579). We have a C reproducer for this.

For “general protection fault in cec_transmit_done_ts”, the cec device tries dereferencing a non-canonical address (0xdffffc00000000e0: 0000 [#1]), which is related to the invocation of `cec_transmit_attempt_done_ts` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L697). It seems that the address of cec_adapter is totally wrong. We do not have a reproducer for this bug, but the log and report for it are attached.
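
A side note on the faulting address, offered as a hedged sketch rather than a diagnosis: assuming the report comes from an x86_64 kernel built with generic KASAN and the default shadow offset (0xdffffc0000000000, with a shadow scale shift of 3), a fault at such an address typically happens while the instrumentation checks the shadow byte for a pointer, so the real address can be recovered by undoing the shadow mapping. The helper name below is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: defaults for generic KASAN on x86_64. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/*
 * Hypothetical helper: map a KASAN shadow address back to the start of
 * the 8-byte granule of kernel memory that it describes.
 */
uint64_t kasan_shadow_to_mem(uint64_t shadow_addr)
{
	/* Only addresses at or above the shadow offset are shadow addresses. */
	assert(shadow_addr >= KASAN_SHADOW_OFFSET);
	return (shadow_addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

Under these assumptions, 0xdffffc00000000e0 maps back to 0x700, i.e. an access at offset 0x700 from a NULL pointer. That would be consistent with a NULL structure pointer being dereferenced rather than a randomly corrupted one.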

If you have any questions or require more information, please feel free to contact us.

Best,
Chenyuan


Attachments:
general-protection-fault_cec_transmit_done_ts-machineInfo (3.60 kB)
general-protection-fault_cec_transmit_done_ts.log (1.01 MB)
general-protection-fault_cec_transmit_done_ts.report (3.32 kB)
INFO-cec_claim_log_addrs-repro.cprog (22.25 kB)
INFO-cec_claim_log_addrs-repro.log (1.02 MB)
INFO-cec_claim_log_addrs-repro.prog (2.08 kB)
INFO-cec_claim_log_addrs-repro.report (6.39 kB)
KASAN-UAF-cec_queue_msg_fh.log (68.25 kB)
KASAN-UAF-cec_queue_msg_fh.prog (2.98 kB)
KASAN-UAF-cec_queue_msg_fh.report (9.00 kB)
WARNING_cec_data_cancel-machineInfo (3.59 kB)
WARNING_cec_data_cancel.log (1.01 MB)
WARNING_cec_data_cancel.report (2.29 kB)
WARNING_ODEBUG_cec_transmit_msg_fh-machineInfo (3.59 kB)
WARNING_ODEBUG_cec_transmit_msg_fh.log (1.01 MB)
WARNING_ODEBUG_cec_transmit_msg_fh.report (2.69 kB)

2023-12-29 06:23:45

by Dmitry Vyukov

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On Thu, 28 Dec 2023 at 10:58, Yang, Chenyuan <[email protected]> wrote:
>
> Hello,
>
> We encountered 5 different crashes in the cec device by using our generated syscall specification for it, here are the descriptions of these 5 crashes and the related files are attached:

Hi Yang,

Nice!

Do you plan to upstream your cec descriptions to syzkaller? That would
be useful.





2024-01-18 07:52:25

by Hans Verkuil

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On 18/01/2024 05:25, Zhao, Zijie wrote:
> Dear Developers,
>
> We hope this email finds you well. We took a deeper look at the first crash, KASAN: slab-use-after-free Read in cec_queue_msg_fh. We believe the cause is that one thread took the lock of a
> `struct cec_fh` while another thread freed it:
>
> 1. One thread first takes the lock of `fh`, which is of type `struct cec_fh` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L219);
> 2. Another thread frees this `fh` without checking whether any other thread is holding the lock (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-api.c#L684);
> 3. KASAN is then triggered when the first thread tries to access `fh->msgs` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L224).
>
>
> While this particular reproducer seems harmless, we think the free might cause more problems when paired with threads running other functions that work on `fh` while KASAN is disabled. We also think
> the `struct cec_fh` (https://elixir.bootlin.com/linux/v6.7-rc7/source/include/media/cec.h#L90) is worth attention, since it gives access to many function pointers (e.g. `fh->adap->ops` points to
> https://elixir.bootlin.com/linux/v6.7-rc7/source/include/media/cec.h#L115 and `fh->adap->pin->ops` points to https://elixir.bootlin.com/linux/v6.7-rc7/source/include/media/cec-pin.h#L36).
>
> Could you please kindly take a look at the crashes as you have more expertise in them?

I've been looking at these on and off whenever I have some time. I found two issues and am
on the trail of a third. Once I have a patch for the third I was planning to post the patches
and ask you to retest. Some of the issues you found might all relate to the same root cause
(esp. the locking issue), so it would be great if you could help with that.

Regards,

Hans

>
> Thank you for your time!


2024-01-19 08:17:42

by Hans Verkuil

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Chenyuan,

On 28/12/2023 03:33, Yang, Chenyuan wrote:
> Hello,
>
>  
>
> We encountered 5 different crashes in the cec device by using our generated syscall specification for it, here are the descriptions of these 5 crashes and the related files are attached:
>
> 1. KASAN: slab-use-after-free Read in cec_queue_msg_fh (Reproducible)
>
> 2. WARNING: ODEBUG bug in cec_transmit_msg_fh
>
> 3. WARNING in cec_data_cancel
>
> 4. INFO: task hung in cec_claim_log_addrs (Reproducible)
>
> 5. general protection fault in cec_transmit_done_ts

Can you retest with the patch below? I'm fairly certain this will fix issues 1 and 2.
I suspect at least some of the others are related to 1 & 2, but since I could never
get the reproducers working reliably, I had a hard time determining if there are more
bugs or if this patch resolves everything.

Your help testing this patch will be appreciated!

Regards,

Hans

Signed-off-by: Hans Verkuil <[email protected]>
---
 drivers/media/cec/core/cec-adap.c | 3 +--
 drivers/media/cec/core/cec-api.c  | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 5741adf09a2e..079c3b142d91 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -936,8 +936,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
 	 */
 	mutex_unlock(&adap->lock);
 	wait_for_completion_killable(&data->c);
-	if (!data->completed)
-		cancel_delayed_work_sync(&data->work);
+	cancel_delayed_work_sync(&data->work);
 	mutex_lock(&adap->lock);

 	/* Cancel the transmit if it was interrupted */
diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
index 67dc79ef1705..d64bb716f9c6 100644
--- a/drivers/media/cec/core/cec-api.c
+++ b/drivers/media/cec/core/cec-api.c
@@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
 		list_del_init(&data->xfer_list);
 	}
 	mutex_unlock(&adap->lock);
+
+	mutex_lock(&fh->lock);
 	while (!list_empty(&fh->msgs)) {
 		struct cec_msg_entry *entry =
 			list_first_entry(&fh->msgs, struct cec_msg_entry, list);
@@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
 			kfree(entry);
 		}
 	}
+	mutex_unlock(&fh->lock);
 	kfree(fh);

 	cec_put_device(devnode);
--
2.42.0
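
The cec_release() hunk above takes fh->lock before draining fh->msgs, so a thread that is still inside the queueing path and holds that lock finishes before the handle is freed. A minimal userspace sketch of that ordering follows, using pthreads instead of kernel mutexes. All type and function names are hypothetical stand-ins, and the sketch assumes the handle has already been made unreachable for new queuers before release runs (locking inside release alone does not protect a thread that looks the handle up after the free).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct cec_msg_entry and struct cec_fh. */
struct msg_entry {
	struct msg_entry *next;
	int payload;
};

struct file_handle {
	pthread_mutex_t lock;	/* plays the role of fh->lock */
	struct msg_entry *msgs;	/* plays the role of fh->msgs */
};

struct file_handle *fh_open(void)
{
	struct file_handle *fh = calloc(1, sizeof(*fh));

	if (fh && pthread_mutex_init(&fh->lock, NULL) != 0) {
		free(fh);
		return NULL;
	}
	return fh;
}

/* Producer side: link one message into the queue under the handle's lock. */
int fh_queue_msg(struct file_handle *fh, int payload)
{
	struct msg_entry *entry = malloc(sizeof(*entry));

	if (!entry)
		return -1;
	entry->payload = payload;

	pthread_mutex_lock(&fh->lock);
	entry->next = fh->msgs;
	fh->msgs = entry;
	pthread_mutex_unlock(&fh->lock);
	return 0;
}

/*
 * Release side: take the same lock before draining the queue, so a
 * producer that is mid-insert completes first. Only then is the handle
 * freed. Returns the number of queued entries that were freed.
 */
unsigned int fh_release(struct file_handle *fh)
{
	unsigned int freed = 0;

	assert(fh != NULL);
	pthread_mutex_lock(&fh->lock);
	while (fh->msgs) {
		struct msg_entry *entry = fh->msgs;

		fh->msgs = entry->next;
		free(entry);
		freed++;
	}
	pthread_mutex_unlock(&fh->lock);

	pthread_mutex_destroy(&fh->lock);
	free(fh);
	return freed;
}
```

Calling fh_open(), queueing a few messages with fh_queue_msg() and then calling fh_release() exercises the ordering; on older toolchains the program may need to be built with -pthread.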



2024-01-22 19:26:25

by Yang, Chenyuan

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Thank you very much for providing the patch!

After running the reproducers and 24 hours of fuzzing, it seems that this patch fixes issues 1, 2, 3 and 5.

The 4th issue, "INFO: task hung in cec_claim_log_addrs", is still triggered after applying the patch.

If you need more information, feel free to let me know.

Best,
Chenyuan




2024-01-23 08:02:27

by Hans Verkuil

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On 22/01/2024 20:11, Yang, Chenyuan wrote:
> Hi Hans,
>
> Thank you very much for providing the patch!
>
> After running the reproducers and 24 hours of fuzzing, it seems that this patch fixes issues 1, 2, 3 and 5.

Ah, that's good news.

>
> The 4th issue, "INFO: task hung in cec_claim_log_addrs", is still triggered after applying the patch.

I'll dig a bit deeper into this one, see if I can figure out the cause.

Thank you for your help in testing this!

Regards,

Hans

>
> If you need more information, feel free to let me know.
>
> Best,
> Chenyuan
>
> On 1/19/24, 2:17 AM, "Hans Verkuil" <[email protected]> wrote:
>
> Hi Chenyuan,
>
> On 28/12/2023 03:33, Yang, Chenyuan wrote:
> > Hello,
> >
> >
> >
> > We encountered 5 different crashes in the cec device by using our generated syscall specification for it, here are the descriptions of these 5 crashes and the related files are attached:
> >
> > 1. KASAN: slab-use-after-free Read in cec_queue_msg_fh (Reproducible)
> >
> > 2. WARNING: ODEBUG bug in cec_transmit_msg_fh
> >
> > 3. WARNING in cec_data_cancel
> >
> > 4. INFO: task hung in cec_claim_log_addrs (Reproducible)
> >
> > 5. general protection fault in cec_transmit_done_ts
> >
> >
> >
> > For “KASAN: slab-use-after-free Read in cec_queue_msg_fh”, we attached a syzkaller program to reproduce it. This crash is caused by ` list_add_tail(&entry->list, &fh->msgs);`
> > (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L224__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxZMf3fVQ$ <https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L224__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxZMf3fVQ$ >), which reads a
> > variable freed by `kfree(fh);` (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-api.c*L684__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxT0xaxsY$
> > <https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-api.c*L684__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxT0xaxsY$ >). The reproducible program is a Syzkaller program, which can be executed following this document:
> > https://urldefense.com/v3/__https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md__;!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZx32PwCDs$ <https://urldefense.com/v3/__https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md__;!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZx32PwCDs$ >.
> >
> >
> >
> > For “WARNING: ODEBUG bug in cec_transmit_msg_fh”, unfortunately we failed to reproduce it but we indeed trigger this crash almost every time when we fuzz the cec device only. We attached the report
> > and log for this bug. It tries freeing an active object by using `kfree(data);` (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L930__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxhwnuzFw$
> > <https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L930__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxhwnuzFw$ >).
> >
> >
> >
> > For “WARNING in cec_data_cancel”, it is an internal warning used in cec_data_cancel (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L365__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxJ9Jw4fU$
> > <https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c*L365__;Iw!!DZ3fjg!9_O4Tm7W1dKV8lXOcDFUTmIqAd6eUmsffQg3gwvypxBR3WFuQkIlRr2vAsIpwMt7lt86UlzdOTV_jBaVO8pkIiZxJ9Jw4fU$ >), which checks whether the transmit is the current or pending. Unfortunately, we also don't have the
> > reproducible program for this bug, but we attach the report and log.
> >
> >
> >
> > For “INFO: task hung in cec_claim_log_addrs”, the kernel hangs when the cec device calls `wait_for_completion(&adap->config_completion);`
> > (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L1579). We have a
> > reproducible C program for this.
> >
> >
> >
> > For “general protection fault in cec_transmit_done_ts”, the cec device tries dereferencing a non-canonical address 0xdffffc00000000e0: 0000 [#1], which is related to the invocation of
> > `cec_transmit_attempt_done_ts` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L697). It seems that the address of cec_adapter is totally wrong. We do not have a reproducible program for this
> > bug, but the log and report for it are attached.
> >
> >
> >
> > If you have any questions or require more information, please feel free to contact us.
>
> Can you retest with the patch below? I'm fairly certain this will fix issues 1 and 2.
> I suspect at least some of the others are related to 1 & 2, but since I could never
> get the reproducers working reliably, I had a hard time determining if there are more
> bugs or if this patch resolves everything.
>
> Your help testing this patch will be appreciated!
>
> Regards,
>
> Hans
>
> Signed-off-by: Hans Verkuil <[email protected]>
> ---
> drivers/media/cec/core/cec-adap.c | 3 +--
> drivers/media/cec/core/cec-api.c | 3 +++
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
> index 5741adf09a2e..079c3b142d91 100644
> --- a/drivers/media/cec/core/cec-adap.c
> +++ b/drivers/media/cec/core/cec-adap.c
> @@ -936,8 +936,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
>  	 */
>  	mutex_unlock(&adap->lock);
>  	wait_for_completion_killable(&data->c);
> -	if (!data->completed)
> -		cancel_delayed_work_sync(&data->work);
> +	cancel_delayed_work_sync(&data->work);
>  	mutex_lock(&adap->lock);
>
>  	/* Cancel the transmit if it was interrupted */
> diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
> index 67dc79ef1705..d64bb716f9c6 100644
> --- a/drivers/media/cec/core/cec-api.c
> +++ b/drivers/media/cec/core/cec-api.c
> @@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
>  		list_del_init(&data->xfer_list);
>  	}
>  	mutex_unlock(&adap->lock);
> +
> +	mutex_lock(&fh->lock);
>  	while (!list_empty(&fh->msgs)) {
>  		struct cec_msg_entry *entry =
>  			list_first_entry(&fh->msgs, struct cec_msg_entry, list);
> @@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
>  			kfree(entry);
>  		}
>  	}
> +	mutex_unlock(&fh->lock);
>  	kfree(fh);
>
>  	cec_put_device(devnode);
> --
> 2.42.0
>
>
>


2024-01-23 10:39:16

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On 23/01/2024 09:02, Hans Verkuil wrote:
> On 22/01/2024 20:11, Yang, Chenyuan wrote:
>> Hi Hans,
>>
>> Thank you very much for providing the patch!
>>
>> After running the reproducible programs and 24 hours of fuzzing, it seems that this patch fixes issues 1, 2, 3 and 5.
>
> Ah, that's good news.
>
>>
>> The 4th issue, "INFO: task hung in cec_claim_log_addrs", is still triggered after applying the patch.
>
> I'll dig a bit deeper into this one, see if I can figure out the cause.
>
> Thank you for your help in testing this!

Can you do another test run with this patch on top of the previous one?

Thank you!

Regards,

Hans

Signed-off-by: Hans Verkuil <[email protected]>
---
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 079c3b142d91..7b5dcdf775cc 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -935,7 +935,8 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
 	 * Release the lock and wait, retake the lock afterwards.
 	 */
 	mutex_unlock(&adap->lock);
-	wait_for_completion_killable(&data->c);
+	wait_for_completion_killable_timeout(&data->c,
+		msecs_to_jiffies(adap->xfer_timeout_ms + 1000));
 	cancel_delayed_work_sync(&data->work);
 	mutex_lock(&adap->lock);



2024-01-24 13:34:11

by Yang, Chenyuan

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Thanks for your prompt response!

After applying the new patch, the system hang issue persists. I also tested with the latest Linux version, but the problem remains. The error displayed is 'INFO: task syz-executor372:16736 blocked for more than 143 seconds.' Could it be that the timeout for the CEC transmit is too long and contributes to this hang?

Here is the report, trace and reproducible program for the hang:

```
Syzkaller hit 'INFO: task hung in cec_claim_log_addrs' bug.

INFO: task syz-executor372:16736 blocked for more than 143 seconds.
Not tainted 6.8.0-rc1-00029-g615d30064886-dirty #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor372 state:D stack:27872 pid:16736 tgid:16734 ppid:8035 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5400 [inline]
__schedule+0xd24/0x5b10 kernel/sched/core.c:6727
__schedule_loop kernel/sched/core.c:6802 [inline]
schedule+0xe9/0x270 kernel/sched/core.c:6817
schedule_timeout+0x250/0x290 kernel/time/timer.c:2159
do_wait_for_common kernel/sched/completion.c:95 [inline]
__wait_for_common+0x1cd/0x5d0 kernel/sched/completion.c:116
cec_claim_log_addrs+0x192/0x260 drivers/media/cec/core/cec-adap.c:1606
__cec_s_log_addrs+0xdfc/0x16e0 drivers/media/cec/core/cec-adap.c:1920
cec_adap_s_log_addrs drivers/media/cec/core/cec-api.c:184 [inline]
cec_ioctl+0x1e7c/0x2690 drivers/media/cec/core/cec-api.c:528
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__x64_sys_ioctl+0x19d/0x210 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd2/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fed9d78f23d
RSP: 002b:00007fed9d712198 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fed9d8242f0 RCX: 00007fed9d78f23d
RDX: 0000000020000100 RSI: 00000000c05c6104 RDI: 0000000000000004
RBP: 00007fed9d7ef08c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 55803a8253a605c5
R13: 956cd5a763ae25af R14: 8158e34c95c29778 R15: 00007fed9d8242f8
</TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/29:
#0: ffffffff8cfabf20 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#0: ffffffff8cfabf20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#0: ffffffff8cfabf20 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x75/0x340 kernel/locking/lockdep.c:6614
3 locks held by systemd-journal/4510:
1 lock held by in:imklog/7580:
#0: ffff8880132ef9c8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xdb/0x160 fs/file.c:1191

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc1-00029-g615d30064886-dirty #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
nmi_cpu_backtrace+0x29c/0x350 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x299/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xe7a/0x1100 kernel/hung_task.c:379
kthread+0x2cc/0x3b0 kernel/kthread.c:388
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 7583 Comm: rs:main Q:Reg Not tainted 6.8.0-rc1-00029-g615d30064886-dirty #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0033:0x7f0236ea76ac
Code: 08 48 8d 4d 08 48 89 d0 48 d1 e8 48 39 04 24 72 3d 44 89 e6 48 89 ef e8 42 fc ff ff 48 8b 44 24 78 64 48 2b 04 25 28 00 00 00 <0f> 85 e6 01 00 00 48 81 c4 88 00 00 00 4c 89 ef 5b 5d 41 5c 41 5d
RSP: 002b:00007f0235bffac0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 000000000000011f RCX: 0000561d7d1f6e70
RDX: 000000000000011e RSI: 0000000000000000 RDI: 0000561d7d1f6e8c
RBP: 0000561d7d1f6e68 R08: 0000000000000001 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000561d7d1f6c50 R14: 0000561d7d1f6e94 R15: 00007f0235bffaf0
FS: 00007f0235c00700 GS: 0000000000000000


Syzkaller reproducer:
# {Threaded:true Repeat:true RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false Swap:false UseTmpDir:false HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
r0 = syz_open_dev$cec_llm_open(&(0x7f0000000240), 0x0, 0x0)
ioctl$CEC_ADAP_S_LOG_ADDRS(r0, 0xc05c6104, &(0x7f0000000000)={"1157fbfa", 0x7, 0x5, 0xd4, 0xfffffffd, 0x5, "020000000a3ac8d3f653fc5bfc63a9", "8c3500", "0000e600", "519a84f1", ["1d10000000ffffffffff00", "b2c5d7e70561fbc4c39b5908", "d9f08668551414911a90c022", "6822c4c1322b547b17359592"]}) (async)
r1 = syz_open_dev$cec_llm_open(&(0x7f00000002c0), 0x0, 0x0)
ioctl$CEC_ADAP_S_LOG_ADDRS(r1, 0xc05c6104, &(0x7f0000000100)={"6b2dbba0", 0x0, 0x6, 0x9, 0x0, 0x5, "8b9fc0d5f029b78f8d31f64ac97f9d", "28f232c0", "efb2fcf5", "97541973", ["5c72b16343317b0b23e10116", "e9b6d0cfaa7cca88684a584d", "586b8e57a6be0da0ae2f27eb", "cda634d3d560667bdac2e046"]}) (async)
ioctl$CEC_ADAP_S_LOG_ADDRS(r1, 0xc05c6104, &(0x7f0000000080)={"5df5676a", 0x0, 0x81, 0x0, 0x0, 0x0, "7897c2954ce35881e0810c5295ad35", "2becffd0", "b195f683", "c9667930", ["af25ae63a7d56c95987751c2", "c505a653823a8055df682105", "7bb64494ee0de68901476d55", "f16f1b8989a5e7abd92df4a4"]})


C reproducer:
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/futex.h>

static void sleep_ms(uint64_t ms)
{
	usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
	struct timespec ts;
	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		exit(1);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static void thread_start(void* (*fn)(void*), void* arg)
{
	pthread_t th;
	pthread_attr_t attr;
	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, 128 << 10);
	int i = 0;
	for (; i < 100; i++) {
		if (pthread_create(&th, &attr, fn, arg) == 0) {
			pthread_attr_destroy(&attr);
			return;
		}
		if (errno == EAGAIN) {
			usleep(50);
			continue;
		}
		break;
	}
	exit(1);
}

typedef struct {
	int state;
} event_t;

static void event_init(event_t* ev)
{
	ev->state = 0;
}

static void event_reset(event_t* ev)
{
	ev->state = 0;
}

static void event_set(event_t* ev)
{
	if (ev->state)
		exit(1);
	__atomic_store_n(&ev->state, 1, __ATOMIC_RELEASE);
	syscall(SYS_futex, &ev->state, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1000000);
}

static void event_wait(event_t* ev)
{
	while (!__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
		syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, 0);
}

static int event_isset(event_t* ev)
{
	return __atomic_load_n(&ev->state, __ATOMIC_ACQUIRE);
}

static int event_timedwait(event_t* ev, uint64_t timeout)
{
	uint64_t start = current_time_ms();
	uint64_t now = start;
	for (;;) {
		uint64_t remain = timeout - (now - start);
		struct timespec ts;
		ts.tv_sec = remain / 1000;
		ts.tv_nsec = (remain % 1000) * 1000 * 1000;
		syscall(SYS_futex, &ev->state, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0, &ts);
		if (__atomic_load_n(&ev->state, __ATOMIC_ACQUIRE))
			return 1;
		now = current_time_ms();
		if (now - start > timeout)
			return 0;
	}
}

static bool write_file(const char* file, const char* what, ...)
{
	char buf[1024];
	va_list args;
	va_start(args, what);
	vsnprintf(buf, sizeof(buf), what, args);
	va_end(args);
	buf[sizeof(buf) - 1] = 0;
	int len = strlen(buf);
	int fd = open(file, O_WRONLY | O_CLOEXEC);
	if (fd == -1)
		return false;
	if (write(fd, buf, len) != len) {
		int err = errno;
		close(fd);
		errno = err;
		return false;
	}
	close(fd);
	return true;
}

static long syz_open_dev(volatile long a0, volatile long a1, volatile long a2)
{
	if (a0 == 0xc || a0 == 0xb) {
		char buf[128];
		sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block", (uint8_t)a1, (uint8_t)a2);
		return open(buf, O_RDWR, 0);
	} else {
		char buf[1024];
		char* hash;
		strncpy(buf, (char*)a0, sizeof(buf) - 1);
		buf[sizeof(buf) - 1] = 0;
		while ((hash = strchr(buf, '#'))) {
			*hash = '0' + (char)(a1 % 10);
			a1 /= 10;
		}
		return open(buf, a2, 0);
	}
}

static void kill_and_wait(int pid, int* status)
{
	kill(-pid, SIGKILL);
	kill(pid, SIGKILL);
	for (int i = 0; i < 100; i++) {
		if (waitpid(-1, status, WNOHANG | __WALL) == pid)
			return;
		usleep(1000);
	}
	DIR* dir = opendir("/sys/fs/fuse/connections");
	if (dir) {
		for (;;) {
			struct dirent* ent = readdir(dir);
			if (!ent)
				break;
			if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
				continue;
			char abort[300];
			snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort", ent->d_name);
			int fd = open(abort, O_WRONLY);
			if (fd == -1) {
				continue;
			}
			if (write(fd, abort, 1) < 0) {
			}
			close(fd);
		}
		closedir(dir);
	} else {
	}
	while (waitpid(-1, status, __WALL) != pid) {
	}
}

static void setup_test()
{
	prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
	setpgrp();
	write_file("/proc/self/oom_score_adj", "1000");
}

struct thread_t {
	int created, call;
	event_t ready, done;
};

static struct thread_t threads[16];
static void execute_call(int call);
static int running;

static void* thr(void* arg)
{
	struct thread_t* th = (struct thread_t*)arg;
	for (;;) {
		event_wait(&th->ready);
		event_reset(&th->ready);
		execute_call(th->call);
		__atomic_fetch_sub(&running, 1, __ATOMIC_RELAXED);
		event_set(&th->done);
	}
	return 0;
}

static void execute_one(void)
{
	int i, call, thread;
	for (call = 0; call < 5; call++) {
		for (thread = 0; thread < (int)(sizeof(threads) / sizeof(threads[0])); thread++) {
			struct thread_t* th = &threads[thread];
			if (!th->created) {
				th->created = 1;
				event_init(&th->ready);
				event_init(&th->done);
				event_set(&th->done);
				thread_start(thr, th);
			}
			if (!event_isset(&th->done))
				continue;
			event_reset(&th->done);
			th->call = call;
			__atomic_fetch_add(&running, 1, __ATOMIC_RELAXED);
			event_set(&th->ready);
			if (call == 1 || call == 3)
				break;
			event_timedwait(&th->done, 50);
			break;
		}
	}
	for (i = 0; i < 100 && __atomic_load_n(&running, __ATOMIC_RELAXED); i++)
		sleep_ms(1);
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
	int iter = 0;
	for (;; iter++) {
		int pid = fork();
		if (pid < 0)
			exit(1);
		if (pid == 0) {
			setup_test();
			execute_one();
			exit(0);
		}
		int status = 0;
		uint64_t start = current_time_ms();
		for (;;) {
			if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
				break;
			sleep_ms(1);
			if (current_time_ms() - start < 5000)
				continue;
			kill_and_wait(pid, &status);
			break;
		}
	}
}

uint64_t r[2] = {0xffffffffffffffff, 0xffffffffffffffff};

void execute_call(int call)
{
	intptr_t res = 0;
	switch (call) {
	case 0:
		memcpy((void*)0x20000240, "/dev/cec#\000", 10);
		res = -1;
		res = syz_open_dev(/*dev=*/0x20000240, /*id=*/0, /*flags=*/0);
		if (res != -1)
			r[0] = res;
		break;
	case 1:
		memcpy((void*)0x20000000, "\x11\x57\xfb\xfa", 4);
		*(uint16_t*)0x20000004 = 7;
		*(uint8_t*)0x20000006 = 5;
		*(uint8_t*)0x20000007 = 0xd4;
		*(uint32_t*)0x20000008 = 0xfffffffd;
		*(uint32_t*)0x2000000c = 5;
		memcpy((void*)0x20000010, "\x02\x00\x00\x00\x0a\x3a\xc8\xd3\xf6\x53\xfc\x5b\xfc\x63\xa9", 15);
		memcpy((void*)0x2000001f, "\x8c\x35\x00\x00", 4);
		memcpy((void*)0x20000023, "\x00\x00\xe6\x00", 4);
		memcpy((void*)0x20000027, "\x51\x9a\x84\xf1", 4);
		memcpy((void*)0x2000002b, "\x1d\x10\x00\x00\x00\xff\xff\xff\xff\xff\x00\x00", 12);
		memcpy((void*)0x20000037, "\xb2\xc5\xd7\xe7\x05\x61\xfb\xc4\xc3\x9b\x59\x08", 12);
		memcpy((void*)0x20000043, "\xd9\xf0\x86\x68\x55\x14\x14\x91\x1a\x90\xc0\x22", 12);
		memcpy((void*)0x2000004f, "\x68\x22\xc4\xc1\x32\x2b\x54\x7b\x17\x35\x95\x92", 12);
		syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc05c6104, /*arg=*/0x20000000ul);
		break;
	case 2:
		memcpy((void*)0x200002c0, "/dev/cec#\000", 10);
		res = -1;
		res = syz_open_dev(/*dev=*/0x200002c0, /*id=*/0, /*flags=*/0);
		if (res != -1)
			r[1] = res;
		break;
	case 3:
		memcpy((void*)0x20000100, "\x6b\x2d\xbb\xa0", 4);
		*(uint16_t*)0x20000104 = 0;
		*(uint8_t*)0x20000106 = 6;
		*(uint8_t*)0x20000107 = 9;
		*(uint32_t*)0x20000108 = 0;
		*(uint32_t*)0x2000010c = 5;
		memcpy((void*)0x20000110, "\x8b\x9f\xc0\xd5\xf0\x29\xb7\x8f\x8d\x31\xf6\x4a\xc9\x7f\x9d", 15);
		memcpy((void*)0x2000011f, "\x28\xf2\x32\xc0", 4);
		memcpy((void*)0x20000123, "\xef\xb2\xfc\xf5", 4);
		memcpy((void*)0x20000127, "\x97\x54\x19\x73", 4);
		memcpy((void*)0x2000012b, "\x5c\x72\xb1\x63\x43\x31\x7b\x0b\x23\xe1\x01\x16", 12);
		memcpy((void*)0x20000137, "\xe9\xb6\xd0\xcf\xaa\x7c\xca\x88\x68\x4a\x58\x4d", 12);
		memcpy((void*)0x20000143, "\x58\x6b\x8e\x57\xa6\xbe\x0d\xa0\xae\x2f\x27\xeb", 12);
		memcpy((void*)0x2000014f, "\xcd\xa6\x34\xd3\xd5\x60\x66\x7b\xda\xc2\xe0\x46", 12);
		syscall(__NR_ioctl, /*fd=*/r[1], /*cmd=*/0xc05c6104, /*arg=*/0x20000100ul);
		break;
	case 4:
		memcpy((void*)0x20000080, "\x5d\xf5\x67\x6a", 4);
		*(uint16_t*)0x20000084 = 0;
		*(uint8_t*)0x20000086 = 0x81;
		*(uint8_t*)0x20000087 = 0;
		*(uint32_t*)0x20000088 = 0;
		*(uint32_t*)0x2000008c = 0;
		memcpy((void*)0x20000090, "\x78\x97\xc2\x95\x4c\xe3\x58\x81\xe0\x81\x0c\x52\x95\xad\x35", 15);
		memcpy((void*)0x2000009f, "\x2b\xec\xff\xd0", 4);
		memcpy((void*)0x200000a3, "\xb1\x95\xf6\x83", 4);
		memcpy((void*)0x200000a7, "\xc9\x66\x79\x30", 4);
		memcpy((void*)0x200000ab, "\xaf\x25\xae\x63\xa7\xd5\x6c\x95\x98\x77\x51\xc2", 12);
		memcpy((void*)0x200000b7, "\xc5\x05\xa6\x53\x82\x3a\x80\x55\xdf\x68\x21\x05", 12);
		memcpy((void*)0x200000c3, "\x7b\xb6\x44\x94\xee\x0d\xe6\x89\x01\x47\x6d\x55", 12);
		memcpy((void*)0x200000cf, "\xf1\x6f\x1b\x89\x89\xa5\xe7\xab\xd9\x2d\xf4\xa4", 12);
		syscall(__NR_ioctl, /*fd=*/r[1], /*cmd=*/0xc05c6104, /*arg=*/0x20000080ul);
		break;
	}

}
int main(void)
{
	syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
	syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
	syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul, /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
	loop();
	return 0;
}
```
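
As a reading aid for the reproducer, the `cmd` value 0xc05c6104 used in all three ioctl calls can be split into its fields by hand. The sketch below assumes the generic Linux ioctl number layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction), which is what x86 uses; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ioctl request number, assuming the asm-generic layout. */
struct sketch_ioctl_fields {
	unsigned int dir;  /* 0 = none, 1 = write, 2 = read, 3 = read and write */
	unsigned int size; /* size in bytes of the argument structure */
	unsigned int type; /* driver "magic" character */
	unsigned int nr;   /* command number within the driver */
};

static struct sketch_ioctl_fields sketch_decode_ioctl(uint32_t cmd)
{
	struct sketch_ioctl_fields f;

	f.nr = cmd & 0xff;
	f.type = (cmd >> 8) & 0xff;
	f.size = (cmd >> 16) & 0x3fff;
	f.dir = (cmd >> 30) & 0x3;
	return f;
}

/* Self-check against the value from the reproducer. */
static void sketch_ioctl_demo(void)
{
	struct sketch_ioctl_fields f = sketch_decode_ioctl(0xc05c6104u);

	assert(f.dir == 3);     /* read and write */
	assert(f.size == 0x5c); /* 92 bytes, matching the buffer the reproducer fills */
	assert(f.type == 'a');  /* 0x61 */
	assert(f.nr == 4);
}
```

The 92-byte size agrees with the layout written by the reproducer: the last 12-byte feature array starts at offset 0x4f and ends at 0x5b, which rounds up to 0x5c.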

Best,
Chenyuan


2024-01-25 11:11:18

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Chenyuan,

On 24/01/2024 14:33, Yang, Chenyuan wrote:
> Hi Hans,
>
> Thanks for your prompt response!
>
> After applying the new patch, the system hang issue persists. I also tested with the latest Linux version, but the problem remains. The error displayed is 'INFO: task syz-executor372:16736 blocked for more than 143 seconds.' Could it be that the timeout for the CEC transmit is too long and contributes to this hang?

Again, thank you for testing this.

After investigation I suspect the issue is elsewhere.

Can you test with the patch below instead?

Thank you!

Hans

Signed-off-by: Hans Verkuil <[email protected]>
---
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 079c3b142d91..e5c86bc5ed93 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -1562,10 +1562,12 @@ static int cec_config_thread_func(void *arg)
 			cec_transmit_msg_fh(adap, &msg, NULL, false);
 		}
 	}
+	mutex_unlock(&adap->lock);
+	call_void_op(adap, configured);
+	mutex_lock(&adap->lock);
 	adap->kthread_config = NULL;
 	complete(&adap->config_completion);
 	mutex_unlock(&adap->lock);
-	call_void_op(adap, configured);
 	return 0;

 unconfigure:
@@ -1591,6 +1593,12 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
 	if (WARN_ON(adap->is_configuring || adap->is_configured))
 		return;

+	if (adap->kthread_config) {
+		mutex_unlock(&adap->lock);
+		wait_for_completion(&adap->config_completion);
+		mutex_lock(&adap->lock);
+	}
+
 	init_completion(&adap->config_completion);

 	/* Ready to kick off the thread */
@@ -1598,8 +1606,8 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
 	adap->kthread_config = kthread_run(cec_config_thread_func, adap,
 					   "ceccfg-%s", adap->name);
 	if (IS_ERR(adap->kthread_config)) {
-		adap->kthread_config = NULL;
 		adap->is_configuring = false;
+		adap->kthread_config = NULL;
 	} else if (block) {
 		mutex_unlock(&adap->lock);
 		wait_for_completion(&adap->config_completion);
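
To make the ordering in the patch above easier to follow, here is a rough userspace sketch of the same two ideas: the worker runs its callback without the lock and only then signals completion, and a new claim first waits for any previous worker before starting another. This is an illustration with pthreads under my own assumptions, not the driver code; every `sketch_*` name is hypothetical, and `worker_running` merely stands in for `adap->kthread_config` being non-NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_adapter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	pthread_t worker;
	bool worker_started;  /* a worker was created and not yet joined */
	bool worker_running;  /* worker has not signalled completion yet */
	int configured_calls; /* how often the "configured" callback ran */
};

static void sketch_adapter_init(struct sketch_adapter *a)
{
	pthread_mutex_init(&a->lock, NULL);
	pthread_cond_init(&a->cond, NULL);
	a->worker_started = false;
	a->worker_running = false;
	a->configured_calls = 0;
}

static void *sketch_config_worker(void *arg)
{
	struct sketch_adapter *a = arg;

	/* Callback first, without the lock held ... */
	__atomic_fetch_add(&a->configured_calls, 1, __ATOMIC_RELAXED);

	/* ... and only then clear the running state and signal completion. */
	pthread_mutex_lock(&a->lock);
	a->worker_running = false;
	pthread_cond_broadcast(&a->cond);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/* Wait for the current worker, if any. Caller must hold a->lock. */
static void sketch_wait_worker_locked(struct sketch_adapter *a)
{
	while (a->worker_running)
		pthread_cond_wait(&a->cond, &a->lock);
	if (a->worker_started) {
		/* The worker no longer needs the lock at this point. */
		pthread_join(a->worker, NULL);
		a->worker_started = false;
	}
}

static int sketch_claim(struct sketch_adapter *a, bool block)
{
	int ret = 0;

	pthread_mutex_lock(&a->lock);
	sketch_wait_worker_locked(a); /* do not reuse state of a live worker */
	a->worker_running = true;
	if (pthread_create(&a->worker, NULL, sketch_config_worker, a) != 0) {
		a->worker_running = false;
		ret = -1;
	} else {
		a->worker_started = true;
		if (block)
			sketch_wait_worker_locked(a);
	}
	pthread_mutex_unlock(&a->lock);
	return ret;
}

static void sketch_claim_demo(void)
{
	struct sketch_adapter a;

	sketch_adapter_init(&a);
	assert(sketch_claim(&a, false) == 0); /* non-blocking claim */
	assert(sketch_claim(&a, true) == 0);  /* waits for the first worker */
	assert(a.configured_calls == 2);
}
```

In the sketch, a second claim that arrives while the first worker is still running cannot reinitialize the shared state underneath it, which is the situation the added `kthread_config` check appears to guard against.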


2024-01-29 03:04:21

by Yang, Chenyuan

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Thanks a lot for this new patch!

After applying this new patch to the latest kernel (hash: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7) and fuzzing with Syzkaller, the hang still occurs.
To help you debug it, I attached the covered lines from the fuzz testing and the output of `git diff`.

By the way, the syscall descriptions for CEC have been merged into upstream Syzkaller: https://github.com/google/syzkaller/blob/master/sys/linux/dev_cec.txt.

Let me know if you need further information.

Best,
Chenyuan

On 1/19/24, 2:17 AM, "Hans Verkuil" <[email protected] <mailto:[email protected]>> wrote:


Hi Chenyuan,


On 28/12/2023 03:33, Yang, Chenyuan wrote:
> Hello,
>
>
>
> We encountered 5 different crashes in the cec device by using our generated syscall specification for it, here are the descriptions of these 5 crashes and the related files are attached:
>
> 1. KASAN: slab-use-after-free Read in cec_queue_msg_fh (Reproducible)
>
> 2. WARNING: ODEBUG bug in cec_transmit_msg_fh
>
> 3. WARNING in cec_data_cancel
>
> 4. INFO: task hung in cec_claim_log_addrs (Reproducible)
>
> 5. general protection fault in cec_transmit_done_ts
>
>
>
> For “KASAN: slab-use-after-free Read in cec_queue_msg_fh”, we attached a syzkaller program to reproduce it. This crash is caused by `list_add_tail(&entry->list, &fh->msgs);`
> (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L224), which reads a
> variable freed by `kfree(fh);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-api.c#L684). The reproducible program is a Syzkaller program, which can be executed following this document:
> https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md.
>
>
>
> For “WARNING: ODEBUG bug in cec_transmit_msg_fh”, unfortunately we failed to reproduce it but we indeed trigger this crash almost every time when we fuzz the cec device only. We attached the report
> and log for this bug. It tries freeing an active object by using `kfree(data);` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L930).
>
>
>
> For “WARNING in cec_data_cancel”, it is an internal warning used in cec_data_cancel (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L365), which checks whether the transmit is the current or pending. Unfortunately, we also don't have the
> reproducible program for this bug, but we attach the report and log.
>
>
>
> For “INFO: task hung in cec_claim_log_addrs”, the kernel hangs when the cec device calls `wait_for_completion(&adap->config_completion);`
> (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L1579). We have a
> reproducible C program for this.
>
>
>
> For “general protection fault in cec_transmit_done_ts”, the cec device tries dereferencing a non-canonical address 0xdffffc00000000e0: 0000 [#1], which is related to the invocation
> `cec_transmit_attempt_done_ts` (https://elixir.bootlin.com/linux/v6.7-rc7/source/drivers/media/cec/core/cec-adap.c#L697). It seems that the address of cec_adapter is totally wrong. We do not have a reproducible program for this
> bug, but the log and report for it are attached.
>
>
>
> If you have any questions or require more information, please feel free to contact us.


Can you retest with the patch below? I'm fairly certain this will fix issues 1 and 2.
I suspect at least some of the others are related to 1 & 2, but since I could never
get the reproducers working reliably, I had a hard time determining if there are more
bugs or if this patch resolves everything.


Your help testing this patch will be appreciated!

Regards,

Hans

Signed-off-by: Hans Verkuil <[email protected]>
---
 drivers/media/cec/core/cec-adap.c | 3 +--
 drivers/media/cec/core/cec-api.c  | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 5741adf09a2e..079c3b142d91 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -936,8 +936,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
 	 */
 	mutex_unlock(&adap->lock);
 	wait_for_completion_killable(&data->c);
-	if (!data->completed)
-		cancel_delayed_work_sync(&data->work);
+	cancel_delayed_work_sync(&data->work);
 	mutex_lock(&adap->lock);

 	/* Cancel the transmit if it was interrupted */
diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
index 67dc79ef1705..d64bb716f9c6 100644
--- a/drivers/media/cec/core/cec-api.c
+++ b/drivers/media/cec/core/cec-api.c
@@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
 		list_del_init(&data->xfer_list);
 	}
 	mutex_unlock(&adap->lock);
+
+	mutex_lock(&fh->lock);
 	while (!list_empty(&fh->msgs)) {
 		struct cec_msg_entry *entry =
 			list_first_entry(&fh->msgs, struct cec_msg_entry, list);
@@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
 			kfree(entry);
 		}
 	}
+	mutex_unlock(&fh->lock);
 	kfree(fh);

 	cec_put_device(devnode);
--
2.42.0

Attachments:
cov_lines.txt (129.08 kB)
diff.patch (2.59 kB)

2024-01-30 15:08:51

by Hans Verkuil

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On 29/01/2024 04:03, Yang, Chenyuan wrote:
> Hi Hans,
>
> Thanks a lot for this new patch!
>
> After applying this new patch in the latest kernel (hash: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7) and fuzzing with Syzkaller, it seems that the hang still exists.
> To help you better debug it, I attached the covered lines for the fuzz testing and the output of `git diff`. Hope this could help you.
>
> By the way, the syscall descriptions for CEC have been merged into the Syzkaller mainstream: https://github.com/google/syzkaller/blob/master/sys/linux/dev_cec.txt.
>
> Let me know if you need further information.
>
> Best,
> Chenyuan

Here is another patch. It now times out on all wait_for_completion calls,
triggers a WARN_ON and prints additional info. Hopefully this will give me
better insight into what is going on.
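
For reference, `wait_for_completion_killable_timeout()` returns a positive value when the completion fired, 0 when the timeout expired, and -ERESTARTSYS when the waiting task was killed, so a `<= 0` check as used in the patch below fires in both of the last two cases. A minimal userspace sketch of that classification, assuming nothing about the CEC code itself: `classify_wait`, `wait_outcome_name` and `enum wait_outcome` are hypothetical names for illustration, and `ERESTARTSYS` is defined locally because it is a kernel-internal value.

```c
#include <assert.h>

/* Kernel-internal errno (include/linux/errno.h); defined here only for this sketch. */
#define ERESTARTSYS 512

/* Hypothetical names for illustration, not part of the CEC code. */
enum wait_outcome { WAIT_COMPLETED, WAIT_TIMED_OUT, WAIT_KILLED };

/* Classify a return value of wait_for_completion_killable_timeout(). */
enum wait_outcome classify_wait(long ret)
{
	if (ret > 0)
		return WAIT_COMPLETED;	/* remaining jiffies, at least 1 */
	if (ret == 0)
		return WAIT_TIMED_OUT;	/* nobody called complete() in time */
	return WAIT_KILLED;		/* negative, i.e. -ERESTARTSYS */
}

/* Human-readable label for a classified outcome. */
const char *wait_outcome_name(enum wait_outcome o)
{
	static const char *const names[] = { "completed", "timed out", "killed" };

	assert((unsigned int)o <= WAIT_KILLED);
	return names[o];
}
```

Logging the outcome next to the WARN_ON would make it possible to tell a real hang (timeout) apart from a fuzzer process that was killed while waiting.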

Unfortunately I was unable to reproduce this issue on my VM, so I have to
rely on you to run the test.

Regards,

Hans

[PATCH] Test

Signed-off-by: Hans Verkuil <[email protected]>
---
diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 5741adf09a2e..b1951eb7f5bd 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -935,9 +935,12 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
 	 * Release the lock and wait, retake the lock afterwards.
 	 */
 	mutex_unlock(&adap->lock);
-	wait_for_completion_killable(&data->c);
-	if (!data->completed)
-		cancel_delayed_work_sync(&data->work);
+	if (WARN_ON(wait_for_completion_killable_timeout(&data->c, msecs_to_jiffies(adap->xfer_timeout_ms + 1000)) <= 0)) {
+		dprintk(0, "wfc1: %px %d%d%d%d %x\n", adap->kthread_config,
+			adap->is_configuring, adap->is_configured,
+			adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
+	}
+	cancel_delayed_work_sync(&data->work);
 	mutex_lock(&adap->lock);

 	/* Cancel the transmit if it was interrupted */
@@ -1563,10 +1566,12 @@ static int cec_config_thread_func(void *arg)
 			cec_transmit_msg_fh(adap, &msg, NULL, false);
 		}
 	}
+	mutex_unlock(&adap->lock);
+	call_void_op(adap, configured);
+	mutex_lock(&adap->lock);
 	adap->kthread_config = NULL;
 	complete(&adap->config_completion);
 	mutex_unlock(&adap->lock);
-	call_void_op(adap, configured);
 	return 0;

 unconfigure:
@@ -1592,6 +1597,17 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
 	if (WARN_ON(adap->is_configuring || adap->is_configured))
 		return;

+	if (adap->kthread_config) {
+		mutex_unlock(&adap->lock);
+//		wait_for_completion(&adap->config_completion);
+		if (WARN_ON(wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0)) {
+			dprintk(0, "wfc2: %px %d%d%d%d %x\n", adap->kthread_config,
+				adap->is_configuring, adap->is_configured,
+				adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
+		}
+		mutex_lock(&adap->lock);
+	}
+
 	init_completion(&adap->config_completion);

 	/* Ready to kick off the thread */
@@ -1599,11 +1615,17 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
 	adap->kthread_config = kthread_run(cec_config_thread_func, adap,
 					   "ceccfg-%s", adap->name);
 	if (IS_ERR(adap->kthread_config)) {
-		adap->kthread_config = NULL;
 		adap->is_configuring = false;
+		adap->kthread_config = NULL;
 	} else if (block) {
 		mutex_unlock(&adap->lock);
-		wait_for_completion(&adap->config_completion);
+		//wait_for_completion(&adap->config_completion);
+		if (WARN_ON(wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0)) {
+			dprintk(0, "wfc3: %px %d%d%d%d %x\n", adap->kthread_config,
+				adap->is_configuring, adap->is_configured,
+				adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
+
+		}
 		mutex_lock(&adap->lock);
 	}
 }
diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
index 67dc79ef1705..d64bb716f9c6 100644
--- a/drivers/media/cec/core/cec-api.c
+++ b/drivers/media/cec/core/cec-api.c
@@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
 		list_del_init(&data->xfer_list);
 	}
 	mutex_unlock(&adap->lock);
+
+	mutex_lock(&fh->lock);
 	while (!list_empty(&fh->msgs)) {
 		struct cec_msg_entry *entry =
 			list_first_entry(&fh->msgs, struct cec_msg_entry, list);
@@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
 			kfree(entry);
 		}
 	}
+	mutex_unlock(&fh->lock);
 	kfree(fh);

 	cec_put_device(devnode);


2024-02-12 15:08:43

by Hans Verkuil

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Chenyuan,

On 30/01/2024 15:35, Hans Verkuil wrote:
> On 29/01/2024 04:03, Yang, Chenyuan wrote:
>> Hi Hans,
>>
>> Thanks a lot for this new patch!
>>
>> After applying this new patch in the latest kernel (hash: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7) and fuzzing with Syzkaller, it seems that the hang still exists.
>> To help you better debug it, I attached the covered lines for the fuzz testing and the output of `git diff`. Hope this could help you.
>>
>> By the way, the syscall descriptions for CEC have been merged into the Syzkaller mainstream: https://github.com/google/syzkaller/blob/master/sys/linux/dev_cec.txt.
>>
>> Let me know if you need further information.
>>
>> Best,
>> Chenyuan
>
> Here is another patch. This now time outs on all wait_for_completion calls
> and reports a WARN_ON and shows additional info. Hopefully this will give me
> better insight into what is going on.
>
> Unfortunately I was unable to reproduce this issue on my VM, so I have to
> rely on you to run the test.

Did you have time to run the test with this patch? It would be very useful to
see the results.

Regards,

Hans



2024-02-13 16:47:29

by Yang, Chenyuan

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Here is the output for the first warning; its C reproducer is attached:

```
[ 69.031655][ T7985] sshd (7985) used greatest stack depth: 21808 bytes left
[ 135.592879][ T8039] cec-vivid-000-vid-out0: wfc1: 0000000000000000 0110 1000
[ 135.592987][ T8040] cec-vivid-001-vid-out0: wfc1: 0000000000000000 0110 1000
```
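
The debug lines follow the `dprintk` format from the test patch, `%px %d%d%d%d %x`: the `kthread_config` pointer, then the four flags `is_configuring`, `is_configured`, `is_enabled` and `must_reconfigure`, then `phys_addr` in hex. A small userspace sketch for decoding such a line is given here as an aid to reading the output. `struct wfc_state`, `parse_wfc` and `format_phys_addr` are hypothetical helper names, and the dotted a.b.c.d rendering assumes the usual CEC physical address layout of four 4-bit fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed by the wfcN debug lines. */
struct wfc_state {
	unsigned long long kthread_config;	/* %px: pointer value, 0 means NULL */
	bool is_configuring;
	bool is_configured;
	bool is_enabled;
	bool must_reconfigure;
	unsigned int phys_addr;			/* %x */
};

/*
 * Parse the payload after the "wfcN: " tag, e.g. "0000000000000000 0110 1000".
 * Returns 0 on success, -1 if the text does not match the expected layout.
 */
int parse_wfc(const char *payload, struct wfc_state *st)
{
	char flags[5];	/* four flag digits plus the terminating NUL */

	if (sscanf(payload, "%llx %4[01] %x",
		   &st->kthread_config, flags, &st->phys_addr) != 3)
		return -1;
	if (strlen(flags) != 4)
		return -1;

	st->is_configuring = flags[0] == '1';
	st->is_configured = flags[1] == '1';
	st->is_enabled = flags[2] == '1';
	st->must_reconfigure = flags[3] == '1';
	return 0;
}

/* Render a 16-bit CEC physical address as a.b.c.d (one hex digit per field). */
void format_phys_addr(unsigned int pa, char *out, size_t len)
{
	assert(out != NULL && len >= 8);
	snprintf(out, len, "%x.%x.%x.%x",
		 (pa >> 12) & 0xf, (pa >> 8) & 0xf, (pa >> 4) & 0xf, pa & 0xf);
}
```

For the two `wfc1` lines above, this decodes to a NULL `kthread_config`, `is_configured` and `is_enabled` set, `is_configuring` and `must_reconfigure` clear, and physical address 1.0.0.0.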

Best,
Chenyuan

On 2/13/24, 9:40 AM, "Yang, Chenyuan" <[email protected]> wrote:

Hi Hans,

Sorry to reply so late.

Here is the reproducible C program that could trigger the following debug warning:
```
if ((wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0)) {
dprintk(0, "wfc3: %px %d%d%d%d %x\n", adap->kthread_config,
adap->is_configuring, adap->is_configured,
adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
WARN_ON(1);
}
```

Output:

```
[ 2147.996471][ T29] kauditd_printk_skb: 2 callbacks suppressed
[ 2147.996480][ T29] audit: type=1804 audit(1707782435.859:14): pid=1281266 uid=0 auid=4290
[ 2148.025365][ T29] audit: type=1804 audit(1707782435.879:15): pid=1281266 uid=0 auid=4290
[54569.994174][T772636] cec-vivid-002-vid-cap0: wfc3: 0000000000000000 0110 0


[54569.995754][T772636] WARNING: CPU: 0 PID: 772636 at drivers/media/cec/core/cec-adap.c:1620
[54569.996578][T772636] Modules linked in:
[54569.997066][T772636] CPU: 0 PID: 772636 Comm: exe Not tainted 6.8.0-rc1-00169-gecb1b8288d5
[54569.997804][T772636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-14
[54569.998416][T772636] RIP: 0010:cec_claim_log_addrs+0x29b/0x7c0
[54569.998830][T772636] Code: 7c 08 84 d2 0f 85 50 04 00 00 44 8b 25 ee 33 9f 0b 31 ff 44 89f
[54570.000124][T772636] RSP: 0018:ffffc90002117b30 EFLAGS: 00010293
[54570.000549][T772636] RAX: 0000000000000000 RBX: ffff88801d6a0000 RCX: ffffffff816a1959
[54570.001086][T772636] RDX: ffff888028e4b900 RSI: ffffffff8710b1ca RDI: 0000000000000005
[54570.001621][T772636] RBP: ffff88801d6a0638 R08: 0000000000000005 R09: 0000000000000000
[54570.002152][T772636] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000000
[54570.002684][T772636] R13: ffff88801d6a07da R14: 0000000000000000 R15: 0000000000000001
[54570.004825][T772636] FS: 00007fee0aee1700(0000) GS:ffff88802ca00000(0000) knlGS:000000000
[54570.005445][T772636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54570.005893][T772636] CR2: 00007ffe793b19f0 CR3: 000000002130a000 CR4: 00000000000006f0
[54570.006430][T772636] Call Trace:
[54570.006660][T772636] <TASK>
[54570.006871][T772636] ? show_regs+0x96/0xa0
[54570.007197][T772636] ? __warn+0xe6/0x390
[54570.007506][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.007875][T772636] ? report_bug+0x2dd/0x500
[54570.008203][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.008573][T772636] ? handle_bug+0x99/0x120
[54570.008891][T772636] ? exc_invalid_op+0x36/0x80
[54570.009220][T772636] ? asm_exc_invalid_op+0x1a/0x20
[54570.009584][T772636] ? __wake_up_klogd.part.0+0x99/0xf0
[54570.009962][T772636] ? cec_claim_log_addrs+0x29a/0x7c0
[54570.010344][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.010718][T772636] ? cec_adap_enable+0x534/0xbd0
[54570.011065][T772636] __cec_s_log_addrs+0xdfc/0x16e0
[54570.011437][T772636] cec_ioctl+0x1e7c/0x2690
[54570.011778][T772636] ? cec_release+0xbb0/0xbb0
[54570.012107][T772636] ? tomoyo_execute_permission+0x4a0/0x4a0
[54570.012520][T772636] ? __sanitizer_cov_trace_switch+0x54/0x90
[54570.012938][T772636] ? do_vfs_ioctl+0x138/0x16c0
[54570.013493][T772636] ? vfs_fileattr_set+0xc40/0xc40
[54570.013954][T772636] ? lock_downgrade+0x6a0/0x6a0
[54570.014422][T772636] ? bpf_lsm_file_ioctl+0x9/0x10
[54570.014791][T772636] ? cec_release+0xbb0/0xbb0
[54570.015117][T772636] __x64_sys_ioctl+0x19d/0x210
[54570.015466][T772636] do_syscall_64+0xd2/0x250
[54570.015785][T772636] entry_SYSCALL_64_after_hwframe+0x63/0x6b
[54570.016191][T772636] RIP: 0033:0x7fee0affaf29
[54570.016502][T772636] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 488
[54570.017779][T772636] RSP: 002b:00007fee0aee0e98 EFLAGS: 00000202 ORIG_RAX: 000000000000000
[54570.018343][T772636] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fee0affaf29
[54570.018872][T772636] RDX: 0000000020000680 RSI: 00000000c05c6104 RDI: 0000000000000004
[54570.019411][T772636] RBP: 00007fee0aee0ec0 R08: 0000000000000000 R09: 0000000000000000
[54570.019941][T772636] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe792f9dce
[54570.020475][T772636] R13: 00007ffe792f9dcf R14: 00007fee0aee0fc0 R15: 0000000000022000
[54570.021013][T772636] </TASK>
[54570.021234][T772636] Kernel panic - not syncing: kernel: panic_on_warn set ...
[54570.021720][T772636] CPU: 0 PID: 772636 Comm: exe Not tainted 6.8.0-rc1-00169-gecb1b8288d5
[54570.022329][T772636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-14
[54570.022956][T772636] Call Trace:
[54570.023192][T772636] <TASK>
[54570.023412][T772636] dump_stack_lvl+0xd9/0x150
[54570.023741][T772636] panic+0x6b9/0x760
[54570.024022][T772636] ? panic_smp_self_stop+0xa0/0xa0
[54570.024388][T772636] ? check_panic_on_warn+0x1f/0xc0
[54570.024746][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.025111][T772636] check_panic_on_warn+0xb1/0xc0
[54570.025459][T772636] __warn+0xf2/0x390
[54570.025740][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.026109][T772636] report_bug+0x2dd/0x500
[54570.026418][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.026788][T772636] handle_bug+0x99/0x120
[54570.027085][T772636] exc_invalid_op+0x36/0x80
[54570.027408][T772636] asm_exc_invalid_op+0x1a/0x20
[54570.027748][T772636] RIP: 0010:cec_claim_log_addrs+0x29b/0x7c0
[54570.028154][T772636] Code: 7c 08 84 d2 0f 85 50 04 00 00 44 8b 25 ee 33 9f 0b 31 ff 44 89f
[54570.029431][T772636] RSP: 0018:ffffc90002117b30 EFLAGS: 00010293
[54570.029847][T772636] RAX: 0000000000000000 RBX: ffff88801d6a0000 RCX: ffffffff816a1959
[54570.030380][T772636] RDX: ffff888028e4b900 RSI: ffffffff8710b1ca RDI: 0000000000000005
[54570.030907][T772636] RBP: ffff88801d6a0638 R08: 0000000000000005 R09: 0000000000000000
[54570.031443][T772636] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000000
[54570.031972][T772636] R13: ffff88801d6a07da R14: 0000000000000000 R15: 0000000000000001
[54570.032507][T772636] ? __wake_up_klogd.part.0+0x99/0xf0
[54570.032896][T772636] ? cec_claim_log_addrs+0x29a/0x7c0
[54570.033271][T772636] ? cec_adap_enable+0x534/0xbd0
[54570.033620][T772636] __cec_s_log_addrs+0xdfc/0x16e0
[54570.033978][T772636] cec_ioctl+0x1e7c/0x2690
[54570.034296][T772636] ? cec_release+0xbb0/0xbb0
[54570.034623][T772636] ? tomoyo_execute_permission+0x4a0/0x4a0
[54570.035030][T772636] ? __sanitizer_cov_trace_switch+0x54/0x90
[54570.035447][T772636] ? do_vfs_ioctl+0x138/0x16c0
[54570.035782][T772636] ? vfs_fileattr_set+0xc40/0xc40
[54570.036138][T772636] ? lock_downgrade+0x6a0/0x6a0
[54570.036487][T772636] ? bpf_lsm_file_ioctl+0x9/0x10
[54570.036835][T772636] ? cec_release+0xbb0/0xbb0
[54570.037159][T772636] __x64_sys_ioctl+0x19d/0x210
[54570.037497][T772636] do_syscall_64+0xd2/0x250
[54570.037817][T772636] entry_SYSCALL_64_after_hwframe+0x63/0x6b
```

I will collect the programs that trigger the other 2 warnings and send them to you soon.

Best,
Chenyuan


Attachments:
cec-warn-1.c (7.95 kB)

2024-02-13 17:08:16

by Yang, Chenyuan

Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Sorry to reply so late.

Here is the reproducible C program that could trigger the following debug warning:
```
if (wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0) {
        dprintk(0, "wfc3: %px %d%d%d%d %x\n", adap->kthread_config,
                adap->is_configuring, adap->is_configured,
                adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
        WARN_ON(1);
}
```
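
A note on the `<= 0` test in the snippet above: `wait_for_completion_killable_timeout()` returns a positive value when the completion was signalled in time, 0 when the timeout expired, and a negative error code when the task was killed while waiting. The debug check therefore fires for both a timeout and a fatal signal. The following is only a minimal userspace sketch of that convention; `classify_wait`, `debug_check_fires` and the enum names are hypothetical helpers and are not part of the CEC code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three possible outcomes of the wait. */
enum wait_result {
        WAIT_COMPLETED,   /* ret > 0: completion signalled before the timeout */
        WAIT_TIMED_OUT,   /* ret == 0: timeout expired */
        WAIT_INTERRUPTED, /* ret < 0: killed while waiting */
};

/* Map a return value of the timed, killable wait to one of the labels. */
static enum wait_result classify_wait(long ret)
{
        if (ret > 0)
                return WAIT_COMPLETED;
        if (ret == 0)
                return WAIT_TIMED_OUT;
        return WAIT_INTERRUPTED;
}

/* Mirrors the "<= 0" condition used by the debug patch. */
static bool debug_check_fires(long ret)
{
        bool fires = ret <= 0;

        /* The check fires exactly when the wait did not complete. */
        assert(fires == (classify_wait(ret) != WAIT_COMPLETED));
        return fires;
}
```

In other words, a `wfc` line in the log does not by itself say whether the wait timed out or was killed; the surrounding context (here, a 10 second timeout) has to be used to tell the two apart.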

Output:

```
[ 2147.996471][ T29] kauditd_printk_skb: 2 callbacks suppressed
[ 2147.996480][ T29] audit: type=1804 audit(1707782435.859:14): pid=1281266 uid=0 auid=4290
[ 2148.025365][ T29] audit: type=1804 audit(1707782435.879:15): pid=1281266 uid=0 auid=4290
[54569.994174][T772636] cec-vivid-002-vid-cap0: wfc3: 0000000000000000 0110 0

[54569.995754][T772636] WARNING: CPU: 0 PID: 772636 at drivers/media/cec/core/cec-adap.c:1620
[54569.996578][T772636] Modules linked in:
[54569.997066][T772636] CPU: 0 PID: 772636 Comm: exe Not tainted 6.8.0-rc1-00169-gecb1b8288d5
[54569.997804][T772636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-14
[54569.998416][T772636] RIP: 0010:cec_claim_log_addrs+0x29b/0x7c0
[54569.998830][T772636] Code: 7c 08 84 d2 0f 85 50 04 00 00 44 8b 25 ee 33 9f 0b 31 ff 44 89f
[54570.000124][T772636] RSP: 0018:ffffc90002117b30 EFLAGS: 00010293
[54570.000549][T772636] RAX: 0000000000000000 RBX: ffff88801d6a0000 RCX: ffffffff816a1959
[54570.001086][T772636] RDX: ffff888028e4b900 RSI: ffffffff8710b1ca RDI: 0000000000000005
[54570.001621][T772636] RBP: ffff88801d6a0638 R08: 0000000000000005 R09: 0000000000000000
[54570.002152][T772636] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000000
[54570.002684][T772636] R13: ffff88801d6a07da R14: 0000000000000000 R15: 0000000000000001
[54570.004825][T772636] FS: 00007fee0aee1700(0000) GS:ffff88802ca00000(0000) knlGS:000000000
[54570.005445][T772636] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54570.005893][T772636] CR2: 00007ffe793b19f0 CR3: 000000002130a000 CR4: 00000000000006f0
[54570.006430][T772636] Call Trace:
[54570.006660][T772636] <TASK>
[54570.006871][T772636] ? show_regs+0x96/0xa0
[54570.007197][T772636] ? __warn+0xe6/0x390
[54570.007506][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.007875][T772636] ? report_bug+0x2dd/0x500
[54570.008203][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.008573][T772636] ? handle_bug+0x99/0x120
[54570.008891][T772636] ? exc_invalid_op+0x36/0x80
[54570.009220][T772636] ? asm_exc_invalid_op+0x1a/0x20
[54570.009584][T772636] ? __wake_up_klogd.part.0+0x99/0xf0
[54570.009962][T772636] ? cec_claim_log_addrs+0x29a/0x7c0
[54570.010344][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.010718][T772636] ? cec_adap_enable+0x534/0xbd0
[54570.011065][T772636] __cec_s_log_addrs+0xdfc/0x16e0
[54570.011437][T772636] cec_ioctl+0x1e7c/0x2690
[54570.011778][T772636] ? cec_release+0xbb0/0xbb0
[54570.012107][T772636] ? tomoyo_execute_permission+0x4a0/0x4a0
[54570.012520][T772636] ? __sanitizer_cov_trace_switch+0x54/0x90
[54570.012938][T772636] ? do_vfs_ioctl+0x138/0x16c0
[54570.013493][T772636] ? vfs_fileattr_set+0xc40/0xc40
[54570.013954][T772636] ? lock_downgrade+0x6a0/0x6a0
[54570.014422][T772636] ? bpf_lsm_file_ioctl+0x9/0x10
[54570.014791][T772636] ? cec_release+0xbb0/0xbb0
[54570.015117][T772636] __x64_sys_ioctl+0x19d/0x210
[54570.015466][T772636] do_syscall_64+0xd2/0x250
[54570.015785][T772636] entry_SYSCALL_64_after_hwframe+0x63/0x6b
[54570.016191][T772636] RIP: 0033:0x7fee0affaf29
[54570.016502][T772636] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 488
[54570.017779][T772636] RSP: 002b:00007fee0aee0e98 EFLAGS: 00000202 ORIG_RAX: 000000000000000
[54570.018343][T772636] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fee0affaf29
[54570.018872][T772636] RDX: 0000000020000680 RSI: 00000000c05c6104 RDI: 0000000000000004
[54570.019411][T772636] RBP: 00007fee0aee0ec0 R08: 0000000000000000 R09: 0000000000000000
[54570.019941][T772636] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe792f9dce
[54570.020475][T772636] R13: 00007ffe792f9dcf R14: 00007fee0aee0fc0 R15: 0000000000022000
[54570.021013][T772636] </TASK>
[54570.021234][T772636] Kernel panic - not syncing: kernel: panic_on_warn set ...
[54570.021720][T772636] CPU: 0 PID: 772636 Comm: exe Not tainted 6.8.0-rc1-00169-gecb1b8288d5
[54570.022329][T772636] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-14
[54570.022956][T772636] Call Trace:
[54570.023192][T772636] <TASK>
[54570.023412][T772636] dump_stack_lvl+0xd9/0x150
[54570.023741][T772636] panic+0x6b9/0x760
[54570.024022][T772636] ? panic_smp_self_stop+0xa0/0xa0
[54570.024388][T772636] ? check_panic_on_warn+0x1f/0xc0
[54570.024746][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.025111][T772636] check_panic_on_warn+0xb1/0xc0
[54570.025459][T772636] __warn+0xf2/0x390
[54570.025740][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.026109][T772636] report_bug+0x2dd/0x500
[54570.026418][T772636] ? cec_claim_log_addrs+0x29b/0x7c0
[54570.026788][T772636] handle_bug+0x99/0x120
[54570.027085][T772636] exc_invalid_op+0x36/0x80
[54570.027408][T772636] asm_exc_invalid_op+0x1a/0x20
[54570.027748][T772636] RIP: 0010:cec_claim_log_addrs+0x29b/0x7c0
[54570.028154][T772636] Code: 7c 08 84 d2 0f 85 50 04 00 00 44 8b 25 ee 33 9f 0b 31 ff 44 89f
[54570.029431][T772636] RSP: 0018:ffffc90002117b30 EFLAGS: 00010293
[54570.029847][T772636] RAX: 0000000000000000 RBX: ffff88801d6a0000 RCX: ffffffff816a1959
[54570.030380][T772636] RDX: ffff888028e4b900 RSI: ffffffff8710b1ca RDI: 0000000000000005
[54570.030907][T772636] RBP: ffff88801d6a0638 R08: 0000000000000005 R09: 0000000000000000
[54570.031443][T772636] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000000
[54570.031972][T772636] R13: ffff88801d6a07da R14: 0000000000000000 R15: 0000000000000001
[54570.032507][T772636] ? __wake_up_klogd.part.0+0x99/0xf0
[54570.032896][T772636] ? cec_claim_log_addrs+0x29a/0x7c0
[54570.033271][T772636] ? cec_adap_enable+0x534/0xbd0
[54570.033620][T772636] __cec_s_log_addrs+0xdfc/0x16e0
[54570.033978][T772636] cec_ioctl+0x1e7c/0x2690
[54570.034296][T772636] ? cec_release+0xbb0/0xbb0
[54570.034623][T772636] ? tomoyo_execute_permission+0x4a0/0x4a0
[54570.035030][T772636] ? __sanitizer_cov_trace_switch+0x54/0x90
[54570.035447][T772636] ? do_vfs_ioctl+0x138/0x16c0
[54570.035782][T772636] ? vfs_fileattr_set+0xc40/0xc40
[54570.036138][T772636] ? lock_downgrade+0x6a0/0x6a0
[54570.036487][T772636] ? bpf_lsm_file_ioctl+0x9/0x10
[54570.036835][T772636] ? cec_release+0xbb0/0xbb0
[54570.037159][T772636] __x64_sys_ioctl+0x19d/0x210
[54570.037497][T772636] do_syscall_64+0xd2/0x250
[54570.037817][T772636] entry_SYSCALL_64_after_hwframe+0x63/0x6b
```
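
For reference, the `wfc3` line in the output follows the `"%px %d%d%d%d %x"` format from the debug patch, so the four digits are `is_configuring`, `is_configured`, `is_enabled` and `must_reconfigure`, in that order. Below is a small userspace sketch that decodes such a line; `struct wfc_state` and `parse_wfc` are hypothetical names introduced only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical holder for the fields printed by the "wfc" debug lines. */
struct wfc_state {
        unsigned long long kthread_config; /* %px value, 0 means NULL */
        int is_configuring;
        int is_configured;
        int is_enabled;
        int must_reconfigure;
        unsigned int phys_addr;
};

/* Parse the text after "wfcN: ", e.g. "0000000000000000 0110 0". */
static bool parse_wfc(const char *s, struct wfc_state *st)
{
        assert(s && st);
        /* %1d reads exactly one digit, matching the unseparated %d%d%d%d. */
        return sscanf(s, "%llx %1d%1d%1d%1d %x",
                      &st->kthread_config,
                      &st->is_configuring, &st->is_configured,
                      &st->is_enabled, &st->must_reconfigure,
                      &st->phys_addr) == 6;
}
```

Applied to `0000000000000000 0110 0`, this gives `kthread_config` NULL, `is_configuring` 0, `is_configured` 1, `is_enabled` 1, `must_reconfigure` 0 and `phys_addr` 0. That would suggest the configuration thread had already finished when the wait timed out, although this reading is an interpretation of the log and not something stated in the thread.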

I will collect the programs that trigger the other two warnings and send them to you soon.

Best,
Chenyuan

On 2/12/24, 8:42 AM, "Hans Verkuil" <[email protected]> wrote:


Hi Chenyuan,


On 30/01/2024 15:35, Hans Verkuil wrote:
> On 29/01/2024 04:03, Yang, Chenyuan wrote:
>> Hi Hans,
>>
>> Thanks a lot for this new patch!
>>
>> After applying this new patch in the latest kernel (hash: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7) and fuzzing with Syzkaller, it seems that the hang still exists.
>> To help you better debug it, I attached the covered lines for the fuzz testing and the output of `git diff`. Hope this could help you.
>>
>> By the way, the syscall descriptions for CEC have been merged into the Syzkaller mainstream: https://github.com/google/syzkaller/blob/master/sys/linux/dev_cec.txt
>>
>> Let me know if you need further information.
>>
>> Best,
>> Chenyuan
>
> Here is another patch. It now times out on all wait_for_completion calls,
> triggers a WARN_ON and shows additional info. Hopefully this will give me
> better insight into what is going on.
>
> Unfortunately I was unable to reproduce this issue on my VM, so I have to
> rely on you to run the test.


Did you have time to run the test with this patch? It would be very useful to
see the results.


Regards,


Hans


>
> Regards,
>
> Hans
>
> [PATCH] Test
>
> Signed-off-by: Hans Verkuil <[email protected]>
> ---
> diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
> index 5741adf09a2e..b1951eb7f5bd 100644
> --- a/drivers/media/cec/core/cec-adap.c
> +++ b/drivers/media/cec/core/cec-adap.c
> @@ -935,9 +935,12 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
> * Release the lock and wait, retake the lock afterwards.
> */
> mutex_unlock(&adap->lock);
> - wait_for_completion_killable(&data->c);
> - if (!data->completed)
> - cancel_delayed_work_sync(&data->work);
> + if (WARN_ON(wait_for_completion_killable_timeout(&data->c, msecs_to_jiffies(adap->xfer_timeout_ms + 1000)) <= 0)) {
> + dprintk(0, "wfc1: %px %d%d%d%d %x\n", adap->kthread_config,
> + adap->is_configuring, adap->is_configured,
> + adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
> + }
> + cancel_delayed_work_sync(&data->work);
> mutex_lock(&adap->lock);
>
> /* Cancel the transmit if it was interrupted */
> @@ -1563,10 +1566,12 @@ static int cec_config_thread_func(void *arg)
> cec_transmit_msg_fh(adap, &msg, NULL, false);
> }
> }
> + mutex_unlock(&adap->lock);
> + call_void_op(adap, configured);
> + mutex_lock(&adap->lock);
> adap->kthread_config = NULL;
> complete(&adap->config_completion);
> mutex_unlock(&adap->lock);
> - call_void_op(adap, configured);
> return 0;
>
> unconfigure:
> @@ -1592,6 +1597,17 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
> if (WARN_ON(adap->is_configuring || adap->is_configured))
> return;
>
> + if (adap->kthread_config) {
> + mutex_unlock(&adap->lock);
> +// wait_for_completion(&adap->config_completion);
> + if (WARN_ON(wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0)) {
> + dprintk(0, "wfc2: %px %d%d%d%d %x\n", adap->kthread_config,
> + adap->is_configuring, adap->is_configured,
> + adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
> + }
> + mutex_lock(&adap->lock);
> + }
> +
> init_completion(&adap->config_completion);
>
> /* Ready to kick off the thread */
> @@ -1599,11 +1615,17 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
> adap->kthread_config = kthread_run(cec_config_thread_func, adap,
> "ceccfg-%s", adap->name);
> if (IS_ERR(adap->kthread_config)) {
> - adap->kthread_config = NULL;
> adap->is_configuring = false;
> + adap->kthread_config = NULL;
> } else if (block) {
> mutex_unlock(&adap->lock);
> - wait_for_completion(&adap->config_completion);
> + //wait_for_completion(&adap->config_completion);
> + if (WARN_ON(wait_for_completion_killable_timeout(&adap->config_completion, msecs_to_jiffies(10000)) <= 0)) {
> + dprintk(0, "wfc3: %px %d%d%d%d %x\n", adap->kthread_config,
> + adap->is_configuring, adap->is_configured,
> + adap->is_enabled, adap->must_reconfigure, adap->phys_addr);
> +
> + }
> mutex_lock(&adap->lock);
> }
> }
> diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
> index 67dc79ef1705..d64bb716f9c6 100644
> --- a/drivers/media/cec/core/cec-api.c
> +++ b/drivers/media/cec/core/cec-api.c
> @@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
> list_del_init(&data->xfer_list);
> }
> mutex_unlock(&adap->lock);
> +
> + mutex_lock(&fh->lock);
> while (!list_empty(&fh->msgs)) {
> struct cec_msg_entry *entry =
> list_first_entry(&fh->msgs, struct cec_msg_entry, list);
> @@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
> kfree(entry);
> }
> }
> + mutex_unlock(&fh->lock);
> kfree(fh);
>
> cec_put_device(devnode);
>

Attachments:
cec-warn-3.c (18.37 kB)

2024-02-23 14:45:51

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Chenyuan,

Here is another patch for you to try. I think it is good for blocking CEC_ADAP_S_LOG_ADDRS
ioctl calls, but if the filehandle is in non-blocking mode, I'm still not certain it
is correct. But one issue at a time :-)

Regards,

Hans

diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
index 559a172ebc6c..a493cbce2456 100644
--- a/drivers/media/cec/core/cec-adap.c
+++ b/drivers/media/cec/core/cec-adap.c
@@ -936,8 +936,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
*/
mutex_unlock(&adap->lock);
wait_for_completion_killable(&data->c);
- if (!data->completed)
- cancel_delayed_work_sync(&data->work);
+ cancel_delayed_work_sync(&data->work);
mutex_lock(&adap->lock);

/* Cancel the transmit if it was interrupted */
@@ -1575,9 +1574,12 @@ static int cec_config_thread_func(void *arg)
*/
static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
{
- if (WARN_ON(adap->is_configuring || adap->is_configured))
+ if (WARN_ON(adap->is_claiming_log_addrs ||
+ adap->is_configuring || adap->is_configured))
return;

+ adap->is_claiming_log_addrs = true;
+
init_completion(&adap->config_completion);

/* Ready to kick off the thread */
@@ -1592,6 +1594,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
wait_for_completion(&adap->config_completion);
mutex_lock(&adap->lock);
}
+ adap->is_claiming_log_addrs = false;
}

/*
diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
index 67dc79ef1705..3ef915344304 100644
--- a/drivers/media/cec/core/cec-api.c
+++ b/drivers/media/cec/core/cec-api.c
@@ -178,7 +178,7 @@ static long cec_adap_s_log_addrs(struct cec_adapter *adap, struct cec_fh *fh,
CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU |
CEC_LOG_ADDRS_FL_CDC_ONLY;
mutex_lock(&adap->lock);
- if (!adap->is_configuring &&
+ if (!adap->is_claiming_log_addrs && !adap->is_configuring &&
(!log_addrs.num_log_addrs || !adap->is_configured) &&
!cec_is_busy(adap, fh)) {
err = __cec_s_log_addrs(adap, &log_addrs, block);
@@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
list_del_init(&data->xfer_list);
}
mutex_unlock(&adap->lock);
+
+ mutex_lock(&fh->lock);
while (!list_empty(&fh->msgs)) {
struct cec_msg_entry *entry =
list_first_entry(&fh->msgs, struct cec_msg_entry, list);
@@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
kfree(entry);
}
}
+ mutex_unlock(&fh->lock);
kfree(fh);

cec_put_device(devnode);
diff --git a/include/media/cec.h b/include/media/cec.h
index 10c9cf6058b7..cc3fcd0496c3 100644
--- a/include/media/cec.h
+++ b/include/media/cec.h
@@ -258,6 +258,7 @@ struct cec_adapter {
u16 phys_addr;
bool needs_hpd;
bool is_enabled;
+ bool is_claiming_log_addrs;
bool is_configuring;
bool must_reconfigure;
bool is_configured;
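
As a reading aid, the effect of the new `is_claiming_log_addrs` flag on the check in `cec_adap_s_log_addrs()` can be modelled in plain userspace C. This is only a sketch under assumptions: `struct adap_model` and the two `may_set_log_addrs_*` helpers are hypothetical, the `cec_is_busy()` test is left out, and the state used in the example (a claim still in progress while `is_configuring` and `is_configured` are both false) is assumed to be reachable while `adap->lock` is dropped during the blocking wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the adapter state relevant to the check. */
struct adap_model {
        bool is_claiming_log_addrs; /* set for the whole claim, new in the patch */
        bool is_configuring;
        bool is_configured;
};

/* Condition before the patch (cec_is_busy() omitted in this model). */
static bool may_set_log_addrs_old(const struct adap_model *a,
                                  unsigned int num_log_addrs)
{
        assert(a);
        return !a->is_configuring &&
               (!num_log_addrs || !a->is_configured);
}

/* Condition after the patch: additionally refuse while a claim is running. */
static bool may_set_log_addrs_new(const struct adap_model *a,
                                  unsigned int num_log_addrs)
{
        assert(a);
        return !a->is_claiming_log_addrs &&
               may_set_log_addrs_old(a, num_log_addrs);
}
```

With `is_claiming_log_addrs` set and the other two flags clear, the old condition would admit a second caller while the new one rejects it, so only one caller at a time can get as far as `init_completion()` in `cec_claim_log_addrs()`.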


2024-02-26 12:40:35

by Yang, Chenyuan

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

Thank you for your continued efforts in investigating this bug and implementing the new patch!

Regarding the two warnings, they have been addressed by this new patch and are no longer reproducible. Additionally, I ran a 48-hour fuzzing test on the CEC driver with the patch applied, and the previous hang no longer occurs.

One thing to note is that the system now logs timeout events:
```
[ 2281.265385][ T2034] cec-vivid-001-vid-out0: transmit timed out
[ 2282.994510][ T2017] cec-vivid-000-vid-cap0: transmit timed out
[ 2283.063484][ T2050] cec-vivid-002-vid-out0: transmit timed out
[ 2283.073468][ T2065] cec-vivid-003-vid-cap0: transmit timed out
[ 2283.373518][ T2033] cec-vivid-001-vid-cap0: transmit timed out
[ 2285.113544][ T2018] cec-vivid-000-vid-out0: transmit timed out
[ 2285.193502][ T2050] cec-vivid-002-vid-out0: transmit timed out
[ 2285.193570][ T2065] cec-vivid-003-vid-cap0: transmit timed out
[ 2285.513570][ T2033] cec-vivid-001-vid-cap0: transmit timed out
```

Best,
Chenyuan


2024-04-19 14:51:37

by Takashi Iwai

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On Mon, 26 Feb 2024 13:27:16 +0100,
Yang, Chenyuan wrote:
>
> Hi Hans,
>
> Thank you for your continued efforts in investigating this bug and implementing the new patch!
>
> Regarding the two warnings, they have been addressed by this new patch and are no longer reproducible. Additionally, I ran a 48-hour fuzzing test on the CEC driver with the patch applied, and the previous hang no longer occurs.
>
> One thing to note is that the system now logs timeout events:
> ```
> [ 2281.265385][ T2034] cec-vivid-001-vid-out0: transmit timed out
> [ 2282.994510][ T2017] cec-vivid-000-vid-cap0: transmit timed out
> [ 2283.063484][ T2050] cec-vivid-002-vid-out0: transmit timed out
> [ 2283.073468][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> [ 2283.373518][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> [ 2285.113544][ T2018] cec-vivid-000-vid-out0: transmit timed out
> [ 2285.193502][ T2050] cec-vivid-002-vid-out0: transmit timed out
> [ 2285.193570][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> [ 2285.513570][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> ```
>
> Best,
> Chenyuan

Hi Hans,

What is the current status of this bug fix? It seems that the thread
has stalled, and I wonder how we can move forward.

I'm asking because CVE-2024-23848 was assigned and we've been asked
about the bug fix.


Thanks!

Takashi


2024-04-22 12:14:31

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Takashi,

On 19/04/2024 16:51, Takashi Iwai wrote:
> On Mon, 26 Feb 2024 13:27:16 +0100,
> Yang, Chenyuan wrote:
>>
>> Hi Hans,
>>
>> Thank you for your continued efforts in investigating this bug and implementing the new patch!
>>
>> Regarding the two warnings, they have been addressed by this new patch and are no longer reproducible. Additionally, I ran a 48-hour fuzzing test on the CEC driver with the patch applied, and the previous hang no longer occurs.
>>
>> One thing to note is that the system now logs timeout events:
>> ```
>> [ 2281.265385][ T2034] cec-vivid-001-vid-out0: transmit timed out
>> [ 2282.994510][ T2017] cec-vivid-000-vid-cap0: transmit timed out
>> [ 2283.063484][ T2050] cec-vivid-002-vid-out0: transmit timed out
>> [ 2283.073468][ T2065] cec-vivid-003-vid-cap0: transmit timed out
>> [ 2283.373518][ T2033] cec-vivid-001-vid-cap0: transmit timed out
>> [ 2285.113544][ T2018] cec-vivid-000-vid-out0: transmit timed out
>> [ 2285.193502][ T2050] cec-vivid-002-vid-out0: transmit timed out
>> [ 2285.193570][ T2065] cec-vivid-003-vid-cap0: transmit timed out
>> [ 2285.513570][ T2033] cec-vivid-001-vid-cap0: transmit timed out
>> ```
>>
>> Best,
>> Chenyuan
>
> Hi Hans,
>
> What is the current status of this bug fix? It seems that the thread
> has stalled, and I wonder how we can move forward.
>
> I'm asking because CVE-2024-23848 was assigned and we've been asked
> about the bug fix.

I missed this reply, so I will take another look at the patch. Too many emails :-(

I have just posted two other patches related to this:

https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/
https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/

Regards,

Hans

>
>
> Thanks!
>
> Takashi
>


2024-04-22 15:05:00

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Chenyuan,

My apologies for the delay, I missed your email.

On 26/02/2024 13:27, Yang, Chenyuan wrote:
> Hi Hans,
>
> Thank you for your continued efforts in investigating this bug and implementing the new patch!
>
> Regarding the two warnings, they have been addressed by this new patch and are no longer reproducible. Additionally, I ran a 48-hour fuzzing test on the CEC driver with the patch applied, and the previous hang no longer occurred.
>
> One thing to note is that the system now logs timeout events:
> ```
> [ 2281.265385][ T2034] cec-vivid-001-vid-out0: transmit timed out
> [ 2282.994510][ T2017] cec-vivid-000-vid-cap0: transmit timed out
> [ 2283.063484][ T2050] cec-vivid-002-vid-out0: transmit timed out
> [ 2283.073468][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> [ 2283.373518][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> [ 2285.113544][ T2018] cec-vivid-000-vid-out0: transmit timed out
> [ 2285.193502][ T2050] cec-vivid-002-vid-out0: transmit timed out
> [ 2285.193570][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> [ 2285.513570][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> ```

Is this happening all the time, or just once in a (long?) while?

Regards,

Hans

>
> Best,
> Chenyuan
>
> From: Hans Verkuil <[email protected]>
> Date: Friday, February 23, 2024 at 8:44 AM
> To: Yang, Chenyuan <[email protected]>, [email protected] <[email protected]>, [email protected] <[email protected]>
> Cc: [email protected] <[email protected]>, [email protected] <[email protected]>, [email protected] <[email protected]>, Zhao, Zijie <[email protected]>, Zhang, Lingming <[email protected]>
> Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)
> Hi Chenyuan,
>
> Here is another patch for you to try. I think it is good for blocking CEC_ADAP_S_LOG_ADDRS
> ioctl calls, but if the filehandle is in non-blocking mode, I'm still not certain it
> is correct. But one issue at a time :-)
>
> Regards,
>
>         Hans
>
> diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c
> index 559a172ebc6c..a493cbce2456 100644
> --- a/drivers/media/cec/core/cec-adap.c
> +++ b/drivers/media/cec/core/cec-adap.c
> @@ -936,8 +936,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg,
>           */
>          mutex_unlock(&adap->lock);
>          wait_for_completion_killable(&data->c);
> -       if (!data->completed)
> -               cancel_delayed_work_sync(&data->work);
> +       cancel_delayed_work_sync(&data->work);
>          mutex_lock(&adap->lock);
>
>          /* Cancel the transmit if it was interrupted */
> @@ -1575,9 +1574,12 @@ static int cec_config_thread_func(void *arg)
>   */
>  static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
>  {
> -       if (WARN_ON(adap->is_configuring || adap->is_configured))
> +       if (WARN_ON(adap->is_claiming_log_addrs ||
> +                   adap->is_configuring || adap->is_configured))
>                  return;
>
> +       adap->is_claiming_log_addrs = true;
> +
>          init_completion(&adap->config_completion);
>
>          /* Ready to kick off the thread */
> @@ -1592,6 +1594,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block)
>                  wait_for_completion(&adap->config_completion);
>                  mutex_lock(&adap->lock);
>          }
> +       adap->is_claiming_log_addrs = false;
>  }
>
>  /*
> diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c
> index 67dc79ef1705..3ef915344304 100644
> --- a/drivers/media/cec/core/cec-api.c
> +++ b/drivers/media/cec/core/cec-api.c
> @@ -178,7 +178,7 @@ static long cec_adap_s_log_addrs(struct cec_adapter *adap, struct cec_fh *fh,
>                             CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU |
>                             CEC_LOG_ADDRS_FL_CDC_ONLY;
>          mutex_lock(&adap->lock);
> -       if (!adap->is_configuring &&
> +       if (!adap->is_claiming_log_addrs && !adap->is_configuring &&
>              (!log_addrs.num_log_addrs || !adap->is_configured) &&
>              !cec_is_busy(adap, fh)) {
>                  err = __cec_s_log_addrs(adap, &log_addrs, block);
> @@ -664,6 +664,8 @@ static int cec_release(struct inode *inode, struct file *filp)
>                  list_del_init(&data->xfer_list);
>          }
>          mutex_unlock(&adap->lock);
> +
> +       mutex_lock(&fh->lock);
>          while (!list_empty(&fh->msgs)) {
>                  struct cec_msg_entry *entry =
>                          list_first_entry(&fh->msgs, struct cec_msg_entry, list);
> @@ -681,6 +683,7 @@ static int cec_release(struct inode *inode, struct file *filp)
>                          kfree(entry);
>                  }
>          }
> +       mutex_unlock(&fh->lock);
>          kfree(fh);
>
>          cec_put_device(devnode);
> diff --git a/include/media/cec.h b/include/media/cec.h
> index 10c9cf6058b7..cc3fcd0496c3 100644
> --- a/include/media/cec.h
> +++ b/include/media/cec.h
> @@ -258,6 +258,7 @@ struct cec_adapter {
>          u16 phys_addr;
>          bool needs_hpd;
>          bool is_enabled;
> +       bool is_claiming_log_addrs;
>          bool is_configuring;
>          bool must_reconfigure;
>          bool is_configured;
>
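
The is_claiming_log_addrs guard in the patch quoted above can be pictured with a small user-space sketch. This is an illustration under assumptions, not the kernel code: struct adap_sketch, claim_begin(), claim_end() and try_s_log_addrs() are hypothetical names, a pthread mutex stands in for adap->lock, and returning -EBUSY from claim_begin() is a simplification (the patch uses WARN_ON and simply returns). The point it tries to show is that the flag stays set while the lock is dropped for the blocking wait, so a second caller is turned away during that window.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct cec_adapter fields used here. */
struct adap_sketch {
	pthread_mutex_t lock;
	bool is_claiming_log_addrs;
	bool is_configuring;
	bool is_configured;
};

static void adap_sketch_init(struct adap_sketch *adap)
{
	pthread_mutex_init(&adap->lock, NULL);
	adap->is_claiming_log_addrs = false;
	adap->is_configuring = false;
	adap->is_configured = false;
}

/*
 * Caller holds adap->lock. Marks a claim as in progress.
 * Returns -EBUSY (an assumption of this sketch) if one is already running.
 */
static int claim_begin(struct adap_sketch *adap)
{
	if (adap->is_claiming_log_addrs ||
	    adap->is_configuring || adap->is_configured)
		return -EBUSY;
	adap->is_claiming_log_addrs = true;
	return 0;
}

/* Caller has re-taken adap->lock after the blocking wait finished. */
static void claim_end(struct adap_sketch *adap)
{
	assert(adap->is_claiming_log_addrs);
	adap->is_claiming_log_addrs = false;
}

/* Entry-point check, mirroring the extra condition added to cec_adap_s_log_addrs(). */
static int try_s_log_addrs(struct adap_sketch *adap)
{
	int err = -EBUSY;

	pthread_mutex_lock(&adap->lock);
	if (!adap->is_claiming_log_addrs && !adap->is_configuring)
		err = 0;	/* the real code would call __cec_s_log_addrs() here */
	pthread_mutex_unlock(&adap->lock);
	return err;
}
```

In the blocking case the sequence is: take the lock, claim_begin(), drop the lock and wait, re-take the lock, claim_end(). Any try_s_log_addrs() call that lands between the unlock and the re-lock sees the flag and backs off.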


2024-04-22 18:54:46

by Yang, Chenyuan

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

Hi Hans,

They happen all the time: the timeout messages are logged continuously while the attached C program runs. I build and start it as follows:

```
gcc -pthread repro.c -o exe
./exe
```

The logs are from QEMU:

```
Debian GNU/Linux 11 syzkaller ttyS0

syzkaller login: [ 326.705401][ T51] Bluetooth: hci0: sending frame failed (-49)
[ 326.707063][ T4466] Bluetooth: hci0: Opcode 0x1003 failed: -49
[ 335.945400][ T4466] Bluetooth: hci0: Opcode 0x1003 failed: -110
[ 335.945417][ T51] Bluetooth: hci0: command 0x1003 tx timeout
[ 390.885042][ T2019] cec-vivid-000-vid-out0: transmit timed out
[ 390.894890][ T2050] cec-vivid-002-vid-cap0: transmit timed out
[ 390.895540][ T2034] cec-vivid-001-vid-cap0: transmit timed out
[ 390.905041][ T2067] cec-vivid-003-vid-out0: transmit timed out
[ 392.985033][ T2018] cec-vivid-000-vid-cap0: transmit timed out
...
```

Best,
Chenyuan

Attachments:
repro.c (7.94 kB)

2024-04-22 19:31:04

by Takashi Iwai

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On Mon, 22 Apr 2024 14:14:17 +0200,
Hans Verkuil wrote:
>
> Hi Takashi,
>
> On 19/04/2024 16:51, Takashi Iwai wrote:
> > On Mon, 26 Feb 2024 13:27:16 +0100,
> > Yang, Chenyuan wrote:
> >>
> >> Hi Hans,
> >>
> >> Thank you for your continued efforts in investigating this bug and implementing the new patch!
> >>
> >> Regarding the two warnings, they have been addressed by this new patch and are no longer reproducible. Additionally, I ran a 48-hour fuzzing test on the CEC driver with the patch applied, and the previous hang no longer occurred.
> >>
> >> One thing to note is that the system now logs timeout events:
> >> ```
> >> [ 2281.265385][ T2034] cec-vivid-001-vid-out0: transmit timed out
> >> [ 2282.994510][ T2017] cec-vivid-000-vid-cap0: transmit timed out
> >> [ 2283.063484][ T2050] cec-vivid-002-vid-out0: transmit timed out
> >> [ 2283.073468][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> >> [ 2283.373518][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> >> [ 2285.113544][ T2018] cec-vivid-000-vid-out0: transmit timed out
> >> [ 2285.193502][ T2050] cec-vivid-002-vid-out0: transmit timed out
> >> [ 2285.193570][ T2065] cec-vivid-003-vid-cap0: transmit timed out
> >> [ 2285.513570][ T2033] cec-vivid-001-vid-cap0: transmit timed out
> >> ```
> >>
> >> Best,
> >> Chenyuan
> >
> > Hi Hans,
> >
> > what is the current status of this bug fix? It seems that the thread
> > has stalled, and I wonder how we can move it forward.
> >
> > I'm asking because CVE-2024-23848 was assigned and we've been asked
> > about the bug fix.
>
> I missed this reply, so I will take another look at the patch. Too many emails :-(
>
> I have just posted two other patches related to this:
>
> https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/
> https://patchwork.linuxtv.org/project/linux-media/patch/[email protected]/

Thanks! Once a full set of patches is ready, please let us know.


Takashi

>
> Regards,
>
> Hans
>
> >
> >
> > Thanks!
> >
> > Takashi
> >

2024-04-22 20:58:05

by Hans Verkuil

[permalink] [raw]
Subject: Re: [Linux Kernel Bugs] KASAN: slab-use-after-free Read in cec_queue_msg_fh and 4 other crashes in the cec device (`cec_ioctl`)

On 22/04/2024 20:54, Yang, Chenyuan wrote:
> Hi Hans,
>
> They happen all the time: the timeout messages are logged continuously while the attached C program runs. I build and start it as follows:
>
> ```
> gcc -pthread repro.c -o exe
> ./exe
> ```
>
> The logs are from QEMU:
>
> ```
> Debian GNU/Linux 11 syzkaller ttyS0
>
> syzkaller login: [ 326.705401][ T51] Bluetooth: hci0: sending frame failed (-49)
> [ 326.707063][ T4466] Bluetooth: hci0: Opcode 0x1003 failed: -49
> [ 335.945400][ T4466] Bluetooth: hci0: Opcode 0x1003 failed: -110
> [ 335.945417][ T51] Bluetooth: hci0: command 0x1003 tx timeout
> [ 390.885042][ T2019] cec-vivid-000-vid-out0: transmit timed out
> [ 390.894890][ T2050] cec-vivid-002-vid-cap0: transmit timed out
> [ 390.895540][ T2034] cec-vivid-001-vid-cap0: transmit timed out
> [ 390.905041][ T2067] cec-vivid-003-vid-out0: transmit timed out
> [ 392.985033][ T2018] cec-vivid-000-vid-cap0: transmit timed out

Hmm, I don't see this. How many CPU cores is the QEMU instance configured with?
And which module options is the vivid module loaded with?
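
In case it helps, both values can be read from inside the guest. Below is a minimal sketch, assuming the vivid parameters are exported under /sys/module/vivid/parameters (this depends on how the kernel was built and on the permissions of each parameter). The helpers online_cpus() and read_module_param() are hypothetical names for this illustration, and n_devs is used only as an example parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Number of CPUs currently online in this (guest) system. */
static long online_cpus(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}

/*
 * Read /sys/module/<module>/parameters/<param> into buf, newline stripped.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_module_param(const char *module, const char *param,
			     char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	assert(module && param && buf);
	if (len == 0)
		return -1;

	n = snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
		     module, param);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling online_cpus() and then read_module_param("vivid", "n_devs", buf, sizeof(buf)) in the guest would give the core count and one of the module options. A return value of -1 means the parameter is not exported on that kernel.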

Regards,

Hans

> ...
> ```
>
> Best,
> Chenyuan