2022-08-09 08:23:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition

On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote:
> without lock mtd_table_mutex in blktrans_{open, release}, there will
> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1
> at the same time with a high frequency open and close test:
>
> kernel BUG at drivers/mtd/mtdcore.c:1221!
> lr : blktrans_release+0xb0/0x120

This does not seem on a current mainline kernel and seems to be
a somewhat incomplete backtrace. Can you send the full dmesg of
a latest mainline run and maybe share the reproducer?

> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
> index b8ae1ec14e17..147e4a11dfe4 100644
> --- a/drivers/mtd/mtd_blkdevs.c
> +++ b/drivers/mtd/mtd_blkdevs.c
> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
>
> kref_get(&dev->ref);
>
> + if (!mutex_trylock(&mtd_table_mutex))
> + return ret;

No, that's not really the solution.

Turning the kref_get above into a kref_get_unless_zero might be better
path to look into.


2022-08-09 09:17:26

by Liwei Song

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition



On 8/9/22 16:08, ChristophHellwig wrote:
> On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote:
>> without lock mtd_table_mutex in blktrans_{open, release}, there will
>> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1
>> at the same time with a high frequency open and close test:
>>
>> kernel BUG at drivers/mtd/mtdcore.c:1221!
>> lr : blktrans_release+0xb0/0x120
>
> This does not seem on a current mainline kernel and seems to be
> a somewhat incomplete backtrace. Can you send the full dmesg of
> a latest mainline run and maybe share the reproducer?

Yes, the kernel I used is 5.15, unfortunately this is the latest version
that worked on my board, the whole log is:

[ 31.301343] ------------[ cut here ]------------
[ 31.301343] ------------[ cut here ]------------
[ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221!
[ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221!
[ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
[ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1
[ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT)
[ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 31.444058] pc : __put_mtd_device+0x4c/0x84
[ 31.457977] lr : put_mtd_device+0x3c/0x5c
[ 31.464122] sp : ffff80000c26bc50
[ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000
[ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768
[ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800
[ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000
[ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0
[ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001
[ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000
[ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
[ 31.524310] Call trace:
[ 31.525450] __put_mtd_device+0x4c/0x84
[ 31.527979] put_mtd_device+0x3c/0x5c
[ 31.530333] mtdchar_close+0x3c/0x84
[ 31.532604] __fput+0x78/0x220
[ 31.534357] ____fput+0x1c/0x30
[ 31.536193] task_work_run+0x88/0xc0
[ 31.538467] do_notify_resume+0x384/0x12a0
[ 31.541261] el0_svc+0x6c/0x80
[ 31.543015] el0t_64_sync_handler+0xa4/0x130
[ 31.545977] el0t_64_sync+0x1a0/0x1a4
[ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
[ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]---
[ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP
[ 31.556420] note: a.out[390] exited with preempt_count 1
[ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
[ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1
[ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT)
[ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 31.591757] pc : __put_mtd_device+0x4c/0x84
[ 31.594642] lr : blktrans_release+0xb0/0x120
[ 31.597603] sp : ffff80000c22bc20
[ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000
[ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8
[ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400
[ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000
[ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230
[ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001
[ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
[ 31.657792] Call trace:
[ 31.658933] __put_mtd_device+0x4c/0x84
[ 31.661462] blktrans_release+0xb0/0x120
[ 31.664077] blkdev_put+0xd4/0x210
[ 31.666175] blkdev_close+0x34/0x50
[ 31.668355] __fput+0x78/0x220
[ 31.670108] ____fput+0x1c/0x30
[ 31.671943] task_work_run+0x88/0xc0
[ 31.674217] do_notify_resume+0x384/0x12a0
[ 31.677009] el0_svc+0x6c/0x80
[ 31.678762] el0t_64_sync_handler+0xa4/0x130
[ 31.681723] el0t_64_sync+0x1a0/0x1a4
[ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
[ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]---
[ 31.692161] note: a.out[391] exited with preempt_count 1

the testcase a.out is compiled from below code:
when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
pid_t pid, pid1, pid2;
int fd,ret = 0;
int status = 0;
char device_char[12]="/dev/mtd";
char device_block[17]="/dev/mtdblock";

strcat(device_char, argv[1]);
strcat(device_block, argv[1]);

pid1 = fork();
if(pid1 == 0){
while(1){
fd = open(device_char, O_SYNC|O_RDWR);
ret = close(fd);
}
}
pid2 = fork();
if(pid2 == 0){
while(1){
fd = open(device_block, O_SYNC|O_RDWR);
ret = close(fd);
}
}
sleep(10);
kill(pid1, SIGKILL);
kill(pid2, SIGKILL);
pid = wait(&status);
pid = wait(&status);
return 0;
}

>
>> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
>> index b8ae1ec14e17..147e4a11dfe4 100644
>> --- a/drivers/mtd/mtd_blkdevs.c
>> +++ b/drivers/mtd/mtd_blkdevs.c
>> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
>>
>> kref_get(&dev->ref);
>>
>> + if (!mutex_trylock(&mtd_table_mutex))
>> + return ret;
>
> No, that's not really the solution.
>
> Turning the kref_get above into a kref_get_unless_zero might be better
> path to look into.

Thanks, I will have a look at this.

Liwei.


>

2022-08-10 08:27:22

by Liwei Song

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition



On 8/9/22 17:03, Liwei Song wrote:
>
>
> On 8/9/22 16:08, ChristophHellwig wrote:
>> On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote:
>>> without lock mtd_table_mutex in blktrans_{open, release}, there will
>>> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1
>>> at the same time with a high frequency open and close test:
>>>
>>> kernel BUG at drivers/mtd/mtdcore.c:1221!
>>> lr : blktrans_release+0xb0/0x120
>>
>> This does not seem on a current mainline kernel and seems to be
>> a somewhat incomplete backtrace. Can you send the full dmesg of
>> a latest mainline run and maybe share the reproducer?
>
> Yes, the kernel I used is 5.15, unfortunately this is the latest version
> that worked on my board, the whole log is:
>
> [ 31.301343] ------------[ cut here ]------------
> [ 31.301343] ------------[ cut here ]------------
> [ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221!
> [ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221!
> [ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
> [ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1
> [ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT)
> [ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 31.444058] pc : __put_mtd_device+0x4c/0x84
> [ 31.457977] lr : put_mtd_device+0x3c/0x5c
> [ 31.464122] sp : ffff80000c26bc50
> [ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000
> [ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768
> [ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800
> [ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000
> [ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0
> [ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001
> [ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000
> [ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
> [ 31.524310] Call trace:
> [ 31.525450] __put_mtd_device+0x4c/0x84
> [ 31.527979] put_mtd_device+0x3c/0x5c
> [ 31.530333] mtdchar_close+0x3c/0x84
> [ 31.532604] __fput+0x78/0x220
> [ 31.534357] ____fput+0x1c/0x30
> [ 31.536193] task_work_run+0x88/0xc0
> [ 31.538467] do_notify_resume+0x384/0x12a0
> [ 31.541261] el0_svc+0x6c/0x80
> [ 31.543015] el0t_64_sync_handler+0xa4/0x130
> [ 31.545977] el0t_64_sync+0x1a0/0x1a4
> [ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
> [ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]---
> [ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP
> [ 31.556420] note: a.out[390] exited with preempt_count 1
> [ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
> [ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1
> [ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT)
> [ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 31.591757] pc : __put_mtd_device+0x4c/0x84
> [ 31.594642] lr : blktrans_release+0xb0/0x120
> [ 31.597603] sp : ffff80000c22bc20
> [ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000
> [ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8
> [ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400
> [ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000
> [ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230
> [ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001
> [ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> [ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
> [ 31.657792] Call trace:
> [ 31.658933] __put_mtd_device+0x4c/0x84
> [ 31.661462] blktrans_release+0xb0/0x120
> [ 31.664077] blkdev_put+0xd4/0x210
> [ 31.666175] blkdev_close+0x34/0x50
> [ 31.668355] __fput+0x78/0x220
> [ 31.670108] ____fput+0x1c/0x30
> [ 31.671943] task_work_run+0x88/0xc0
> [ 31.674217] do_notify_resume+0x384/0x12a0
> [ 31.677009] el0_svc+0x6c/0x80
> [ 31.678762] el0t_64_sync_handler+0xa4/0x130
> [ 31.681723] el0t_64_sync+0x1a0/0x1a4
> [ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
> [ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]---
> [ 31.692161] note: a.out[391] exited with preempt_count 1
>
> the testcase a.out is compiled from below code:
> when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <string.h>
> #include <signal.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> pid_t pid, pid1, pid2;
> int fd,ret = 0;
> int status = 0;
> char device_char[12]="/dev/mtd";
> char device_block[17]="/dev/mtdblock";
>
> strcat(device_char, argv[1]);
> strcat(device_block, argv[1]);
>
> pid1 = fork();
> if(pid1 == 0){
> while(1){
> fd = open(device_char, O_SYNC|O_RDWR);
> ret = close(fd);
> }
> }
> pid2 = fork();
> if(pid2 == 0){
> while(1){
> fd = open(device_block, O_SYNC|O_RDWR);
> ret = close(fd);
> }
> }
> sleep(10);
> kill(pid1, SIGKILL);
> kill(pid2, SIGKILL);
> pid = wait(&status);
> pid = wait(&status);
> return 0;
> }
>
>>
>>> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
>>> index b8ae1ec14e17..147e4a11dfe4 100644
>>> --- a/drivers/mtd/mtd_blkdevs.c
>>> +++ b/drivers/mtd/mtd_blkdevs.c
>>> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
>>>
>>> kref_get(&dev->ref);
>>>
>>> + if (!mutex_trylock(&mtd_table_mutex))
>>> + return ret;
>>
>> No, that's not really the solution.
>>
>> Turning the kref_get above into a kref_get_unless_zero might be better
>> path to look into.
>
> Thanks, I will have a look at this.

Hi Christoph,

It seems this way can not stop the race to decrease/increase mtd->usecount,
the race condition is between mtdchar_{open, close}()->(get)put_mtd_device()->__(get)put_mtd_device()
and blktrans_{open,release}()-> __(get)put_mtd_device(), when operate the same device
as char device(/dev/mtd1) and block device(/dev/mtdblock1), the original fix for
this issue is 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount"),
Could you give some suggestion about this?

Thanks,
Liwei.



>
> Liwei.
>
>
>>

2022-09-05 08:25:53

by Liwei Song

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition



On 8/10/22 15:47, Liwei Song wrote:
>
>
> On 8/9/22 17:03, Liwei Song wrote:
>>
>>
>> On 8/9/22 16:08, ChristophHellwig wrote:
>>> On Tue, Aug 09, 2022 at 03:57:53PM +0800, Liwei Song wrote:
>>>> without lock mtd_table_mutex in blktrans_{open, release}, there will
>>>> be a race condition when access devices /dev/mtd1 and /dev/mtdblock1
>>>> at the same time with a high frequency open and close test:
>>>>
>>>> kernel BUG at drivers/mtd/mtdcore.c:1221!
>>>> lr : blktrans_release+0xb0/0x120
>>>
>>> This does not seem on a current mainline kernel and seems to be
>>> a somewhat incomplete backtrace. Can you send the full dmesg of
>>> a latest mainline run and maybe share the reproducer?
>>
>> Yes, the kernel I used is 5.15, unfortunately this is the latest version
>> that worked on my board, the whole log is:
>>
>> [ 31.301343] ------------[ cut here ]------------
>> [ 31.301343] ------------[ cut here ]------------
>> [ 31.301365] kernel BUG at drivers/mtd/mtdcore.c:1221!
>> [ 31.314981] kernel BUG at drivers/mtd/mtdcore.c:1221!
>> [ 31.329328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> [ 31.374117] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
>> [ 31.395820] CPU: 2 PID: 390 Comm: a.out Not tainted 5.15.58-yocto-standard #1
>> [ 31.412672] Hardware name: SoCFPGA Agilex SoCDK (DT)
>> [ 31.427372] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 31.444058] pc : __put_mtd_device+0x4c/0x84
>> [ 31.457977] lr : put_mtd_device+0x3c/0x5c
>> [ 31.464122] sp : ffff80000c26bc50
>> [ 31.466126] x29: ffff80000c26bc50 x28: ffff000285785100 x27: 0000000000000000
>> [ 31.471945] x26: 0000000045585401 x25: 0000000000000000 x24: ffff000285785768
>> [ 31.477762] x23: ffff0002803ee520 x22: ffff0002804f8900 x21: ffff000281956800
>> [ 31.483580] x20: ffff000281956800 x19: ffff000281955080 x18: 0000000000000000
>> [ 31.489402] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [ 31.495219] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>> [ 31.501037] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088a79b0
>> [ 31.506855] x8 : ffff80000c26bcb0 x7 : 0000000000000000 x6 : 0000000000000001
>> [ 31.512673] x5 : ffff000280959488 x4 : 0000000000000000 x3 : 0000000000000000
>> [ 31.518491] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
>> [ 31.524310] Call trace:
>> [ 31.525450] __put_mtd_device+0x4c/0x84
>> [ 31.527979] put_mtd_device+0x3c/0x5c
>> [ 31.530333] mtdchar_close+0x3c/0x84
>> [ 31.532604] __fput+0x78/0x220
>> [ 31.534357] ____fput+0x1c/0x30
>> [ 31.536193] task_work_run+0x88/0xc0
>> [ 31.538467] do_notify_resume+0x384/0x12a0
>> [ 31.541261] el0_svc+0x6c/0x80
>> [ 31.543015] el0t_64_sync_handler+0xa4/0x130
>> [ 31.545977] el0t_64_sync+0x1a0/0x1a4
>> [ 31.548338] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
>> [ 31.553115] ---[ end trace 9652b26ea1d7daa1 ]---
>> [ 31.556420] Internal error: Oops - BUG: 0 [#2] PREEMPT SMP
>> [ 31.556420] note: a.out[390] exited with preempt_count 1
>> [ 31.560588] Modules linked in: 8021q sch_fq_codel openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
>> [ 31.575265] CPU: 3 PID: 391 Comm: a.out Tainted: G D 5.15.58-yocto-standard #1
>> [ 31.582466] Hardware name: SoCFPGA Agilex SoCDK (DT)
>> [ 31.586115] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 31.591757] pc : __put_mtd_device+0x4c/0x84
>> [ 31.594642] lr : blktrans_release+0xb0/0x120
>> [ 31.597603] sp : ffff80000c22bc20
>> [ 31.599608] x29: ffff80000c22bc20 x28: ffff000285785e80 x27: 0000000000000000
>> [ 31.605428] x26: 0000000045585401 x25: 0000000000000000 x24: ffff0002857864e8
>> [ 31.611247] x23: ffff0002803ee520 x22: ffff0002803e3470 x21: ffff0002803e3400
>> [ 31.617066] x20: ffff0002803e3020 x19: ffff000281955080 x18: 0000000000000000
>> [ 31.622884] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [ 31.628702] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>> [ 31.634519] x11: 0000000000000000 x10: 0000000000000000 x9 : ffff8000088b0230
>> [ 31.640337] x8 : ffff80000c22bb90 x7 : 0000000000000000 x6 : 0000000000000001
>> [ 31.646155] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
>> [ 31.651973] x2 : ffff000281956800 x1 : 00000000ffffffff x0 : ffff000281955080
>> [ 31.657792] Call trace:
>> [ 31.658933] __put_mtd_device+0x4c/0x84
>> [ 31.661462] blktrans_release+0xb0/0x120
>> [ 31.664077] blkdev_put+0xd4/0x210
>> [ 31.666175] blkdev_close+0x34/0x50
>> [ 31.668355] __fput+0x78/0x220
>> [ 31.670108] ____fput+0x1c/0x30
>> [ 31.671943] task_work_run+0x88/0xc0
>> [ 31.674217] do_notify_resume+0x384/0x12a0
>> [ 31.677009] el0_svc+0x6c/0x80
>> [ 31.678762] el0t_64_sync_handler+0xa4/0x130
>> [ 31.681723] el0t_64_sync+0x1a0/0x1a4
>> [ 31.684082] Code: b9448841 51000421 b9048841 36ffff41 (d4210000)
>> [ 31.688857] ---[ end trace 9652b26ea1d7daa2 ]---
>> [ 31.692161] note: a.out[391] exited with preempt_count 1
>>
>> the testcase a.out is compiled from below code:
>> when run the case /dev/mtd1 and /dev/mtdblock1 will be used for open and close test.
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <string.h>
>> #include <signal.h>
>> #include <unistd.h>
>>
>> int main(int argc, char *argv[])
>> {
>> pid_t pid, pid1, pid2;
>> int fd,ret = 0;
>> int status = 0;
>> char device_char[12]="/dev/mtd";
>> char device_block[17]="/dev/mtdblock";
>>
>> strcat(device_char, argv[1]);
>> strcat(device_block, argv[1]);
>>
>> pid1 = fork();
>> if(pid1 == 0){
>> while(1){
>> fd = open(device_char, O_SYNC|O_RDWR);
>> ret = close(fd);
>> }
>> }
>> pid2 = fork();
>> if(pid2 == 0){
>> while(1){
>> fd = open(device_block, O_SYNC|O_RDWR);
>> ret = close(fd);
>> }
>> }
>> sleep(10);
>> kill(pid1, SIGKILL);
>> kill(pid2, SIGKILL);
>> pid = wait(&status);
>> pid = wait(&status);
>> return 0;
>> }
>>
>>>
>>>> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
>>>> index b8ae1ec14e17..147e4a11dfe4 100644
>>>> --- a/drivers/mtd/mtd_blkdevs.c
>>>> +++ b/drivers/mtd/mtd_blkdevs.c
>>>> @@ -188,6 +188,8 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
>>>>
>>>> kref_get(&dev->ref);
>>>>
>>>> + if (!mutex_trylock(&mtd_table_mutex))
>>>> + return ret;
>>>
>>> No, that's not really the solution.
>>>
>>> Turning the kref_get above into a kref_get_unless_zero might be better
>>> path to look into.
>>
>> Thanks, I will have a look at this.
>
> Hi Christoph,
>
> It seems this way can not stop the race to decrease/increase mtd->usecount,
> the race condition is between mtdchar_{open, close}()->(get)put_mtd_device()->__(get)put_mtd_device()
> and blktrans_{open,release}()-> __(get)put_mtd_device(), when operate the same device
> as char device(/dev/mtd1) and block device(/dev/mtdblock1), the original fix for
> this issue is 073db4a51ee4 ("mtd: fix: avoid race condition when accessing mtd->usecount"),
> Could you give some suggestion about this?

Hi,

Any suggestion about my new found?

Thanks,
Liwei.


>
> Thanks,
> Liwei.
>
>
>
>>
>> Liwei.
>>
>>
>>>

2022-09-09 15:29:23

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition

Can you try this patch (it'll need to be split up into a few if it
works):

diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 60b222799871e..9eda1dd98a406 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -24,24 +24,16 @@

static LIST_HEAD(blktrans_majors);

-static void blktrans_dev_release(struct kref *kref)
+static void blktrans_free_disk(struct gendisk *disk)
{
- struct mtd_blktrans_dev *dev =
- container_of(kref, struct mtd_blktrans_dev, ref);
+ struct mtd_blktrans_dev *dev = disk->private_data;

- put_disk(dev->disk);
blk_mq_free_tag_set(dev->tag_set);
kfree(dev->tag_set);
list_del(&dev->list);
kfree(dev);
}

-static void blktrans_dev_put(struct mtd_blktrans_dev *dev)
-{
- kref_put(&dev->ref, blktrans_dev_release);
-}
-
-
static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr,
struct mtd_blktrans_dev *dev,
struct request *req)
@@ -187,63 +179,58 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
struct mtd_blktrans_dev *dev = bdev->bd_disk->private_data;
int ret = 0;

- kref_get(&dev->ref);
+ if (disk_openers(bdev->bd_disk) > 0)
+ return 0;

- mutex_lock(&dev->lock);
-
- if (dev->open)
- goto unlock;
+ mutex_lock(&mtd_table_mutex);
+ if (!dev->mtd) {
+ mutex_lock(&mtd_table_mutex);
+ return -EINVAL;
+ }
+ ret = __get_mtd_device(dev->mtd);
+ mutex_unlock(&mtd_table_mutex);
+ if (ret)
+ return ret;

+ mutex_lock(&dev->lock);
__module_get(dev->tr->owner);
-
- if (!dev->mtd)
- goto unlock;
-
if (dev->tr->open) {
ret = dev->tr->open(dev);
if (ret)
goto error_put;
}
-
- ret = __get_mtd_device(dev->mtd);
- if (ret)
- goto error_release;
dev->file_mode = mode;
-
-unlock:
dev->open++;
mutex_unlock(&dev->lock);
- return ret;

-error_release:
- if (dev->tr->release)
- dev->tr->release(dev);
+ return 0;
+
error_put:
module_put(dev->tr->owner);
mutex_unlock(&dev->lock);
- blktrans_dev_put(dev);
+
+ put_mtd_device(dev->mtd);
return ret;
}

static void blktrans_release(struct gendisk *disk, fmode_t mode)
{
struct mtd_blktrans_dev *dev = disk->private_data;
+ struct mtd_info *mtd = NULL;

- mutex_lock(&dev->lock);
-
- if (--dev->open)
- goto unlock;
+ if (disk_openers(disk) > 0)
+ return;

+ mutex_lock(&dev->lock);
+ dev->open--;
module_put(dev->tr->owner);
-
- if (dev->mtd) {
- if (dev->tr->release)
- dev->tr->release(dev);
- __put_mtd_device(dev->mtd);
- }
-unlock:
+ mtd = dev->mtd;
+ if (mtd && dev->tr->release)
+ dev->tr->release(dev);
mutex_unlock(&dev->lock);
- blktrans_dev_put(dev);
+
+ if (mtd)
+ put_mtd_device(dev->mtd);
}

static int blktrans_getgeo(struct block_device *bdev, struct hd_geometry *geo)
@@ -266,6 +253,7 @@ static const struct block_device_operations mtd_block_ops = {
.owner = THIS_MODULE,
.open = blktrans_open,
.release = blktrans_release,
+ .free_disk = blktrans_free_disk,
.getgeo = blktrans_getgeo,
};

@@ -318,7 +306,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
added:

mutex_init(&new->lock);
- kref_init(&new->ref);
if (!tr->writesect)
new->readonly = 1;

@@ -410,6 +397,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)

int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old)
{
+ struct mtd_info *old_mtd = NULL;
unsigned long flags;

lockdep_assert_held(&mtd_table_mutex);
@@ -438,13 +426,14 @@ int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old)
if (old->open) {
if (old->tr->release)
old->tr->release(old);
- __put_mtd_device(old->mtd);
+ old_mtd = old->mtd;
}
-
old->mtd = NULL;
-
mutex_unlock(&old->lock);
- blktrans_dev_put(old);
+ put_disk(old->disk);
+
+ if (old->mtd)
+ put_mtd_device(old_mtd);
return 0;
}

diff --git a/include/linux/mtd/blktrans.h b/include/linux/mtd/blktrans.h
index 15cc9b95e32b5..41a81fc9f0462 100644
--- a/include/linux/mtd/blktrans.h
+++ b/include/linux/mtd/blktrans.h
@@ -7,7 +7,6 @@
#define __MTD_TRANS_H__

#include <linux/mutex.h>
-#include <linux/kref.h>
#include <linux/sysfs.h>

struct hd_geometry;
@@ -26,7 +25,6 @@ struct mtd_blktrans_dev {
unsigned long size;
int readonly;
int open;
- struct kref ref;
struct gendisk *disk;
struct attribute_group *disk_attributes;
struct request_queue *rq;

2022-09-13 06:20:26

by Liwei Song

[permalink] [raw]
Subject: Re: [PATCH] mtd_blkdevs: add mtd_table_mutex lock back to blktrans_{open, release} to avoid race condition



On 9/9/22 22:36, ChristophHellwig wrote:
> Can you try this patch (it'll need to be split up into a few if it
> works):

Hi Christoph,

Thanks for your help,
With this patch, the race condition issue I met can be fixed, but there will be
a deadlock issue as below when boot the board:

[ 10.277872] ======================================================
[ 10.282765] WARNING: possible circular locking dependency detected
[ 10.287661] 5.18.0-yocto-standard+ #4 Not tainted
[ 10.291078] ------------------------------------------------------
[ 10.295967] systemd-udevd/173 is trying to acquire lock:
[ 10.299994] ffff800009999150 (mtd_table_mutex){+.+.}-{3:3}, at: blktrans_open+0x60/0x128
[ 10.306854]
but task is already holding lock:
[ 10.310097] ffff000180832718 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev+0xf8/0x2f8
[ 10.317464]
which lock already depends on the new lock.

[ 10.321749]
the existing dependency chain (in reverse order) is:
[ 10.326659]
-> #1 (&disk->open_mutex){+.+.}-{3:3}:
[ 10.330373] __mutex_lock+0x90/0x8f8
[ 10.333202] mutex_lock_nested+0x54/0x70
[ 10.336365] bd_register_pending_holders+0x30/0x120
[ 10.340489] device_add_disk+0x1f4/0x370
[ 10.343669] add_mtd_blktrans_dev+0x304/0x468
[ 10.343691] mtdblock_add_mtd+0x7c/0xd0
[ 10.343702] blktrans_notify_add+0x50/0x70
[ 10.343713] add_mtd_device+0x40c/0x460
[ 10.343724] add_mtd_partitions+0xa0/0x1a8
[ 10.343735] parse_mtd_partitions+0x1b4/0x400
[ 10.343746] mtd_device_parse_register+0x94/0x2a0
[ 10.343757] spi_nor_probe+0x220/0x2e8
[ 10.343767] spi_mem_probe+0x74/0xb0
[ 10.343777] spi_probe+0x88/0xe0
[ 10.343787] really_probe+0xb0/0x268
[ 10.343801] __driver_probe_device+0x7c/0xe0
[ 10.343811] driver_probe_device+0x50/0x100
[ 10.343823] __device_attach_driver+0x98/0xd0
[ 10.343833] bus_for_each_drv+0x74/0xd8
[ 10.343844] __device_attach+0xf0/0x150
[ 10.343854] device_initial_probe+0x24/0x30
[ 10.343865] bus_probe_device+0xa4/0xb0
[ 10.343875] device_add+0x424/0x8b8
[ 10.343884] __spi_add_device+0x7c/0x120
[ 10.343898] spi_add_device+0x4c/0x80
[ 10.343909] of_register_spi_device+0x228/0x380
[ 10.343920] spi_register_controller+0x3c4/0x718
[ 10.343931] devm_spi_register_controller+0x28/0x80
[ 10.343943] cqspi_probe+0x714/0x9f8
[ 10.343952] platform_probe+0x6c/0xd8
[ 10.343962] really_probe+0xb0/0x268
[ 10.343972] __driver_probe_device+0x7c/0xe0
[ 10.343982] driver_probe_device+0x50/0x100
[ 10.343994] __driver_attach+0xa4/0x100
[ 10.344004] bus_for_each_dev+0x84/0xd8
[ 10.344014] driver_attach+0x30/0x40
[ 10.344024] bus_add_driver+0x160/0x208
[ 10.344034] driver_register+0x64/0x110
[ 10.344045] __platform_driver_register+0x34/0x40
[ 10.344054] cqspi_platform_driver_init+0x20/0x28
[ 10.344068] do_one_initcall+0xa4/0x440
[ 10.344080] kernel_init_freeable+0x320/0x378
[ 10.344092] kernel_init+0x2c/0x138
[ 10.344104] ret_from_fork+0x10/0x20
[ 10.344115]
-> #0 (mtd_table_mutex){+.+.}-{3:3}:
[ 10.344137] __lock_acquire+0x1118/0x15d8
[ 10.344148] lock_acquire+0x2b8/0x3f8
[ 10.344156] __mutex_lock+0x90/0x8f8
[ 10.344166] mutex_lock_nested+0x54/0x70
[ 10.344177] blktrans_open+0x60/0x128
[ 10.344189] blkdev_get_whole+0x3c/0xb8
[ 10.344203] blkdev_get_by_dev+0x1ac/0x2f8
[ 10.344211] blkdev_open+0x64/0xb8
[ 10.344219] do_dentry_open+0x1b0/0x3b8
[ 10.344234] vfs_open+0x38/0x48
[ 10.344243] path_openat+0x758/0x938
[ 10.344255] do_filp_open+0x94/0x118
[ 10.344267] do_sys_openat2+0x234/0x308
[ 10.344278] do_sys_open+0x84/0xc0
[ 10.344286] __arm64_sys_openat+0x2c/0x38
[ 10.344295] invoke_syscall+0x64/0x138
[ 10.344309] el0_svc_common.constprop.4+0xf8/0x118
[ 10.344320] do_el0_svc+0x80/0xa0
[ 10.344330] el0_svc+0x68/0x1c8
[ 10.344339] el0t_64_sync_handler+0x88/0xb0
[ 10.344349] el0t_64_sync+0x15c/0x160
[ 10.344358]
other info that might help us debug this:

[ 10.344362] Possible unsafe locking scenario:

[ 10.344366] CPU0 CPU1
[ 10.344370] ---- ----
[ 10.344373] lock(&disk->open_mutex);
[ 10.344384] lock(mtd_table_mutex);
[ 10.344395] lock(&disk->open_mutex);
[ 10.344406] lock(mtd_table_mutex);
[ 10.344416]
*** DEADLOCK ***

[ 10.344420] 1 lock held by systemd-udevd/173:
[ 10.344427] #0: ffff000180832718 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev+0xf8/0x2f8
[ 10.344457]
stack backtrace:
[ 10.344465] CPU: 3 PID: 173 Comm: systemd-udevd Not tainted 5.18.0-yocto-standard+ #4
[ 10.344476] Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
[ 10.344483] Call trace:
[ 10.344488] dump_backtrace.part.6+0xf4/0x100
[ 10.344498] show_stack+0x2c/0x48
[ 10.344508] dump_stack_lvl+0x9c/0xcc
[ 10.344522] dump_stack+0x14/0x2c
[ 10.344532] print_circular_bug.isra.49+0x1a8/0x200
[ 10.344546] check_noncircular+0x124/0x138
[ 10.344557] __lock_acquire+0x1118/0x15d8
[ 10.344566] lock_acquire+0x2b8/0x3f8
[ 10.344574] __mutex_lock+0x90/0x8f8
[ 10.344585] mutex_lock_nested+0x54/0x70
[ 10.344595] blktrans_open+0x60/0x128
[ 10.344607] blkdev_get_whole+0x3c/0xb8
[ 10.344618] blkdev_get_by_dev+0x1ac/0x2f8
[ 10.344627] blkdev_open+0x64/0xb8
[ 10.344636] do_dentry_open+0x1b0/0x3b8
[ 10.344647] vfs_open+0x38/0x48
[ 10.344656] path_openat+0x758/0x938
[ 10.344666] do_filp_open+0x94/0x118
[ 10.344676] do_sys_openat2+0x234/0x308
[ 10.344687] do_sys_open+0x84/0xc0
[ 10.344695] __arm64_sys_openat+0x2c/0x38
[ 10.344704] invoke_syscall+0x64/0x138
[ 10.344715] el0_svc_common.constprop.4+0xf8/0x118
[ 10.344726] do_el0_svc+0x80/0xa0
[ 10.344736] el0_svc+0x68/0x1c8
[ 10.344745] el0t_64_sync_handler+0x88/0xb0
[ 10.344754] el0t_64_sync+0x15c/0x160


Thanks,
Liwei.


>
> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
> index 60b222799871e..9eda1dd98a406 100644
> --- a/drivers/mtd/mtd_blkdevs.c
> +++ b/drivers/mtd/mtd_blkdevs.c
> @@ -24,24 +24,16 @@
>
> static LIST_HEAD(blktrans_majors);
>
> -static void blktrans_dev_release(struct kref *kref)
> +static void blktrans_free_disk(struct gendisk *disk)
> {
> - struct mtd_blktrans_dev *dev =
> - container_of(kref, struct mtd_blktrans_dev, ref);
> + struct mtd_blktrans_dev *dev = disk->private_data;
>
> - put_disk(dev->disk);
> blk_mq_free_tag_set(dev->tag_set);
> kfree(dev->tag_set);
> list_del(&dev->list);
> kfree(dev);
> }
>
> -static void blktrans_dev_put(struct mtd_blktrans_dev *dev)
> -{
> - kref_put(&dev->ref, blktrans_dev_release);
> -}
> -
> -
> static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr,
> struct mtd_blktrans_dev *dev,
> struct request *req)
> @@ -187,63 +179,58 @@ static int blktrans_open(struct block_device *bdev, fmode_t mode)
> struct mtd_blktrans_dev *dev = bdev->bd_disk->private_data;
> int ret = 0;
>
> - kref_get(&dev->ref);
> + if (disk_openers(bdev->bd_disk) > 0)
> + return 0;
>
> - mutex_lock(&dev->lock);
> -
> - if (dev->open)
> - goto unlock;
> + mutex_lock(&mtd_table_mutex);
> + if (!dev->mtd) {
> + mutex_lock(&mtd_table_mutex);
> + return -EINVAL;
> + }
> + ret = __get_mtd_device(dev->mtd);
> + mutex_unlock(&mtd_table_mutex);
> + if (ret)
> + return ret;
>
> + mutex_lock(&dev->lock);
> __module_get(dev->tr->owner);
> -
> - if (!dev->mtd)
> - goto unlock;
> -
> if (dev->tr->open) {
> ret = dev->tr->open(dev);
> if (ret)
> goto error_put;
> }
> -
> - ret = __get_mtd_device(dev->mtd);
> - if (ret)
> - goto error_release;
> dev->file_mode = mode;
> -
> -unlock:
> dev->open++;
> mutex_unlock(&dev->lock);
> - return ret;
>
> -error_release:
> - if (dev->tr->release)
> - dev->tr->release(dev);
> + return 0;
> +
> error_put:
> module_put(dev->tr->owner);
> mutex_unlock(&dev->lock);
> - blktrans_dev_put(dev);
> +
> + put_mtd_device(dev->mtd);
> return ret;
> }
>
> static void blktrans_release(struct gendisk *disk, fmode_t mode)
> {
> struct mtd_blktrans_dev *dev = disk->private_data;
> + struct mtd_info *mtd = NULL;
>
> - mutex_lock(&dev->lock);
> -
> - if (--dev->open)
> - goto unlock;
> + if (disk_openers(disk) > 0)
> + return;
>
> + mutex_lock(&dev->lock);
> + dev->open--;
> module_put(dev->tr->owner);
> -
> - if (dev->mtd) {
> - if (dev->tr->release)
> - dev->tr->release(dev);
> - __put_mtd_device(dev->mtd);
> - }
> -unlock:
> + mtd = dev->mtd;
> + if (mtd && dev->tr->release)
> + dev->tr->release(dev);
> mutex_unlock(&dev->lock);
> - blktrans_dev_put(dev);
> +
> + if (mtd)
> + put_mtd_device(dev->mtd);
> }
>
> static int blktrans_getgeo(struct block_device *bdev, struct hd_geometry *geo)
> @@ -266,6 +253,7 @@ static const struct block_device_operations mtd_block_ops = {
> .owner = THIS_MODULE,
> .open = blktrans_open,
> .release = blktrans_release,
> + .free_disk = blktrans_free_disk,
> .getgeo = blktrans_getgeo,
> };
>
> @@ -318,7 +306,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
> added:
>
> mutex_init(&new->lock);
> - kref_init(&new->ref);
> if (!tr->writesect)
> new->readonly = 1;
>
> @@ -410,6 +397,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
>
> int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old)
> {
> + struct mtd_info *old_mtd = NULL;
> unsigned long flags;
>
> lockdep_assert_held(&mtd_table_mutex);
> @@ -438,13 +426,14 @@ int del_mtd_blktrans_dev(struct mtd_blktrans_dev *old)
> if (old->open) {
> if (old->tr->release)
> old->tr->release(old);
> - __put_mtd_device(old->mtd);
> + old_mtd = old->mtd;
> }
> -
> old->mtd = NULL;
> -
> mutex_unlock(&old->lock);
> - blktrans_dev_put(old);
> + put_disk(old->disk);
> +
> + if (old->mtd)
> + put_mtd_device(old_mtd);
> return 0;
> }
>
> diff --git a/include/linux/mtd/blktrans.h b/include/linux/mtd/blktrans.h
> index 15cc9b95e32b5..41a81fc9f0462 100644
> --- a/include/linux/mtd/blktrans.h
> +++ b/include/linux/mtd/blktrans.h
> @@ -7,7 +7,6 @@
> #define __MTD_TRANS_H__
>
> #include <linux/mutex.h>
> -#include <linux/kref.h>
> #include <linux/sysfs.h>
>
> struct hd_geometry;
> @@ -26,7 +25,6 @@ struct mtd_blktrans_dev {
> unsigned long size;
> int readonly;
> int open;
> - struct kref ref;
> struct gendisk *disk;
> struct attribute_group *disk_attributes;
> struct request_queue *rq;
>