2021-01-27 22:08:36

by Daniel Wagner

[permalink] [raw]
Subject: [PATCH v2] nvme-multipath: Early exit if no path is available

nvme_round_robin_path() should test if the return ns pointer is
valid. nvme_next_ns() will return a NULL pointer if there is no path
left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
---
v2:
- moved NULL test into the if conditional statement
- added Fixes tag

drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
}

for (ns = nvme_next_ns(head, old);
- ns != old;
+ ns && ns != old;
ns = nvme_next_ns(head, ns)) {
if (nvme_path_is_disabled(ns))
continue;
--
2.29.2


2021-01-27 22:08:55

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 1/27/21 11:30 AM, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
>
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> v2:
> - moved NULL test into the if conditional statement
> - added Fixes tag
>
> drivers/nvme/host/multipath.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> }
>
> for (ns = nvme_next_ns(head, old);
> - ns != old;
> + ns && ns != old;
> ns = nvme_next_ns(head, ns)) {
> if (nvme_path_is_disabled(ns))
> continue;
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-01-28 00:08:04

by Christoph Hellwig

[permalink] [raw]

2021-01-28 01:42:10

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/27 18:30, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
>
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> v2:
> - moved NULL test into the if conditional statement
> - added Fixes tag
>
> drivers/nvme/host/multipath.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> }
>
> for (ns = nvme_next_ns(head, old);
> - ns != old;
> + ns && ns != old;
nvme_round_robin_path just be called when !"old".
nvme_next_ns should not return NULL when !"old".
It seems unnecessary to add checking "ns".
Is there a bug that "old" in not in "head" list?
If yes, we should fix it.
> ns = nvme_next_ns(head, ns)) {
> if (nvme_path_is_disabled(ns))
> continue;
>

2021-01-28 01:43:07

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/27 18:30, Daniel Wagner wrote:
> nvme_round_robin_path() should test if the return ns pointer is
> valid. nvme_next_ns() will return a NULL pointer if there is no path
> left.
>
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Cc: Hannes Reinecke <[email protected]>
> Signed-off-by: Daniel Wagner <[email protected]>
> ---
> v2:
> - moved NULL test into the if conditional statement
> - added Fixes tag
>
> drivers/nvme/host/multipath.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 9ac762b28811..282b7a4ea9a9 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> }
>
> for (ns = nvme_next_ns(head, old);
> - ns != old;
> + ns && ns != old;
nvme_round_robin_path just be called when !"old".
nvme_next_ns should not return NULL when !"old".
It seems unnecessary to add checking "ns".
Is there a bug that "old" is not in "head" list?
If yes, we should fix it.
> ns = nvme_next_ns(head, ns)) {
> if (nvme_path_is_disabled(ns))
> continue;
>

2021-01-28 08:04:50

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
> > --- a/drivers/nvme/host/multipath.c
> > +++ b/drivers/nvme/host/multipath.c
> > @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> > }
> > for (ns = nvme_next_ns(head, old);
> > - ns != old;
> > + ns && ns != old;
> nvme_round_robin_path just be called when !"old".
> nvme_next_ns should not return NULL when !"old".
> It seems unnecessary to add checking "ns".

The problem is when we enter nvme_round_robin_path() and there is no
path available. In this case the initialization ns = nvme_next_ns(head,
old) could return a NULL pointer.

2021-01-28 09:24:18

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/28 15:58, Daniel Wagner wrote:
> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>>> }
>>> for (ns = nvme_next_ns(head, old);
>>> - ns != old;
>>> + ns && ns != old;
>> nvme_round_robin_path just be called when !"old".
>> nvme_next_ns should not return NULL when !"old".
>> It seems unnecessary to add checking "ns".
>
> The problem is when we enter nvme_round_robin_path() and there is no
> path available. In this case the initialization ns = nvme_next_ns(head,
> old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old".
It is impossible to return NULL for nvme_next_ns(head, old).
> .
>

2021-01-28 09:32:16

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 1/28/21 10:18 AM, Chao Leng wrote:
>
>
> On 2021/1/28 15:58, Daniel Wagner wrote:
>> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>>> --- a/drivers/nvme/host/multipath.c
>>>> +++ b/drivers/nvme/host/multipath.c
>>>> @@ -221,7 +221,7 @@ static struct nvme_ns
>>>> *nvme_round_robin_path(struct nvme_ns_head *head,
>>>>        }
>>>>        for (ns = nvme_next_ns(head, old);
>>>> -         ns != old;
>>>> +         ns && ns != old;
>>> nvme_round_robin_path just be called when !"old".
>>> nvme_next_ns should not return NULL when !"old".
>>> It seems unnecessary to add checking "ns".
>>
>> The problem is when we enter nvme_round_robin_path() and there is no
>> path available. In this case the initialization ns = nvme_next_ns(head,
>> old) could return a NULL pointer."old" should not be NULL, so there is
>> at least one path that is "old".
> It is impossible to return NULL for nvme_next_ns(head, old).

No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL
when then end of the list is reached.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-01-28 09:47:26

by Daniel Wagner

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote:
> It is impossible to return NULL for nvme_next_ns(head, old).

block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
block nvme0n1: no usable path - requeuing I/O
BUG: kernel NULL pointer dereference, address: 0000000000000068
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G E 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased)
Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x 02/06/2018
RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core]
Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10
RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246
RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000
R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000
R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
generic_make_request+0x121/0x300
? submit_bio+0x42/0x1c0
submit_bio+0x42/0x1c0
ext4_io_submit+0x49/0x60 [ext4]
ext4_writepages+0x625/0xe90 [ext4]
? do_writepages+0x4b/0xe0
? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4]
do_writepages+0x4b/0xe0
? __generic_file_write_iter+0x192/0x1c0
? __filemap_fdatawrite_range+0xcb/0x100
__filemap_fdatawrite_range+0xcb/0x100
? ext4_file_write_iter+0x128/0x3c0 [ext4]
file_write_and_wait_range+0x5e/0xb0
__generic_file_fsync+0x22/0xb0
ext4_sync_file+0x1f7/0x3c0 [ext4]
do_fsync+0x38/0x60
__x64_sys_fsync+0x10/0x20
do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

You can't see exactly where it dies but I followed the assembly to
nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL eventually
(list_next_or_null_rcu()).

And I have positive feedback, this patch fixes the above problem.

2021-01-29 01:20:35

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/28 17:23, Hannes Reinecke wrote:
> On 1/28/21 10:18 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/28 15:58, Daniel Wagner wrote:
>>> On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:
>>>>> --- a/drivers/nvme/host/multipath.c
>>>>> +++ b/drivers/nvme/host/multipath.c
>>>>> @@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>>>>>        }
>>>>>        for (ns = nvme_next_ns(head, old);
>>>>> -         ns != old;
>>>>> +         ns && ns != old;
>>>> nvme_round_robin_path just be called when !"old".
>>>> nvme_next_ns should not return NULL when !"old".
>>>> It seems unnecessary to add checking "ns".
>>>
>>> The problem is when we enter nvme_round_robin_path() and there is no
>>> path available. In this case the initialization ns = nvme_next_ns(head,
>>> old) could return a NULL pointer."old" should not be NULL, so there is at least one path that is "old".
>> It is impossible to return NULL for nvme_next_ns(head, old).
>
> No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL when then end of the list is reached.
Although list_next_or_null_rcu()/list_first_or_null_rcu() may return
NULL, but nvme_next_ns(head, old) assume that the "old" is in the "head",
so nvme_next_ns(head, old) should not return NULL. If the "old" is not
in the "head", nvme_next_ns(head, old) will run abnormal.
So there is other bug which cause nvme_next_ns(head, old).

I review the code about head->list and head->current_path, I find 2 bugs
may cause nvme_next_ns(head, old) abnormal:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/[email protected]/
Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.
>
> Cheers,
>
> Hannes

2021-01-29 01:26:50

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/28 17:40, Daniel Wagner wrote:
> On Thu, Jan 28, 2021 at 05:18:56PM +0800, Chao Leng wrote:
>> It is impossible to return NULL for nvme_next_ns(head, old).
>
> block nvme0n1: no usable path - requeuing I/O
> block nvme0n1: no usable path - requeuing I/O
> block nvme0n1: no usable path - requeuing I/O
> block nvme0n1: no usable path - requeuing I/O
> block nvme0n1: no usable path - requeuing I/O
> block nvme0n1: no usable path - requeuing I/O
> BUG: kernel NULL pointer dereference, address: 0000000000000068
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 8000000ff67bc067 P4D 8000000ff67bc067 PUD ff9ac9067 PMD 0
> Oops: 0000 [#1] SMP PTI
> CPU: 23 PID: 15759 Comm: dt.21.15 Kdump: loaded Tainted: G E 5.3.18-0.gc9fe679-default #1 SLE15-SP2 (unreleased)
> Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.18.0 for D3289-B1x 02/06/2018
> RIP: 0010:nvme_ns_head_make_request+0x1d1/0x430 [nvme_core]
> Code: 54 24 10 0f 84 c9 01 00 00 48 8b 54 24 10 48 83 ea 30 0f 84 ba 01 00 00 48 39 d0 0f 84 01 02 00 00 31 ff eb 05 48 39 d0 74 67 <48> 8b 72 68 83 e6 04 75 13 48 8b 72 68 83 e6 01 75 0a 48 8b 72 10
> RSP: 0018:ffffa69d08017af8 EFLAGS: 00010246
> RAX: ffff92f261d87800 RBX: ffff92fa555b0010 RCX: ffff92fa555bc570
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000001 R08: 0000000000001000 R09: 0000000000001000
> R10: ffffa69d080179a8 R11: ffff92f264f0c1c0 R12: ffff92f264f7f000
> R13: ffff92fa555b0000 R14: 0000000000000001 R15: 0000000000000000
> FS: 00007f3962bae700(0000) GS:ffff92f29ffc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000068 CR3: 0000000fd69a2002 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> generic_make_request+0x121/0x300
> ? submit_bio+0x42/0x1c0
> submit_bio+0x42/0x1c0
> ext4_io_submit+0x49/0x60 [ext4]
> ext4_writepages+0x625/0xe90 [ext4]
> ? do_writepages+0x4b/0xe0
> ? ext4_mark_inode_dirty+0x1d0/0x1d0 [ext4]
> do_writepages+0x4b/0xe0
> ? __generic_file_write_iter+0x192/0x1c0
> ? __filemap_fdatawrite_range+0xcb/0x100
> __filemap_fdatawrite_range+0xcb/0x100
> ? ext4_file_write_iter+0x128/0x3c0 [ext4]
> file_write_and_wait_range+0x5e/0xb0
> __generic_file_fsync+0x22/0xb0
> ext4_sync_file+0x1f7/0x3c0 [ext4]
> do_fsync+0x38/0x60
> __x64_sys_fsync+0x10/0x20
> do_syscall_64+0x5b/0x1e0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> You can't see exactly where it dies but I followed the assembly to
> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
> old) which returns NULL but nvme_next_ns() is returning NULL eventually
> (list_next_or_null_rcu()).
So there is other bug cause nvme_next_ns abormal.
I review the code about head->list and head->current_path, I find 2 bugs
may cause the bug:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/[email protected]/
Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.
>
> And I have positive feedback, this patch fixes the above problem.
Although adding check ns can fix the crash. There may still be other problems
such as in __nvme_find_path which use list_for_each_entry_rcu.
>
> .
>

2021-01-29 01:45:24

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available


>> You can't see exactly where it dies but I followed the assembly to
>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>> (list_next_or_null_rcu()).
> So there is other bug cause nvme_next_ns abormal.
> I review the code about head->list and head->current_path, I find 2 bugs
> may cause the bug:
> First, I already send the patch. see:
> https://lore.kernel.org/linux-nvme/[email protected]/
>
> Second, in nvme_ns_remove, list_del_rcu is before
> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
> "head", but still use "old". I'm not sure there's any other
> consideration here, I will check it and try to fix it.

The reason why we first remove from head->list and only then clear
current_path is because the other way around there is no way
to guarantee that that the ns won't be assigned as current_path
again (because it is in head->list).

nvme_ns_remove fences continue of deletion of the ns by synchronizing
the srcu such that for sure the current_path clearance is visible.

2021-01-29 03:12:18

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/29 9:42, Sagi Grimberg wrote:
>
>>> You can't see exactly where it dies but I followed the assembly to
>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>> (list_next_or_null_rcu()).
>> So there is other bug cause nvme_next_ns abormal.
>> I review the code about head->list and head->current_path, I find 2 bugs
>> may cause the bug:
>> First, I already send the patch. see:
>> https://lore.kernel.org/linux-nvme/[email protected]/
>> Second, in nvme_ns_remove, list_del_rcu is before
>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>> "head", but still use "old". I'm not sure there's any other
>> consideration here, I will check it and try to fix it.
>
> The reason why we first remove from head->list and only then clear
> current_path is because the other way around there is no way
> to guarantee that that the ns won't be assigned as current_path
> again (because it is in head->list).
ok, I see.
>
> nvme_ns_remove fences continue of deletion of the ns by synchronizing
> the srcu such that for sure the current_path clearance is visible.
The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;
This may cause infinite loop in nvme_round_robin_path.
for (ns = nvme_next_ns(head, old);
ns != old;
ns = nvme_next_ns(head, ns))
The ns will always be ns1, and then infinite loop.
> .

2021-01-29 03:32:39

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available


>>>> You can't see exactly where it dies but I followed the assembly to
>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>> (list_next_or_null_rcu()).
>>> So there is other bug cause nvme_next_ns abormal.
>>> I review the code about head->list and head->current_path, I find 2 bugs
>>> may cause the bug:
>>> First, I already send the patch. see:
>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>
>>> Second, in nvme_ns_remove, list_del_rcu is before
>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>> "head", but still use "old". I'm not sure there's any other
>>> consideration here, I will check it and try to fix it.
>>
>> The reason why we first remove from head->list and only then clear
>> current_path is because the other way around there is no way
>> to guarantee that that the ns won't be assigned as current_path
>> again (because it is in head->list).
> ok, I see.
>>
>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>> the srcu such that for sure the current_path clearance is visible.
> The list will be like this:
> head->next = ns1;
> ns1->next = head;
> old->next = ns1;
> This may cause infinite loop in nvme_round_robin_path.
> for (ns = nvme_next_ns(head, old);
>     ns != old;
>     ns = nvme_next_ns(head, ns))
> The ns will always be ns1, and then infinite loop.

Who is being removed? I'm not following

2021-01-29 03:38:59

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/29 11:30, Sagi Grimberg wrote:
>
>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>> (list_next_or_null_rcu()).
>>>> So there is other bug cause nvme_next_ns abormal.
>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>> may cause the bug:
>>>> First, I already send the patch. see:
>>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>> "head", but still use "old". I'm not sure there's any other
>>>> consideration here, I will check it and try to fix it.
>>>
>>> The reason why we first remove from head->list and only then clear
>>> current_path is because the other way around there is no way
>>> to guarantee that that the ns won't be assigned as current_path
>>> again (because it is in head->list).
>> ok, I see.
>>>
>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>> the srcu such that for sure the current_path clearance is visible.
>> The list will be like this:
>> head->next = ns1;
>> ns1->next = head;
>> old->next = ns1;
>> This may cause infinite loop in nvme_round_robin_path.
>> for (ns = nvme_next_ns(head, old);
>>      ns != old;
>>      ns = nvme_next_ns(head, ns))
>> The ns will always be ns1, and then infinite loop.
>
> Who is being removed? I'm not following
The "old" is being removed path.
Daniel Wagner report crash like this:
head->next = head;
old->next = head;
So nvme_next_ns(head, old) will return NULL, and then crash.
Although check ns can avoid crash, but can not avoid infinite loop.

Similar reason, The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;
ns1 is other path.

> .

2021-01-29 07:09:15

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 1/29/21 4:07 AM, Chao Leng wrote:
>
>
> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>
>>>> You can't see exactly where it dies but I followed the assembly to
>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>> (list_next_or_null_rcu()).
>>> So there is other bug cause nvme_next_ns abormal.
>>> I review the code about head->list and head->current_path, I find 2 bugs
>>> may cause the bug:
>>> First, I already send the patch. see:
>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>
>>> Second, in nvme_ns_remove, list_del_rcu is before
>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>> "head", but still use "old". I'm not sure there's any other
>>> consideration here, I will check it and try to fix it.
>>
>> The reason why we first remove from head->list and only then clear
>> current_path is because the other way around there is no way
>> to guarantee that that the ns won't be assigned as current_path
>> again (because it is in head->list).
> ok, I see.
>>
>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>> the srcu such that for sure the current_path clearance is visible.
> The list will be like this:
> head->next = ns1;
> ns1->next = head;
> old->next = ns1;

Where does 'old' pointing to?

> This may cause infinite loop in nvme_round_robin_path.
> for (ns = nvme_next_ns(head, old);
>     ns != old;
>     ns = nvme_next_ns(head, ns))
> The ns will always be ns1, and then infinite loop.

No. nvme_next_ns() will return NULL.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-02-01 02:20:18

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/1/29 17:20, Hannes Reinecke wrote:
> On 1/29/21 9:46 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>
>>>>>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>>>>>> may cause the bug:
>>>>>>>> First, I already send the patch. see:
>>>>>>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>
>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>> current_path is because the other way around there is no way
>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>> again (because it is in head->list).
>>>>>> ok, I see.
>>>>>>>
>>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>> The list will be like this:
>>>>>> head->next = ns1;
>>>>>> ns1->next = head;
>>>>>> old->next = ns1;
>>>>>
>>>>> Where does 'old' pointing to?
>>>>>
>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>      ns != old;
>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>
>>>>> No. nvme_next_ns() will return NULL.
>>>> If there is just one path(the "old") and the "old" is deleted,
>>>> nvme_next_ns() will return NULL.
>>>> The list like this:
>>>> head->next = head;
>>>> old->next = head;
>>>> If there is two or more path and the "old" is deleted,
>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>> the path which in the list except the "old", check condition will
>>>> be true for ever.
>>>
>>> But that will be caught by the statement above:
>>>
>>> if (list_is_singular(&head->list))
>>>
>>> no?
>> Two path just a sample example.
>> If there is just two path, will enter it, may cause no path but there is
>> actually one path. It is falsely assumed that the "old" must be not deleted.
>> If there is more than two path, will cause infinite loop.
> So you mean we'll need something like this?
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 71696819c228..8ffccaf9c19a 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>  static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>                 struct nvme_ns *ns)
>  {
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> +       if (ns) {
> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
> +                                          struct nvme_ns, siblings);
> +               if (ns)
> +                       return ns;
> +       }
No, in the scenario, ns should not be NULL.
May be we can do like this:

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 282b7a4ea9a9..b895011a2cbd 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
return found;
}

-static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
- struct nvme_ns *ns)
-{
- ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
- siblings);
- if (ns)
- return ns;
- return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
-}
+#define nvme_next_ns_condition(head, current, condition) \
+({ \
+ struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
+ &(current)->siblings, struct nvme_ns, siblings); \
+ __ptr ? __ptr : (condition) ? (condition) = false, \
+ list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
+ siblings) : NULL; \
+})

static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
int node, struct nvme_ns *old)
{
struct nvme_ns *ns, *found = NULL;
+ bool first_half = true;

- if (list_is_singular(&head->list)) {
- if (nvme_path_is_disabled(old))
- return NULL;
- return old;
- }
-
- for (ns = nvme_next_ns(head, old);
+ for (ns = nvme_next_ns_condition(head, old, first_half);
ns && ns != old;
- ns = nvme_next_ns(head, ns)) {
+ ns = nvme_next_ns_condition(head, ns, first_half)) {
if (nvme_path_is_disabled(ns))
continue;

>         return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>  }
>
> Cheers,
>
> Hannes

2021-02-01 07:32:32

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 2/1/21 3:16 AM, Chao Leng wrote:
>
>
> On 2021/1/29 17:20, Hannes Reinecke wrote:
>> On 1/29/21 9:46 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>>
>>>>>
>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>>
>>>>>>>>>> You can't see exactly where it dies but I followed the
>>>>>>>>>> assembly to
>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial
>>>>>>>>>> nvme_next_ns(head,
>>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL
>>>>>>>>>> eventually
>>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>>> I review the code about head->list and head->current_path, I
>>>>>>>>> find 2 bugs
>>>>>>>>> may cause the bug:
>>>>>>>>> First, I already send the patch. see:
>>>>>>>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>>>>>>>
>>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted
>>>>>>>>> from the
>>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>>
>>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>>> current_path is because the other way around there is no way
>>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>>> again (because it is in head->list).
>>>>>>> ok, I see.
>>>>>>>>
>>>>>>>> nvme_ns_remove fences continue of deletion of the ns by
>>>>>>>> synchronizing
>>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>>> The list will be like this:
>>>>>>> head->next = ns1;
>>>>>>> ns1->next = head;
>>>>>>> old->next = ns1;
>>>>>>
>>>>>> Where does 'old' pointing to?
>>>>>>
>>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>>      ns != old;
>>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>>
>>>>>> No. nvme_next_ns() will return NULL.
>>>>> If there is just one path(the "old") and the "old" is deleted,
>>>>> nvme_next_ns() will return NULL.
>>>>> The list like this:
>>>>> head->next = head;
>>>>> old->next = head;
>>>>> If there is two or more path and the "old" is deleted,
>>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>>> the path which in the list except the "old", check condition will
>>>>> be true for ever.
>>>>
>>>> But that will be caught by the statement above:
>>>>
>>>> if (list_is_singular(&head->list))
>>>>
>>>> no?
>>> Two path just a sample example.
>>> If there is just two path, will enter it, may cause no path but there is
>>> actually one path. It is falsely assumed that the "old" must be not
>>> deleted.
>>> If there is more than two path, will cause infinite loop.
>> So you mean we'll need something like this?
>>
>> diff --git a/drivers/nvme/host/multipath.c
>> b/drivers/nvme/host/multipath.c
>> index 71696819c228..8ffccaf9c19a 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct
>> nvme_ns_head *head, int node)
>>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>>                  struct nvme_ns *ns)
>>   {
>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct
>> nvme_ns,
>> -                       siblings);
>> -       if (ns)
>> -               return ns;
>> +       if (ns) {
>> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
>> +                                          struct nvme_ns, siblings);
>> +               if (ns)
>> +                       return ns;
>> +       }
> No, in the scenario, ns should not be NULL.

Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...

> May be we can do like this:
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 282b7a4ea9a9..b895011a2cbd 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct
> nvme_ns_head *head, int node)
>         return found;
>  }
>
> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
> -               struct nvme_ns *ns)
> -{
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct
> nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> -       return list_first_or_null_rcu(&head->list, struct nvme_ns,
> siblings);
> -}
> +#define nvme_next_ns_condition(head, current, condition) \
> +({ \
> +       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
> +               &(current)->siblings, struct nvme_ns, siblings); \
> +       __ptr ? __ptr : (condition) ? (condition) = false, \
> +               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
> +                       siblings) : NULL; \
> +})
>
Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the
current (and my fixed) versions is?
I'm still not clear where the problem is once we applied both patches.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-02-01 08:50:47

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/2/1 15:29, Hannes Reinecke wrote:
> On 2/1/21 3:16 AM, Chao Leng wrote:
>>
>>
>> On 2021/1/29 17:20, Hannes Reinecke wrote:
>>> On 1/29/21 9:46 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>>>
>>>>>>>>>>> You can't see exactly where it dies but I followed the assembly to
>>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>>>>>>>> old) which returns NULL but nvme_next_ns() is returning NULL eventually
>>>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>>>> So there is other bug cause nvme_next_ns abormal.
>>>>>>>>>> I review the code about head->list and head->current_path, I find 2 bugs
>>>>>>>>>> may cause the bug:
>>>>>>>>>> First, I already send the patch. see:
>>>>>>>>>> https://lore.kernel.org/linux-nvme/[email protected]/
>>>>>>>>>> Second, in nvme_ns_remove, list_del_rcu is before
>>>>>>>>>> nvme_mpath_clear_current_path. This may cause "old" is deleted from the
>>>>>>>>>> "head", but still use "old". I'm not sure there's any other
>>>>>>>>>> consideration here, I will check it and try to fix it.
>>>>>>>>>
>>>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>>>> current_path is because the other way around there is no way
>>>>>>>>> to guarantee that that the ns won't be assigned as current_path
>>>>>>>>> again (because it is in head->list).
>>>>>>>> ok, I see.
>>>>>>>>>
>>>>>>>>> nvme_ns_remove fences continue of deletion of the ns by synchronizing
>>>>>>>>> the srcu such that for sure the current_path clearance is visible.
>>>>>>>> The list will be like this:
>>>>>>>> head->next = ns1;
>>>>>>>> ns1->next = head;
>>>>>>>> old->next = ns1;
>>>>>>>
>>>>>>> Where does 'old' pointing to?
>>>>>>>
>>>>>>>> This may cause infinite loop in nvme_round_robin_path.
>>>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>>>      ns != old;
>>>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>>>> The ns will always be ns1, and then infinite loop.
>>>>>>>
>>>>>>> No. nvme_next_ns() will return NULL.
>>>>>> If there is just one path(the "old") and the "old" is deleted,
>>>>>> nvme_next_ns() will return NULL.
>>>>>> The list like this:
>>>>>> head->next = head;
>>>>>> old->next = head;
>>>>>> If there is two or more path and the "old" is deleted,
>>>>>> "for" will be infinite loop. because nvme_next_ns() will return
>>>>>> the path which in the list except the "old", check condition will
>>>>>> be true for ever.
>>>>>
>>>>> But that will be caught by the statement above:
>>>>>
>>>>> if (list_is_singular(&head->list))
>>>>>
>>>>> no?
>>>> Two path just a sample example.
>>>> If there is just two path, will enter it, may cause no path but there is
>>>> actually one path. It is falsely assumed that the "old" must be not deleted.
>>>> If there is more than two path, will cause infinite loop.
>>> So you mean we'll need something like this?
>>>
>>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>>> index 71696819c228..8ffccaf9c19a 100644
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>>>                  struct nvme_ns *ns)
>>>   {
>>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>>> -                       siblings);
>>> -       if (ns)
>>> -               return ns;
>>> +       if (ns) {
>>> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
>>> +                                          struct nvme_ns, siblings);
>>> +               if (ns)
>>> +                       return ns;
>>> +       }
>> No, in the scenario, ns should not be NULL.
>
> Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...
>
>> May be we can do like this:
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 282b7a4ea9a9..b895011a2cbd 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>          return found;
>>   }
>>
>> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>> -               struct nvme_ns *ns)
>> -{
>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>> -                       siblings);
>> -       if (ns)
>> -               return ns;
>> -       return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>> -}
>> +#define nvme_next_ns_condition(head, current, condition) \
>> +({ \
>> +       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
>> +               &(current)->siblings, struct nvme_ns, siblings); \
>> +       __ptr ? __ptr : (condition) ? (condition) = false, \
>> +               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
>> +                       siblings) : NULL; \
>> +})
>>
> Urgh. Please, no. That is well impossible to debug.
> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
> I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;

My patch work flow:
nvme_next_ns_condition(head, old, true) return ns2;
nvme_next_ns_condition(head, ns2, true) change the condition to false
and return ns1;
nvme_next_ns_condition(head, ns1, false) return ns2;
nvme_next_ns_condition(head, ns2, false) return NULL;
And then the loop end.

Your patch work flow:
nvme_next_ns(head, old) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
nvme_next_ns(head, ns2) return ns1;
nvme_next_ns(head, ns1) return ns2;
And then the loop is infinite.
>
> Cheers,
>
> Hannes

2021-02-01 09:02:05

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 2/1/21 9:47 AM, Chao Leng wrote:
>
>
> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>> Urgh. Please, no. That is well impossible to debug.
>> Can you please open-code it to demonstrate where the difference to the
>> current (and my fixed) versions is?
>> I'm still not clear where the problem is once we applied both patches.
> For example assume the list has three path, and all path is not
> NVME_ANA_OPTIMIZED:
> head->next = ns1;
> ns1->next = ns2;
> ns2->next = head;
> old->next = ns2;
>
And this is where I have issues with.
Where does 'old' come from?
Clearly it was part of the list at one point; so what happened to it?

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-02-01 09:43:01

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/2/1 16:57, Hannes Reinecke wrote:
> On 2/1/21 9:47 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>> Urgh. Please, no. That is well impossible to debug.
>>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
>>> I'm still not clear where the problem is once we applied both patches.
>> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
>> head->next = ns1;
>> ns1->next = ns2;
>> ns2->next = head;
>> old->next = ns2;
>>
> And this is where I have issues with.
> Where does 'old' come from?
> Clearly it was part of the list at one point; so what happened to it?
I explained this earlier.
In nvme_ns_remove, there is a hole between list_del_rcu and
nvme_mpath_clear_current_path. If head->current_path is the "old", and
the "old" is removing. The "old" is already removed from the list by
list_del_rcu, but head->current_path is not clear to NULL by
nvme_mpath_clear_current_path.
Find path is race with nvme_ns_remove, use the "old" pass to
nvme_round_robin_path to find path.
>
> Cheers,
>
> Hannes

2021-02-01 10:47:09

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available

On 2/1/21 10:40 AM, Chao Leng wrote:
>
>
> On 2021/2/1 16:57, Hannes Reinecke wrote:
>> On 2/1/21 9:47 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>>> Urgh. Please, no. That is well impossible to debug.
>>>> Can you please open-code it to demonstrate where the difference to
>>>> the current (and my fixed) versions is?
>>>> I'm still not clear where the problem is once we applied both patches.
>>> For example assume the list has three path, and all path is not
>>> NVME_ANA_OPTIMIZED:
>>> head->next = ns1;
>>> ns1->next = ns2;
>>> ns2->next = head;
>>> old->next = ns2;
>>>
>> And this is where I have issues with.
>> Where does 'old' come from?
>> Clearly it was part of the list at one point; so what happened to it?
> I explained this earlier.
> In nvme_ns_remove, there is a hole between list_del_rcu and
> nvme_mpath_clear_current_path. If head->current_path is the "old", and
> the "old" is removing. The "old" is already removed from the list by
> list_del_rcu, but head->current_path is not clear to NULL by
> nvme_mpath_clear_current_path.
> Find path is race with nvme_ns_remove, use the "old" pass to
> nvme_round_robin_path to find path.

Ah. So this should be better:

@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct
nvme_ns_head *head, int node)
static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
struct nvme_ns *ns)
{
- ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct
nvme_ns,
- siblings);
- if (ns)
- return ns;
+ if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) {
+ ns = list_next_or_null_rcu(&head->list, &ns->siblings,
+ struct nvme_ns, siblings);
+ if (ns)
+ return ns;
+ }
return list_first_or_null_rcu(&head->list, struct nvme_ns,
siblings);
}

The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it
should guard against the issue you mentioned.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-02-02 01:17:35

by Chao Leng

[permalink] [raw]
Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available



On 2021/2/1 18:45, Hannes Reinecke wrote:
> On 2/1/21 10:40 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/1 16:57, Hannes Reinecke wrote:
>>> On 2/1/21 9:47 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]
>>>>> Urgh. Please, no. That is well impossible to debug.
>>>>> Can you please open-code it to demonstrate where the difference to the current (and my fixed) versions is?
>>>>> I'm still not clear where the problem is once we applied both patches.
>>>> For example assume the list has three path, and all path is not NVME_ANA_OPTIMIZED:
>>>> head->next = ns1;
>>>> ns1->next = ns2;
>>>> ns2->next = head;
>>>> old->next = ns2;
>>>>
>>> And this is where I have issues with.
>>> Where does 'old' come from?
>>> Clearly it was part of the list at one point; so what happened to it?
>> I explained this earlier.
>> In nvme_ns_remove, there is a hole between list_del_rcu and
>> nvme_mpath_clear_current_path. If head->current_path is the "old", and
>> the "old" is removing. The "old" is already removed from the list by
>> list_del_rcu, but head->current_path is not clear to NULL by
>> nvme_mpath_clear_current_path.
>> Find path is race with nvme_ns_remove, use the "old" pass to
>> nvme_round_robin_path to find path.
>
> Ah. So this should be better:
>
> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>  static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>                 struct nvme_ns *ns)
>  {
> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
> -                       siblings);
> -       if (ns)
> -               return ns;
> +       if (ns && !test_bit(NVME_NS_REMOVING, &ns->flags)) {
> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
> +                                          struct nvme_ns, siblings);
> +               if (ns)
> +                       return ns;
> +       }
>         return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>  }
>
> The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it should guard against the issue you mentioned.
Looks useless, it is still infinite loop. You can check the workflow for the scenario I mentioned.
>
> Cheers,
>
> Hannes