2023-12-05 00:11:48

by Jason Gunthorpe

Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>
> Daisuke Matsuda (7):
> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> RDMA/rxe: Make MR functions accessible from other rxe source code
> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> RDMA/rxe: Add page invalidation support
> RDMA/rxe: Allow registering MRs for On-Demand Paging
> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> RDMA/rxe: Add support for the traditional Atomic operations with ODP

What is the current situation with rxe? I don't recall seeing the bugs
that were reported get fixed?

I'm reluctant to dig a deeper hole until it is done?

Thanks,
Jason


2023-12-05 01:51:02

by Zhu Yanjun

Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On 2023/12/5 8:11, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>
>> Daisuke Matsuda (7):
>> RDMA/rxe: Always defer tasks on responder and completer to workqueue
>> RDMA/rxe: Make MR functions accessible from other rxe source code
>> RDMA/rxe: Move resp_states definition to rxe_verbs.h
>> RDMA/rxe: Add page invalidation support
>> RDMA/rxe: Allow registering MRs for On-Demand Paging
>> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>> RDMA/rxe: Add support for the traditional Atomic operations with ODP
>
> What is the current situation with rxe? I don't recall seeing the bugs
> that were reported get fixed?

Exactly. A problem is reported in the link
https://www.spinics.net/lists/linux-rdma/msg120947.html

It seems that a variable 'entry' set but not used
[-Wunused-but-set-variable]
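
For readers unfamiliar with this warning class, here is a minimal sketch of how it arises and one way to silence it. This is not the code from the rxe patch: lookup(), count_valid_warns() and count_valid_fixed() are hypothetical names made up for illustration. GCC reports -Wunused-but-set-variable (enabled by -Wall) when a local variable is assigned but its value is never read.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not taken from the rxe patch. */
int lookup(const int *table, size_t n, size_t idx)
{
	return idx < n ? table[idx] : -1;
}

/*
 * 'entry' is assigned on every iteration but never read, so
 * gcc -Wall reports: variable 'entry' set but not used.
 */
size_t count_valid_warns(const int *table, size_t n)
{
	size_t count = 0;
	int entry;

	for (size_t i = 0; i < n; i++) {
		entry = lookup(table, n, i);
		count++;
	}
	return count;
}

/*
 * One possible fix: actually use the looked-up value (or drop the
 * variable entirely if the value is really not needed).
 */
size_t count_valid_fixed(const int *table, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (lookup(table, n, i) >= 0)
			count++;
	}
	return count;
}
```

Whether the right fix is to use the value or to delete the variable depends on what the original code intended; the warning often points at a forgotten check rather than at dead code.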

And ODP is an important feature. Should we suggest adding a test case
for ODP to rdma-core to verify this feature?

Zhu Yanjun

>
> I'm reluctant to dig a deeper hole until it is done?
>
> Thanks,
> Jason

Subject: RE: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>
> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>
> >> Daisuke Matsuda (7):
> >> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >> RDMA/rxe: Make MR functions accessible from other rxe source code
> >> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >> RDMA/rxe: Add page invalidation support
> >> RDMA/rxe: Allow registering MRs for On-Demand Paging
> >> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >
> > What is the current situation with rxe? I don't recall seeing the bugs
> > that were reported get fixed?

Well, I suppose Jason is referring to the "blktests srp/002 hang".
cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
so the hang does not look specific to rxe.
cf. https://lore.kernel.org/all/[email protected]/T/
I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.


There is another issue that causes kernel panic.
[bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
Zhijian has submitted patches to fix this, and he got some comments.
It looks like he is heavily involved in the CXL driver these days.
I guess he is still working on it.

>
> Exactly. A problem is reported in the link
> https://www.spinics.net/lists/linux-rdma/msg120947.html
>
> It seems that a variable 'entry' set but not used
> [-Wunused-but-set-variable]

Yeah, I can revise the patch anytime.

>
> And ODP is an important feature. Should we suggest adding a test case
> for ODP to rdma-core to verify this feature?

Rxe can share the same tests with mlx5.
I added test cases for Write, Read and Atomic operations with ODP,
and we can add more tests if there are any suggestions.
Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks,
Daisuke Matsuda

>
> Zhu Yanjun
>
> >
> > I'm reluctant to dig a deeper hole until it is done?
> >
> > Thanks,
> > Jason

2023-12-12 18:08:05

by Zhu Yanjun

Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>
>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>
>>>> Daisuke Matsuda (7):
>>>> RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>> RDMA/rxe: Make MR functions accessible from other rxe source code
>>>> RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>> RDMA/rxe: Add page invalidation support
>>>> RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>> RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>
>>> What is the current situation with rxe? I don't recall seeing the bugs
>>> that were reported get fixed?
>
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not look specific to rxe.
> cf. https://lore.kernel.org/all/[email protected]/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
>
>
> There is another issue that causes kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>
> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> Zhijian has submitted patches to fix this, and he got some comments.
> It looks like he is heavily involved in the CXL driver these days.
> I guess he is still working on it.
>
>>
>> Exactly. A problem is reported in the link
>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>
>> It seems that a variable 'entry' set but not used
>> [-Wunused-but-set-variable]
>
> Yeah, I can revise the patch anytime.
>
>>
>> And ODP is an important feature. Should we suggest adding a test case
>> for ODP to rdma-core to verify this feature?
>
> Rxe can share the same tests with mlx5.
> I added test cases for Write, Read and Atomic operations with ODP,
> and we can add more tests if there are any suggestions.
> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks a lot.
Have you run blktests with your patches applied on top of
the latest kernel?

Zhu Yanjun

>
> Thanks,
> Daisuke Matsuda
>
>>
>> Zhu Yanjun
>>
>>>
>>> I'm reluctant to dig a deeper hole until it is done?
>>>
>>> Thanks,
>>> Jason
>

Subject: RE: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> > On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >>
> >> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> >>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>>>
> >>>> Daisuke Matsuda (7):
> >>>> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >>>> RDMA/rxe: Make MR functions accessible from other rxe source code
> >>>> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >>>> RDMA/rxe: Add page invalidation support
> >>>> RDMA/rxe: Allow registering MRs for On-Demand Paging
> >>>> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >>>> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >>>
> >>> What is the current situation with rxe? I don't recall seeing the bugs
> >>> that were reported get fixed?
> >
> > Well, I suppose Jason is referring to the "blktests srp/002 hang".
> > cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> >
> > It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> > so the hang does not look specific to rxe.
> > cf. https://lore.kernel.org/all/[email protected]/T/
> > I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
> >
> >
> > There is another issue that causes kernel panic.
> > [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> > cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> >
> > https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> > Zhijian has submitted patches to fix this, and he got some comments.
> > It looks like he is heavily involved in the CXL driver these days.
> > I guess he is still working on it.
> >
> >>
> >> Exactly. A problem is reported in the link
> >> https://www.spinics.net/lists/linux-rdma/msg120947.html
> >>
> >> It seems that a variable 'entry' set but not used
> >> [-Wunused-but-set-variable]
> >
> > Yeah, I can revise the patch anytime.
> >
> >>
> >> And ODP is an important feature. Should we suggest adding a test case
> >> for ODP to rdma-core to verify this feature?
> >
> > Rxe can share the same tests with mlx5.
> > I added test cases for Write, Read and Atomic operations with ODP,
> > and we can add more tests if there are any suggestions.
> > Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
>
> Thanks a lot.
> Have you run blktests with your patches applied on top of
> the latest kernel?

I have not done that yet, but I agree I should do it.
I will try to take time for the test before submitting v8.

Thanks,
Daisuke Matsuda


>
> Zhu Yanjun
>
> >
> > Thanks,
> > Daisuke Matsuda
> >
> >>
> >> Zhu Yanjun
> >>
> >>>
> >>> I'm reluctant to dig a deeper hole until it is done?
> >>>
> >>> Thanks,
> >>> Jason
> >
>

2023-12-15 02:46:23

by Zhu Yanjun

Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE


On 2023/12/14 13:55, Daisuke Matsuda (Fujitsu) wrote:
> On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
>> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
>>> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>>> Daisuke Matsuda (7):
>>>>>> RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>>>> RDMA/rxe: Make MR functions accessible from other rxe source code
>>>>>> RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>>>> RDMA/rxe: Add page invalidation support
>>>>>> RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>>>> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>>>> RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>>> What is the current situation with rxe? I don't recall seeing the bugs
>>>>> that were reported get fixed?
>>> Well, I suppose Jason is referring to the "blktests srp/002 hang".
>>> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>>>
>>> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
>>> so the hang does not look specific to rxe.
>>> cf. https://lore.kernel.org/all/[email protected]/T/
>>> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
>>>
>>>
>>> There is another issue that causes kernel panic.
>>> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
>>> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>>
>>> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
>>> Zhijian has submitted patches to fix this, and he got some comments.
>>> It looks like he is heavily involved in the CXL driver these days.
>>> I guess he is still working on it.
>>>
>>>> Exactly. A problem is reported in the link
>>>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>>>
>>>> It seems that a variable 'entry' set but not used
>>>> [-Wunused-but-set-variable]
>>> Yeah, I can revise the patch anytime.
>>>
>>>> And ODP is an important feature. Should we suggest adding a test case
>>>> for ODP to rdma-core to verify this feature?
>>> Rxe can share the same tests with mlx5.
>>> I added test cases for Write, Read and Atomic operations with ODP,
>>> and we can add more tests if there are any suggestions.
>>> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
>> Thanks a lot.
>> Have you run blktests with your patches applied on top of
>> the latest kernel?
> I have not done that yet, but I agree I should do it.
> I will try to take time for the test before submitting v8.

Thanks. I hope blktests works well with your commits.

Zhu Yanjun

>
> Thanks,
> Daisuke Matsuda
>
>
>> Zhu Yanjun
>>
>>> Thanks,
>>> Daisuke Matsuda
>>>
>>>> Zhu Yanjun
>>>>
>>>>> I'm reluctant to dig a deeper hole until it is done?
>>>>>
>>>>> Thanks,
>>>>> Jason

2024-01-04 14:56:43

by Jason Gunthorpe

Subject: Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE

On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >
> > On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> > >>
> > >> Daisuke Matsuda (7):
> > >> RDMA/rxe: Always defer tasks on responder and completer to workqueue
> > >> RDMA/rxe: Make MR functions accessible from other rxe source code
> > >> RDMA/rxe: Move resp_states definition to rxe_verbs.h
> > >> RDMA/rxe: Add page invalidation support
> > >> RDMA/rxe: Allow registering MRs for On-Demand Paging
> > >> RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> > >> RDMA/rxe: Add support for the traditional Atomic operations with ODP
> > >
> > > What is the current situation with rxe? I don't recall seeing the bugs
> > > that were reported get fixed?
>
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not look specific to rxe.
> cf. https://lore.kernel.org/all/[email protected]/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.

Bob? Is that what we think?

> There is another issue that causes kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

This is more understandable, and the fix of matching the MTT size to
the PAGE_SIZE seems reasonable to me.

Jason