2014-07-15 14:24:20

by Chuck Lever

Subject: [PATCH v3 00/21] NFS/RDMA client patches for 3.17

The main purpose of this series is to address connection drop
recovery issues by fixing FRMR re-use to make it less likely the
client will deadlock due to a memory management operation error.

Some clean-ups and other fixes are present as well.

See topic branch nfs-rdma-for-3.17 in

git://git.linux-nfs.org/projects/cel/cel-2.6.git

I tested with NFSv3 and NFSv4 on all three supported memory
registration modes. Used cthon04, iozone, and dbench with both
Solaris and Linux NFS/RDMA servers. Used xfstests with Linux.

v3:
Only two substantive changes:

- Patch 08/21 now uses generic IB helpers for managing FRMR
rkeys

- Add Tested-by: from Steve Wise


v2:
Many patches from v1 have been rewritten or replaced.

The MW ref counting approach in v1 is abandoned. Instead, I've
eliminated signaling FAST_REG_MR and LOCAL_INV, and added
appropriate recovery mechanisms after a transport reconnect that
should prevent rkey dis-synchrony entirely.

A couple of optimizations have been added, including:

- Allocating each MW separately rather than carving each out of a
large piece of contiguous memory

- Now that the receive CQ upcall handler dequeues a bundle of CQEs
at once, fire off the reply handler tasklet just once per upcall
to reduce context switches and how often hard IRQs are disabled

Jury is still out on the latter.
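
The second optimization can be pictured with a small user-space model. This is only a sketch, not the xprtrdma code: the queue, the batch size, and every function name here are hypothetical, and a Python list stands in for the tasklet schedule call. It simply counts how many times the reply handler would be scheduled when it is fired per CQE versus once per upcall.

```python
from collections import deque


def drain_cq(cq, batch_size):
    """Dequeue up to batch_size completions in one call (a 'bundle' of CQEs)."""
    batch = []
    while cq and len(batch) < batch_size:
        batch.append(cq.popleft())
    return batch


def upcall_per_cqe(cq, batch_size, schedule):
    """Model: schedule the reply handler once for every completion polled."""
    for cqe in drain_cq(cq, batch_size):
        schedule([cqe])


def upcall_batched(cq, batch_size, schedule):
    """Model: schedule the reply handler once per upcall for the whole batch."""
    batch = drain_cq(cq, batch_size)
    if batch:
        schedule(batch)


def count_schedules(upcall, n_cqes, batch_size):
    """Return (number of handler schedules, number of completions handled)."""
    cq = deque(range(n_cqes))
    calls = []
    while cq:
        upcall(cq, batch_size, calls.append)
    return len(calls), sum(len(batch) for batch in calls)


if __name__ == "__main__":
    print(count_schedules(upcall_per_cqe, 32, 8))
    print(count_schedules(upcall_batched, 32, 8))
```

Both variants handle the same completions; only the number of handler invocations differs. Whether that translates into a measurable win on real hardware is the open question noted above.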

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2014-07-17 05:06:49

by Devesh Sharma

Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17

Hi Chuck,


Tested v3 with ocrdma (linux-3.16-rc5 inbox'ed ocrdma). Both Cthon and iozone pass without regressions. I will perform a cable pull test as well and get back to you.

-Regards
Devesh

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Chuck Lever
> Sent: Tuesday, July 15, 2014 7:54 PM
> To: linux-rdma; Linux NFS Mailing List
> Subject: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
>
> The main purpose of this series is to address connection drop recovery issues
> by fixing FRMR re-use to make it less likely the client will deadlock due to a
> memory management operation error.
>
> Some clean-ups and other fixes are present as well.
>
> See topic branch nfs-rdma-for-3.17 in
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> I tested with NFSv3 and NFSv4 on all three supported memory registration
> modes. Used cthon04, iozone, and dbench with both Solaris and Linux
> NFS/RDMA servers. Used xfstests with Linux.
>
> v3:
> Only two substantive changes:
>
> - Patch 08/21 now uses generic IB helpers for managing FRMR
> rkeys
>
> - Add Tested-by: from Steve Wise
>
>
> v2:
> Many patches from v1 have been rewritten or replaced.
>
> The MW ref counting approach in v1 is abandoned. Instead, I've eliminated
> signaling FAST_REG_MR and LOCAL_INV, and added appropriate recovery
> mechanisms after a transport reconnect that should prevent rkey dis-
> synchrony entirely.
>
> A couple of optimizations have been added, including:
>
> - Allocating each MW separately rather than carving each out of a
> large piece of contiguous memory
>
> - Now that the receive CQ upcall handler dequeues a bundle of CQEs
> at once, fire off the reply handler tasklet just once per upcall
> to reduce context switches and how often hard IRQs are disabled
>
> Jury is still out on the latter.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2014-07-17 14:21:27

by Devesh Sharma

Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17

Yes, kindly do it. However, I have tested this only with ocrdma.

-Regards
Devesh

> -----Original Message-----
> From: Chuck Lever [mailto:[email protected]]
> Sent: Thursday, July 17, 2014 7:46 PM
> To: Devesh Sharma
> Cc: linux-rdma; Linux NFS Mailing List
> Subject: Re: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
>
>
> On Jul 17, 2014, at 10:12 AM, Devesh Sharma
> <[email protected]> wrote:
>
> > Hi Chuck,
> >
> > Tested the cable pull also. V3 is passing the cable pull test also. I have tried
> the following tests:
> >
> > Run iozone on nfs-rdma mount.
> > Bring down the link from switch (to simulate cable pull).
> > Wait for 10 secs.
> > Bring back the link.
> > This test passes, iozone resumes traffic.
> >
> > Run iozone on nfs-rdma mount.
> > Bring down the link from switch (to simulate cable pull).
> > Wait for 70 secs.
> > Bring back the link.
> > This test passes, iozone resumes traffic.
>
> Thanks Devesh!
>
> May I add "Tested-by: Devesh Sharma <[email protected]>" ?
>
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:linux-rdma-
> >> [email protected]] On Behalf Of Devesh Sharma
> >> Sent: Thursday, July 17, 2014 10:37 AM
> >> To: Chuck Lever; linux-rdma; Linux NFS Mailing List
> >> Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
> >>
> >> Hi Chuck,
> >>
> >>
> >> Tested v3 with ocrdma (linux-3.16-rc5 inbox'ed ocrdma). Both Cthon
> >> and iozone pass without regressions. I will perform a cable pull
> >> test as well and get back to you.
> >>
> >> -Regards
> >> Devesh
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:linux-rdma-
> >>> [email protected]] On Behalf Of Chuck Lever
> >>> Sent: Tuesday, July 15, 2014 7:54 PM
> >>> To: linux-rdma; Linux NFS Mailing List
> >>> Subject: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
> >>>
> >>> The main purpose of this series is to address connection drop
> >>> recovery issues by fixing FRMR re-use to make it less likely the
> >>> client will deadlock due to a memory management operation error.
> >>>
> >>> Some clean-ups and other fixes are present as well.
> >>>
> >>> See topic branch nfs-rdma-for-3.17 in
> >>>
> >>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
> >>>
> >>> I tested with NFSv3 and NFSv4 on all three supported memory
> >>> registration modes. Used cthon04, iozone, and dbench with both
> >>> Solaris and Linux NFS/RDMA servers. Used xfstests with Linux.
> >>>
> >>> v3:
> >>> Only two substantive changes:
> >>>
> >>> - Patch 08/21 now uses generic IB helpers for managing FRMR rkeys
> >>>
> >>> - Add Tested-by: from Steve Wise
> >>>
> >>>
> >>> v2:
> >>> Many patches from v1 have been rewritten or replaced.
> >>>
> >>> The MW ref counting approach in v1 is abandoned. Instead, I've
> >>> eliminated signaling FAST_REG_MR and LOCAL_INV, and added
> >> appropriate
> >>> recovery mechanisms after a transport reconnect that should prevent
> >>> rkey dis- synchrony entirely.
> >>>
> >>> A couple of optimizations have been added, including:
> >>>
> >>> - Allocating each MW separately rather than carving each out of a
> >>> large piece of contiguous memory
> >>>
> >>> - Now that the receive CQ upcall handler dequeues a bundle of CQEs
> >>> at once, fire off the reply handler tasklet just once per upcall to
> >>> reduce context switches and how often hard IRQs are disabled
> >>>
> >>> Jury is still out on the latter.
> >>>
> >>> --
> >>> Chuck Lever
> >>> chuck[dot]lever[at]oracle[dot]com
> >>>
> >>>
> >>>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> > in the body of a message to [email protected] More
> majordomo
> > info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>


2014-07-16 18:22:19

by Shirley Ma

Subject: Re: [PATCH v3 00/21] NFS/RDMA client patches for 3.17



On 07/16/2014 10:57 AM, Chuck Lever wrote:
>
>> On Jul 16, 2014, at 11:48 AM, Shirley Ma <[email protected]> wrote:
>>
>> These two patches have significantly reduced the interrupt rate, by around 4 times.
>>
>> xprtrdma: Disable completions for FAST_REG_MR Work Requests
>> xprtrdma: Disable completions for LOCAL_INV Work Requests
>
> Thanks Shirley! This result applies only to FRMR, correct? Also, I'd imagine the savings would be even greater for adapters that have a short page list depth?

Yes, only tested FRMR with mlx4. I can hack the code to test a short page list depth to check the savings. When looking at the difference between irq and softirq, it is much closer now.
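
To make "much closer" concrete, here is a rough sketch that compares hard IRQ entries to softirq entries using the perf stat counts quoted in this message. The dictionary layout and function names are illustrative assumptions; only the numbers come from the reports.

```python
# Counts copied from the perf stat reports quoted in this thread.
WITHOUT_PATCHES = {
    "irq_handler_entry": 2_131_870,
    "softirq_entry": 635_587,
    "seconds": 25.422821792,
}
WITH_PATCHES = {
    "irq_handler_entry": 653_548,
    "softirq_entry": 568_138,
    "seconds": 21.675597062,
}


def irq_to_softirq_ratio(sample):
    """Hard IRQ handler entries per softirq entry over the measured run."""
    return sample["irq_handler_entry"] / sample["softirq_entry"]


def per_second(sample, key):
    """Normalize a counter by elapsed time, since the two runs differ in length."""
    return sample[key] / sample["seconds"]


if __name__ == "__main__":
    for label, sample in (("w/o patches", WITHOUT_PATCHES),
                          ("w/ patches", WITH_PATCHES)):
        print(label,
              "irq/softirq = %.2f" % irq_to_softirq_ratio(sample),
              "irq/sec = %.0f" % per_second(sample, "irq_handler_entry"))
```

The ratio drops from a bit above 3 hard IRQs per softirq to a little over 1, which is one way to read the observation above. The perf stat counters are system wide, so they include interrupts from other devices as well.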

>
>>
>> Same NFS read/write workload; here is the interrupt rate (irq per sec) report based upon /proc/interrupts:
>>
>> w/o patches:
>> -----------
>> PCI-MSI-edge mlx4-ib (204): 105176
>> PCI-MSI-edge mlx4-ib (204): 123650
>> PCI-MSI-edge mlx4-ib (204): 123690
>> PCI-MSI-edge mlx4-ib (204): 116554
>> PCI-MSI-edge mlx4-ib (204): 122864
>>
>> And perf stat irq report:
>> Performance counter stats for 'system wide':
>>
>> 2,131,870 irq:irq_handler_entry [100.00%]
>> 2,131,870 irq:irq_handler_exit [100.00%]
>> 635,587 irq:softirq_entry [100.00%]
>> 635,597 irq:softirq_exit [100.00%]
>> 636,155 irq:softirq_raise
>>
>> 25.422821792 seconds time elapsed
>>
>> w/i patches:
>> -----------
>> PCI-MSI-edge mlx4-ib (204): 31131
>> PCI-MSI-edge mlx4-ib (204): 32958
>> PCI-MSI-edge mlx4-ib (204): 31068
>> PCI-MSI-edge mlx4-ib (204): 30236
>> PCI-MSI-edge mlx4-ib (204): 33041
>>
>> And perf stat irq report:
>>
>> Performance counter stats for 'system wide':
>>
>> 653,548 irq:irq_handler_entry [100.00%]
>> 653,548 irq:irq_handler_exit [100.00%]
>> 568,138 irq:softirq_entry [100.00%]
>> 568,148 irq:softirq_exit [100.00%]
>> 568,690 irq:softirq_raise
>>
>> 21.675597062 seconds time elapsed
>>
>> Shirley
>>
>

2014-07-17 14:12:56

by Devesh Sharma

Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17

Hi Chuck,

Tested the cable pull also. V3 is passing the cable pull test also. I have tried the following tests:

Run iozone on nfs-rdma mount.
Bring down the link from switch (to simulate cable pull).
Wait for 10 secs.
Bring back the link.
This test passes, iozone resumes traffic.

Run iozone on nfs-rdma mount.
Bring down the link from switch (to simulate cable pull).
Wait for 70 secs.
Bring back the link.
This test passes, iozone resumes traffic.
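
For anyone who wants to repeat the two runs above, the sequence could be scripted roughly as follows. This is a sketch under assumptions: how the link is brought down on the switch and how iozone is started and checked are site specific, so they are passed in as callables, and the FakeRig class is a hypothetical stand-in used only to show the call order.

```python
import time


def cable_pull_test(start_workload, link_down, link_up, workload_alive,
                    outage_secs, sleep=time.sleep):
    """Run one link-outage cycle and report whether the workload survived."""
    start_workload()       # e.g. start iozone on the nfs-rdma mount
    link_down()            # e.g. disable the switch port (simulated cable pull)
    sleep(outage_secs)     # 10 or 70 seconds in the runs described above
    link_up()              # bring the link back
    return workload_alive()  # True if traffic resumed


class FakeRig:
    """Hypothetical stand-in for a real test rig; records what was called."""

    def __init__(self):
        self.events = []

    def start_workload(self):
        self.events.append("start iozone")

    def link_down(self):
        self.events.append("link down")

    def link_up(self):
        self.events.append("link up")

    def sleep(self, secs):
        self.events.append("wait %d" % secs)

    def workload_alive(self):
        return True


def run_with_fake(outage_secs):
    rig = FakeRig()
    ok = cable_pull_test(rig.start_workload, rig.link_down, rig.link_up,
                         rig.workload_alive, outage_secs, sleep=rig.sleep)
    return ok, rig.events


if __name__ == "__main__":
    for secs in (10, 70):
        print(secs, run_with_fake(secs))
```

With a real rig the callables would wrap whatever switch CLI and workload launcher are in use; the fake only demonstrates the ordering of the steps.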

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Devesh Sharma
> Sent: Thursday, July 17, 2014 10:37 AM
> To: Chuck Lever; linux-rdma; Linux NFS Mailing List
> Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
>
> Hi Chuck,
>
>
> Tested v3 with ocrdma (linux-3.16-rc5 inbox'ed ocrdma). Both Cthon and
> iozone pass without regressions. I will perform a cable pull test as well and
> get back to you.
>
> -Regards
> Devesh
>
> > -----Original Message-----
> > From: [email protected] [mailto:linux-rdma-
> > [email protected]] On Behalf Of Chuck Lever
> > Sent: Tuesday, July 15, 2014 7:54 PM
> > To: linux-rdma; Linux NFS Mailing List
> > Subject: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
> >
> > The main purpose of this series is to address connection drop recovery
> > issues by fixing FRMR re-use to make it less likely the client will
> > deadlock due to a memory management operation error.
> >
> > Some clean-ups and other fixes are present as well.
> >
> > See topic branch nfs-rdma-for-3.17 in
> >
> > git://git.linux-nfs.org/projects/cel/cel-2.6.git
> >
> > I tested with NFSv3 and NFSv4 on all three supported memory
> > registration modes. Used cthon04, iozone, and dbench with both Solaris
> > and Linux NFS/RDMA servers. Used xfstests with Linux.
> >
> > v3:
> > Only two substantive changes:
> >
> > - Patch 08/21 now uses generic IB helpers for managing FRMR
> > rkeys
> >
> > - Add Tested-by: from Steve Wise
> >
> >
> > v2:
> > Many patches from v1 have been rewritten or replaced.
> >
> > The MW ref counting approach in v1 is abandoned. Instead, I've
> > eliminated signaling FAST_REG_MR and LOCAL_INV, and added
> appropriate
> > recovery mechanisms after a transport reconnect that should prevent
> > rkey dis- synchrony entirely.
> >
> > A couple of optimizations have been added, including:
> >
> > - Allocating each MW separately rather than carving each out of a
> > large piece of contiguous memory
> >
> > - Now that the receive CQ upcall handler dequeues a bundle of CQEs
> > at once, fire off the reply handler tasklet just once per upcall
> > to reduce context switches and how often hard IRQs are disabled
> >
> > Jury is still out on the latter.
> >
> > --
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> >
> >
> >

2014-07-16 15:48:25

by Shirley Ma

Subject: Re: [PATCH v3 00/21] NFS/RDMA client patches for 3.17

These two patches have significantly reduced the interrupt rate, by around 4 times.

xprtrdma: Disable completions for FAST_REG_MR Work Requests
xprtrdma: Disable completions for LOCAL_INV Work Requests

Same NFS read/write workload; here is the interrupt rate (irq per sec) report based upon /proc/interrupts:

w/o patches:
-----------
PCI-MSI-edge mlx4-ib (204): 105176
PCI-MSI-edge mlx4-ib (204): 123650
PCI-MSI-edge mlx4-ib (204): 123690
PCI-MSI-edge mlx4-ib (204): 116554
PCI-MSI-edge mlx4-ib (204): 122864

And perf stat irq report:
Performance counter stats for 'system wide':

2,131,870 irq:irq_handler_entry [100.00%]
2,131,870 irq:irq_handler_exit [100.00%]
635,587 irq:softirq_entry [100.00%]
635,597 irq:softirq_exit [100.00%]
636,155 irq:softirq_raise

25.422821792 seconds time elapsed

w/i patches:
-----------
PCI-MSI-edge mlx4-ib (204): 31131
PCI-MSI-edge mlx4-ib (204): 32958
PCI-MSI-edge mlx4-ib (204): 31068
PCI-MSI-edge mlx4-ib (204): 30236
PCI-MSI-edge mlx4-ib (204): 33041

And perf stat irq report:

Performance counter stats for 'system wide':

653,548 irq:irq_handler_entry [100.00%]
653,548 irq:irq_handler_exit [100.00%]
568,138 irq:softirq_entry [100.00%]
568,148 irq:softirq_exit [100.00%]
568,690 irq:softirq_raise

21.675597062 seconds time elapsed

Shirley
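
The message does not say how the per-second figures were sampled from /proc/interrupts. One plausible approach, sketched here as an assumption, is to take two snapshots a known interval apart and divide the difference in the summed per-CPU counts by the interval. The function names and the sample text in the test are illustrative; the two lists of rates are the figures reported above.

```python
def irq_total(interrupts_text, name):
    """Sum the per-CPU counts of every /proc/interrupts row mentioning name."""
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data rows start with "<irq>:"; the CPU header row does not.
        if not fields or not fields[0].endswith(":") or name not in line:
            continue
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the interrupt type / device name columns
            total += int(field)
    return total


def irq_rate(before_text, after_text, name, interval_secs):
    """Interrupts per second between two snapshots taken interval_secs apart."""
    delta = irq_total(after_text, name) - irq_total(before_text, name)
    return delta / interval_secs


def mean(values):
    return sum(values) / len(values)


# mlx4-ib (204) rates reported in this message, in irq per second.
WITHOUT_PATCHES = [105176, 123650, 123690, 116554, 122864]
WITH_PATCHES = [31131, 32958, 31068, 30236, 33041]


def reduction_factor():
    """Ratio of the mean reported rate without the patches to that with them."""
    return mean(WITHOUT_PATCHES) / mean(WITH_PATCHES)


if __name__ == "__main__":
    print("reduction factor: %.2f" % reduction_factor())
```

On a live system the snapshots would come from reading /proc/interrupts twice; averaging the five reported samples on each side gives a reduction a little under 4 times, consistent with the "around 4 times" estimate.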


2014-07-16 17:57:41

by Chuck Lever

Subject: Re: [PATCH v3 00/21] NFS/RDMA client patches for 3.17


> On Jul 16, 2014, at 11:48 AM, Shirley Ma <[email protected]> wrote:
>
> These two patches have significantly reduced the interrupt rate, by around 4 times.
>
> xprtrdma: Disable completions for FAST_REG_MR Work Requests
> xprtrdma: Disable completions for LOCAL_INV Work Requests

Thanks Shirley! This result applies only to FRMR, correct? Also, I'd imagine the savings would be even greater for adapters that have a short page list depth?


>
> Same NFS read/write workload; here is the interrupt rate (irq per sec) report based upon /proc/interrupts:
>
> w/o patches:
> -----------
> PCI-MSI-edge mlx4-ib (204): 105176
> PCI-MSI-edge mlx4-ib (204): 123650
> PCI-MSI-edge mlx4-ib (204): 123690
> PCI-MSI-edge mlx4-ib (204): 116554
> PCI-MSI-edge mlx4-ib (204): 122864
>
> And perf stat irq report:
> Performance counter stats for 'system wide':
>
> 2,131,870 irq:irq_handler_entry [100.00%]
> 2,131,870 irq:irq_handler_exit [100.00%]
> 635,587 irq:softirq_entry [100.00%]
> 635,597 irq:softirq_exit [100.00%]
> 636,155 irq:softirq_raise
>
> 25.422821792 seconds time elapsed
>
> w/i patches:
> -----------
> PCI-MSI-edge mlx4-ib (204): 31131
> PCI-MSI-edge mlx4-ib (204): 32958
> PCI-MSI-edge mlx4-ib (204): 31068
> PCI-MSI-edge mlx4-ib (204): 30236
> PCI-MSI-edge mlx4-ib (204): 33041
>
> And perf stat irq report:
>
> Performance counter stats for 'system wide':
>
> 653,548 irq:irq_handler_entry [100.00%]
> 653,548 irq:irq_handler_exit [100.00%]
> 568,138 irq:softirq_entry [100.00%]
> 568,148 irq:softirq_exit [100.00%]
> 568,690 irq:softirq_raise
>
> 21.675597062 seconds time elapsed
>
> Shirley
>

2014-07-17 14:16:28

by Chuck Lever

Subject: Re: [PATCH v3 00/21] NFS/RDMA client patches for 3.17


On Jul 17, 2014, at 10:12 AM, Devesh Sharma <[email protected]> wrote:

> Hi Chuck,
>
> Tested the cable pull also. V3 is passing the cable pull test also. I have tried the following tests:
>
> Run iozone on nfs-rdma mount.
> Bring down the link from switch (to simulate cable pull).
> Wait for 10 secs.
> Bring back the link.
> This test passes, iozone resumes traffic.
>
> Run iozone on nfs-rdma mount.
> Bring down the link from switch (to simulate cable pull).
> Wait for 70 secs.
> Bring back the link.
> This test passes, iozone resumes traffic.

Thanks Devesh!

May I add "Tested-by: Devesh Sharma <[email protected]>" ?

>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-rdma-
>> [email protected]] On Behalf Of Devesh Sharma
>> Sent: Thursday, July 17, 2014 10:37 AM
>> To: Chuck Lever; linux-rdma; Linux NFS Mailing List
>> Subject: RE: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
>>
>> Hi Chuck,
>>
>>
>> Tested v3 with ocrdma (linux-3.16-rc5 inbox'ed ocrdma). Both Cthon and
>> iozone pass without regressions. I will perform a cable pull test as well and
>> get back to you.
>>
>> -Regards
>> Devesh
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:linux-rdma-
>>> [email protected]] On Behalf Of Chuck Lever
>>> Sent: Tuesday, July 15, 2014 7:54 PM
>>> To: linux-rdma; Linux NFS Mailing List
>>> Subject: [PATCH v3 00/21] NFS/RDMA client patches for 3.17
>>>
>>> The main purpose of this series is to address connection drop recovery
>>> issues by fixing FRMR re-use to make it less likely the client will
>>> deadlock due to a memory management operation error.
>>>
>>> Some clean-ups and other fixes are present as well.
>>>
>>> See topic branch nfs-rdma-for-3.17 in
>>>
>>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>>
>>> I tested with NFSv3 and NFSv4 on all three supported memory
>>> registration modes. Used cthon04, iozone, and dbench with both Solaris
>>> and Linux NFS/RDMA servers. Used xfstests with Linux.
>>>
>>> v3:
>>> Only two substantive changes:
>>>
>>> - Patch 08/21 now uses generic IB helpers for managing FRMR
>>> rkeys
>>>
>>> - Add Tested-by: from Steve Wise
>>>
>>>
>>> v2:
>>> Many patches from v1 have been rewritten or replaced.
>>>
>>> The MW ref counting approach in v1 is abandoned. Instead, I've
>>> eliminated signaling FAST_REG_MR and LOCAL_INV, and added
>> appropriate
>>> recovery mechanisms after a transport reconnect that should prevent
>>> rkey dis- synchrony entirely.
>>>
>>> A couple of optimizations have been added, including:
>>>
>>> - Allocating each MW separately rather than carving each out of a
>>> large piece of contiguous memory
>>>
>>> - Now that the receive CQ upcall handler dequeues a bundle of CQEs
>>> at once, fire off the reply handler tasklet just once per upcall
>>> to reduce context switches and how often hard IRQs are disabled
>>>
>>> Jury is still out on the latter.
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com