2014-04-30 19:16:57

by Shirley Ma

[permalink] [raw]
Subject: NFSoRDMA developers bi-weekly meeting announcement (4/30)

Attendees:
Jeff Beck (NASA)
Yan Burman (Mellanox)
Phil Cayton (Intel)
Susan Coulter (LANL)
Chuck Lever (Oracle)
Shirley Ma (Oracle)
Anna Schumaker (NetApp)
Devesh Sharma (Emulex)
Steve Wise (OpenGridComputing, Chelsio)

Moderator:
Shirley Ma (Oracle)

4/30/2014 meeting summary:
The NFSoRDMA developers bi-weekly meeting is intended to help organize
NFSoRDMA development and test efforts across different organizations,
to speed up NFSoRDMA upstream kernel work and the development of
NFSoRDMA diagnostic/debugging tools. Hopefully the quality of NFSoRDMA
upstream patches can be improved by having them tested by a quorum of
HW vendors, and NFSoRDMA issues can be tracked and followed up on in a
timely manner.

Today's meeting covered:
1. NFSoRDMA test strategy and plan:
What NFS-related software tools are available for functional and
performance testing?
Chuck has created a wiki page for NFSoRDMA testing. The wiki page
covers NFSoRDMA functionality, stress, and performance testing. Here
is the link to the wiki:

http://wiki.linux-nfs.org/wiki/index.php/NfsRdmaClient/Home#Submitting_patches

If you have any other test suggestions, such as scalability testing,
please send them to Chuck Lever.

How are we going to organize continuous testing resources?
Each of us has our own lab with a different configuration, so together
we can get better test coverage without any one of us purchasing and
installing every vendor's hardware.
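As a concrete starting point for the functional tests discussed above,
a minimal client-side NFS/RDMA mount looks roughly like the following
sketch. The server name and export path are placeholders; 20049 is the
port conventionally used for NFS/RDMA.

```shell
# Load the client-side RDMA transport module
modprobe xprtrdma

# Mount an export over RDMA (placeholder server name and paths)
mount -t nfs -o rdma,port=20049 server.example.com:/export /mnt/nfsrdma

# Verify which transport the mount is actually using
grep /mnt/nfsrdma /proc/mounts
```

This requires RDMA-capable hardware on both ends, so it is a lab
smoke test rather than something that runs anywhere.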

2. Helping others with problem reports and follow-up:
How do we get help from the community?
Problems should be reported on both the linux-nfs and linux-rdma
mailing lists. However, some people have found that problems they
reported were ignored in the past. To better track NFSoRDMA bug
reports, we have created an NFSoRDMA product family in the
linux-nfs.org bugzilla, which will be mirrored to the linux-nfs and
linux-rdma mailing lists. Here is the link to the linux-nfs.org
NFSoRDMA bugzilla:

https://bugzilla.linux-nfs.org/enter_bug.cgi?product=kernel

NFSoRDMA server component: svcrdma
NFSoRDMA client component: xprtrdma

Chuck Lever and Shirley Ma will help track NFSoRDMA bugs. This does
not guarantee that every bug will be assigned and fixed. Chuck Lever
and Shirley Ma will work on NFSoRDMA client-side issues, but we still
have not located resources to support the NFSoRDMA server. This
remains an outstanding issue.

3. Upstream NFSoRDMA status:
Anna Schumaker has created a git tree to maintain all stable NFSoRDMA
client patches, which will make it easy for the maintainer to pull
patches upstream. NFSoRDMA end users can use her git tree. Here is the
link to Anna's NFSoRDMA client git tree:

git://git.linux-nfs.org/projects/anna/nfs-rdma.git

Chuck Lever's NFSoRDMA linux-nfs.org git tree is for developers only.
All patches should be reviewed and tested before going into Anna's
tree. Here is the link to Chuck Lever's git tree; check out the
nfs-rdma-client branch:

git://git.linux-nfs.org/projects/cel/cel-2.6.git
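For anyone who wants to try these trees, fetching them might look like
the sketch below. Only the nfs-rdma-client branch name comes from the
notes above; the remote name "cel" is just a local label.

```shell
# End users: clone Anna's stable NFSoRDMA client tree
git clone git://git.linux-nfs.org/projects/anna/nfs-rdma.git
cd nfs-rdma

# Developers: add Chuck's tree as a remote and check out the
# nfs-rdma-client branch mentioned above
git remote add cel git://git.linux-nfs.org/projects/cel/cel-2.6.git
git fetch cel
git checkout -b nfs-rdma-client cel/nfs-rdma-client
```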

4. NFSoRDMA debugging and diagnosis tools:
ibdump is a ConnectX sniffing tool that can be used to monitor
InfiniBand packets on the fabric. wireshark is able to interpret these
packets at the IB layer but not at the NFSoRDMA layer, so a dissector
is needed for the NFSoRDMA protocol. Yan Burman will look at how much
work it would take to create an NFSoRDMA dissector for wireshark.

Chelsio adapters are also able to replicate packets to another
interface for wireshark to monitor. Steve Wise will help find how-to
instructions.
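A typical ibdump capture workflow would look something like the
following; the device name mlx4_0 and port 1 are example values for
one lab setup, not prescriptive.

```shell
# Sniff IB traffic on port 1 of the first ConnectX HCA into a pcap file
ibdump -d mlx4_0 -i 1 -w /tmp/nfsrdma.pcap

# Inspect the capture; wireshark/tshark dissect the IB layers, but an
# NFSoRDMA dissector does not exist yet (see above)
tshark -r /tmp/nfsrdma.pcap
```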

Proposed topics for the next meeting:
1. Follow up on the work discussed in this meeting.

2. Walk through some of the stories on Pivotal; the link is below:
https://www.pivotaltracker.com/s/projects/958376

3. Invite some of the developers to discuss their requirements and
features.

Meeting time: a one-hour discussion every other Wednesday (the next
meeting will be on 5/14). A reminder will be sent to both the
linux-nfs and linux-rdma mailing lists:

5/14/2014
@8:00am PST
@9:00am MST
@10:00am CST
@11:00am EST
@Bangalore @9:00pm
@Israel @6:00pm

Duration: 1 hour

Call-in number:
Israel: +972 37219638
Bangalore: +91 8039890080 (180030109800)
US: 8666824770, 408-7744073
Conference Code: 2308833
Passcode: 63767362 (it spells NFSoRDMA on a phone keypad, in case you
can't remember it)

Thanks, everyone, for joining the call and providing valuable input
and work to the community to make NFSoRDMA better. Anyone who is
interested in NFSoRDMA is welcome to join the upcoming meeting on
5/14.

Shirley


2014-04-30 20:00:41

by Or Gerlitz

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

On Wed, Apr 30, 2014 at 10:47 PM, Chuck Lever <[email protected]> wrote:

> If I understood Yan, he is trying to use NFS/RDMA in guests (kvm?). We
> are pretty sure that is not working at the moment,

Can you provide a short one- or two-line summary of why/what is broken
there? The only thing I can think of that is not supported over mlx4
VFs is the proprietary FMRs, but AFAIK the nfs-rdma code doesn't even
have a mode which uses them, right?

> but that is a priority
> to get fixed. Shirley has a lab set up and has been looking into it.

2014-04-30 20:49:50

by Shirley Ma

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

On 04/30/2014 01:00 PM, Or Gerlitz wrote:
> On Wed, Apr 30, 2014 at 10:47 PM, Chuck Lever <[email protected]>
>
>> If I understood Yan, he is trying to use NFS/RDMA in guests (kvm?). We
>> are pretty sure that is not working at the moment,
> can you provide a short 1-2 liner why/what is broken there? the only
> thing which I can think of to be not-supported over mlx4 VFs is the
> proprietary FMRs, but AFAIK, the nfs-rdma code doesn't even have a
> mode which uses them, right?
I've created a Xen guest (DomU). The Dom0 PF, which has no mtts
enabled, works; however, on the DomU I hit this problem just by
mounting the file system:
mlx4_core 0000:00:04.0: Failed to allocate mtts for 66 pages(order 7)
mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order 12)
mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order 12)

The RDMA microbenchmark perftest works OK. I enabled the mtts scripts
when booting the Xen guest. cat /proc/mtrr:

[root@ca-nfsdev1vm1 log]# cat /proc/mtrr
reg00: base=0x0f0000000 ( 3840MB), size= 128MB, count=1: uncachable
reg01: base=0x0f8000000 ( 3968MB), size= 64MB, count=1: uncachable

lspci -v
00:04.0 InfiniBand: Mellanox Technologies MT25400 Family [ConnectX-2
Virtual Function] (rev b0)
Subsystem: Mellanox Technologies Device 61b0
Physical Slot: 4
Flags: bus master, fast devsel, latency 0
Memory at f0000000 (64-bit, prefetchable) [size=128M]
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core

I will need to find another machine to try a KVM guest. Yan might be
hitting a different problem.

I have a ConnectX-2; the FW level is 2.11.2012. Yan has a ConnectX-3,
and he tried it on a KVM guest.
>> but that is a priority
>> to get fixed. Shirley has a lab set up and has been looking into it.
Shirley

2014-04-30 23:58:28

by Doug Ledford

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

On 04/30/2014 Shirley Ma wrote:
> On 04/30/2014 01:00 PM, Or Gerlitz wrote:
> > On Wed, Apr 30, 2014 at 10:47 PM, Chuck Lever
> > <[email protected]>
> >
> >> If I understood Yan, he is trying to use NFS/RDMA in guests
> >> (kvm?). We
> >> are pretty sure that is not working at the moment,
> > can you provide a short 1-2 liner why/what is broken there? the
> > only
> > thing which I can think of to be not-supported over mlx4 VFs is the
> > proprietary FMRs, but AFAIK, the nfs-rdma code doesn't even have a
> > mode which uses them, right?
> I've created Xen guest on DomU. Dom0 PF works which has no mtts been
> enabled, however DomU I hit this problem by just mounting the file
> system:
> mlx4_core 0000:00:04.0: Failed to allocate mtts for 66 pages(order 7)
> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order
> 12)
> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order
> 12)
>
> RDMA microbenchmark perftest works ok. I enabled mtts scripts when
> booting the Xen guest. cat /proc/mtrr:

What OS/RDMA stack are you using? I'm not familiar with any mtts
scripts; however, I know there is an mtrr fixup script I wrote for
the RDMA stack in Fedora/RHEL (and so I assume it's in Oracle Linux
too, but I haven't checked). In fact, I assume that's the script
you are referring to, based on the fact that the next bit of your
email cats the /proc/mtrr file. But I don't believe an mtrr setting
mixup, one way or the other, should have any impact on the mtts
allocations in the driver. Even if your mtrr registers were set
incorrectly, the problem then becomes either A) a serious
performance bottleneck (in the case of Intel hardware that needs
write combining in order to get more than about 50 MByte/s of
throughput on their cards) or B) failed operation because MMIO
writes to the card are being cached/write combined when they should
not be.

I suspect this is more likely Xen related than mtts/mtrr related.

> [root@ca-nfsdev1vm1 log]# cat /proc/mtrr
> reg00: base=0x0f0000000 ( 3840MB), size= 128MB, count=1: uncachable
> reg01: base=0x0f8000000 ( 3968MB), size= 64MB, count=1: uncachable
>
> lspci -v
> 00:04.0 InfiniBand: Mellanox Technologies MT25400 Family [ConnectX-2
> Virtual Function] (rev b0)
> Subsystem: Mellanox Technologies Device 61b0
> Physical Slot: 4
> Flags: bus master, fast devsel, latency 0
> Memory at f0000000 (64-bit, prefetchable) [size=128M]
> Capabilities: [60] Express Endpoint, MSI 00
> Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
> Kernel driver in use: mlx4_core
> Kernel modules: mlx4_core
>
> I will need to find another machine to try KVM guest. Yan might hit a
> different problem.
>
> I have ConnectX-2, FW level is 2.11.2012. Yan has ConnectX-3, he
> tried
> it on KVM guest.
> >> but that is a priority
> >> to get fixed. Shirley has a lab set up and has been looking into
> >> it.
> Shirley
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford


2014-04-30 19:39:40

by Or Gerlitz

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

On Wed, Apr 30, 2014 at 10:16 PM, Shirley Ma <[email protected]> wrote:
[...]
> 3. Upstream NFSoRDMA status:


So does it currently work...? I understand that Yan tried it out
today, and at least one side just crashed.

Chuck, I assume there is a configuration which basically works for you
and allows you to develop the upstream patches you send, right? Can
you send us your .config and the exact NFS/rNFS options you use that
work basically OK?

Or.

2014-04-30 19:48:24

by Chuck Lever III

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

Hi Or-

On Apr 30, 2014, at 3:39 PM, Or Gerlitz <[email protected]> wrote:

> On Wed, Apr 30, 2014 at 10:16 PM, Shirley Ma <[email protected]> wrote:
> [...]
>> 3. Upstream NFSoRDMA status:
>
>
> So does it currently works...? I understand that Yan tried it out
> today, and @ least one side just crashed.
>
> Chuck, I assume there is a configuration which basically works for you
> and allow you to develop the upstream patches, you send, right? can
> you send us your .config and exact NFS/rNFS options you use and works
> basically OK?

If I understood Yan, he is trying to use NFS/RDMA in guests (kvm?). We
are pretty sure that is not working at the moment, but that is a priority
to get fixed. Shirley has a lab set up and has been looking into it.

So, first step would be to set up v3.15-rc3 on bare metal.

I think the critical stability patches for both client and server are
already upstream, unless you are exporting tmpfs.

The patches I just posted fix some other issues, feel free to apply them.
But the basics should be working now in v3.15-rc3.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-05-01 05:16:37

by Shirley Ma

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)


On 04/30/2014 04:58 PM, Doug Ledford wrote:
> On 04/302014 Shirley Ma wrote:
>> On 04/30/2014 01:00 PM, Or Gerlitz wrote:
>>> On Wed, Apr 30, 2014 at 10:47 PM, Chuck Lever
>>> <[email protected]>
>>>
>>>> If I understood Yan, he is trying to use NFS/RDMA in guests
>>>> (kvm?). We
>>>> are pretty sure that is not working at the moment,
>>> can you provide a short 1-2 liner why/what is broken there? the
>>> only
>>> thing which I can think of to be not-supported over mlx4 VFs is the
>>> proprietary FMRs, but AFAIK, the nfs-rdma code doesn't even have a
>>> mode which uses them, right?
>> I've created Xen guest on DomU. Dom0 PF works which has no mtts been
>> enabled, however DomU I hit this problem by just mounting the file
>> system:
>> mlx4_core 0000:00:04.0: Failed to allocate mtts for 66 pages(order 7)
>> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order
>> 12)
>> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096 pages(order
>> 12)
>>
>> RDMA microbenchmark perftest works ok. I enabled mtts scripts when
>> booting the Xen guest. cat /proc/mtrr:
> What OS/RDMA stack are you using? I'm not familiar with any mtts
> scripts, however I know there is an mtrr fixup script I wrote for
> the RDMA stack in Fedora/RHEL (and so I assume it's in Oracle Linux
> too, but I haven't checked). In fact, I assume that's the script
> you are referring to based on the fact that your next bit of your
> email cats the /proc/mtrr file. But I don't believe whether there
> is an mtrr setting mixup or not that is should have any impact on
> the mtts allocations in the driver. Even if your mtrr registers
> were set incorrectly, the problem then becomes either A) a serious
> performance bottleneck (in the case of Intel hardware that needs
> write combining in order to get more than about 50MByte/s of
> throughput on their cards) or B) failed operation because MMIO
> writes to the card are being cached/write combined when they should
> not be.
>
> I suspect this is more likely Xen related than mtts/mtrr related.
Yes, that's the script I used. I wonder whether it's possible to
disable mtrr on the DomU guest to debug this. I am new to Xen.
>> [root@ca-nfsdev1vm1 log]# cat /proc/mtrr
>> reg00: base=0x0f0000000 ( 3840MB), size= 128MB, count=1: uncachable
>> reg01: base=0x0f8000000 ( 3968MB), size= 64MB, count=1: uncachable
>>
>> lspci -v
>> 00:04.0 InfiniBand: Mellanox Technologies MT25400 Family [ConnectX-2
>> Virtual Function] (rev b0)
>> Subsystem: Mellanox Technologies Device 61b0
>> Physical Slot: 4
>> Flags: bus master, fast devsel, latency 0
>> Memory at f0000000 (64-bit, prefetchable) [size=128M]
>> Capabilities: [60] Express Endpoint, MSI 00
>> Capabilities: [9c] MSI-X: Enable+ Count=4 Masked-
>> Kernel driver in use: mlx4_core
>> Kernel modules: mlx4_core
>>
>> I will need to find another machine to try KVM guest. Yan might hit a
>> different problem.
>>
>> I have ConnectX-2, FW level is 2.11.2012. Yan has ConnectX-3, he
>> tried
>> it on KVM guest.
>>>> but that is a priority
>>>> to get fixed. Shirley has a lab set up and has been looking into
>>>> it.
>> Shirley
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>> in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>


2014-05-01 16:04:13

by Doug Ledford

[permalink] [raw]
Subject: Re: NFSoRDMA developers bi-weekly meeting announcement (4/30)

On 05/01/2014, Shirley Ma wrote:
> On 04/30/2014 04:58 PM, Doug Ledford wrote:
> > On 04/302014 Shirley Ma wrote:
> >> I've created Xen guest on DomU. Dom0 PF works which has no mtts
> >> been
> >> enabled, however DomU I hit this problem by just mounting the file
> >> system:
> >> mlx4_core 0000:00:04.0: Failed to allocate mtts for 66 pages(order
> >> 7)
> >> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096
> >> pages(order
> >> 12)
> >> mlx4_core 0000:00:04.0: Failed to allocate mtts for 4096
> >> pages(order
> >> 12)
> >>
> >> RDMA microbenchmark perftest works ok. I enabled mtts scripts when
> >> booting the Xen guest. cat /proc/mtrr:
> > What OS/RDMA stack are you using? I'm not familiar with any mtts
> > scripts, however I know there is an mtrr fixup script I wrote for
> > the RDMA stack in Fedora/RHEL (and so I assume it's in Oracle Linux
> > too, but I haven't checked). In fact, I assume that's the script
> > you are referring to based on the fact that your next bit of your
> > email cats the /proc/mtrr file. But I don't believe whether there
> > is an mtrr setting mixup or not that is should have any impact on
> > the mtts allocations in the driver. Even if your mtrr registers
> > were set incorrectly, the problem then becomes either A) a serious
> > performance bottleneck (in the case of Intel hardware that needs
> > write combining in order to get more than about 50MByte/s of
> > throughput on their cards) or B) failed operation because MMIO
> > writes to the card are being cached/write combined when they should
> > not be.
> >
> > I suspect this is more likely Xen related than mtts/mtrr related.
> Yes. That's the script I used. I wonder whether it's possible to
> disable
> mtrr on DomU guest to debug this. I am new to Xen.

No, it's not possible to disable mtrr and expect any pass-through
PCI devices to work. The mtrr registers merely indicate what
portion of the memory map should be treated as normal memory (meaning
cacheable) and what should be treated as MMIO memory (meaning generally
non-cacheable). That's all they do. The mtts table allocation
failures are actually totally different. In the VF on DomU the
allocations are passed to the PF on Dom0 and the command is done
there. However, the number of mtts available for the slave is limited
(check the code in resource_tracker.c). In addition, the number of
mtts allocated is proportionally related to the memory size in the
guest and inversely related to log2_mtts_per_seg (probably on both
Dom0 and DomU, which I suspect need to agree on the log2_mtts_per_seg
module parameter). I would try a combination of reducing the memory
in the guest, or increasing log2_mtts_per_seg, or both, and see if
you can get it to work.
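Trying the log2_mtts_per_seg change above might look like the sketch
below; the value 5 is only an example (the mlx4 default is typically
3), and the exact module unload order can vary with the stack in use.

```shell
# Check the current value in the guest
cat /sys/module/mlx4_core/parameters/log2_mtts_per_seg

# Reload the driver with a larger segment size (example value)
modprobe -r mlx4_ib mlx4_core
modprobe mlx4_core log2_mtts_per_seg=5

# Or make the setting persistent across reboots
echo "options mlx4_core log2_mtts_per_seg=5" >> /etc/modprobe.d/mlx4.conf
```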

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

