2013-02-06 15:48:22

by Yan Burman

Subject: NFS over RDMA crashing

Hi.

I have been trying to create a setup with NFS/RDMA, but I am getting
crashes.

I am using a Mellanox ConnectX-3 HCA with SRIOV enabled and two RHEL 6.3
KVM VMs, each getting one VF.
My test case uses one VM's storage from the other over NFS/RDMA
(192.168.20.210 is the server, 192.168.20.211 the client).
I started with two physical hosts, but because of the crashes I moved to VMs,
which are easier to debug.

I have a functional IPoIB connection between the two VMs, and rping also
works between them.

My /etc/exports has the following entry:
/mnt/tmp *(fsid=1,rw,async,insecure,all_squash)
with tmpfs mounted on /mnt/tmp.

My mount command is:
mount -t nfs -o rdma,port=2050 192.168.20.210:/mnt/tmp /mnt/tmp


I tried the latest net-next kernel first, but I was getting the
following errors:

=============================================
[ INFO: possible recursive locking detected ]
3.8.0-rc5+ #4 Not tainted
---------------------------------------------
kworker/6:0/49 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa05e7813>]
rdma_destroy_id+0x33/0x250 [rdma_cm]

but task is already holding lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa05e317b>]
cma_disable_callback+0x2b/0x60 [rdma_cm]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/6:0/49:
#0: (ib_cm){.+.+.+}, at: [<ffffffff81068f50>]
process_one_work+0x160/0x720
#1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81068f50>]
process_one_work+0x160/0x720
#2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa05e317b>]
cma_disable_callback+0x2b/0x60 [rdma_cm]

stack backtrace:
Pid: 49, comm: kworker/6:0 Not tainted 3.8.0-rc5+ #4
Call Trace:
[<ffffffff8109f99c>] validate_chain+0xdcc/0x11f0
[<ffffffff8109bdcf>] ? save_trace+0x3f/0xc0
[<ffffffff810a0760>] __lock_acquire+0x440/0xc30
[<ffffffff810a0760>] ? __lock_acquire+0x440/0xc30
[<ffffffff810a0fe5>] lock_acquire+0x95/0x1e0
[<ffffffffa05e7813>] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
[<ffffffffa05e7813>] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
[<ffffffff814a9aff>] mutex_lock_nested+0x5f/0x3b0
[<ffffffffa05e7813>] ? rdma_destroy_id+0x33/0x250 [rdma_cm]
[<ffffffff8109d68d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
[<ffffffff8109d72d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff814aca3d>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
[<ffffffffa05e7813>] rdma_destroy_id+0x33/0x250 [rdma_cm]
[<ffffffffa05e8f99>] cma_req_handler+0x719/0x730 [rdma_cm]
[<ffffffff814aca04>] ? _raw_spin_unlock_irqrestore+0x4/0x80
[<ffffffffa05d5772>] cm_process_work+0x22/0x170 [ib_cm]
[<ffffffffa05d6acd>] cm_req_handler+0x67d/0xa70 [ib_cm]
[<ffffffffa05d6fed>] cm_work_handler+0x12d/0x1218 [ib_cm]
[<ffffffff81068fc2>] process_one_work+0x1d2/0x720
[<ffffffff81068f50>] ? process_one_work+0x160/0x720
[<ffffffffa05d6ec0>] ? cm_req_handler+0xa70/0xa70 [ib_cm]
[<ffffffff81069930>] worker_thread+0x120/0x460
[<ffffffff814ab4b4>] ? preempt_schedule+0x44/0x60
[<ffffffff81069810>] ? manage_workers+0x300/0x300
[<ffffffff81071df6>] kthread+0xd6/0xe0
[<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
[<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
[<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
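
To make the warning above concrete: lockdep is complaining that the ib_cm
work thread takes id_priv->handler_mutex via cma_disable_callback() and then
acquires a mutex of the same lock class again inside rdma_destroy_id(), called
from cma_req_handler(). As the "missing lock nesting notation" hint suggests,
this may be two different id_priv objects rather than a true self-deadlock,
but the reported pattern is the one in the minimal userspace sketch below
(a pthread mutex stands in for the kernel mutex; handler_mutex and
destroy_id_model are illustrative names, not rdma_cm code):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for id_priv->handler_mutex; ERRORCHECK makes the re-lock report
 * an error instead of hanging, roughly what lockdep does at report time. */
static pthread_mutex_t handler_mutex;

static void destroy_id_model(void)
{
        int rc = pthread_mutex_lock(&handler_mutex);    /* second acquisition */

        printf("re-lock from the same thread: %s\n",
               rc ? strerror(rc) : "acquired");
        if (rc == 0)
                pthread_mutex_unlock(&handler_mutex);
}

int main(void)
{
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&handler_mutex, &attr);

        pthread_mutex_lock(&handler_mutex);     /* "cma_disable_callback" holds the mutex */
        destroy_id_model();                     /* "rdma_destroy_id" takes it again */
        pthread_mutex_unlock(&handler_mutex);
        return 0;
}

Built with "gcc -pthread", it prints "Resource deadlock avoided" for the
second lock, which is the scenario lockdep is flagging above.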


When killing a mount command that got stuck:
-------------------------------------------

BUG: unable to handle kernel paging request at ffff880324dc7ff8
IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
Oops: 0003 [#1] PREEMPT SMP
Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
target_core_file target_core_pscsi target_core_mod configfs 8021q bridge
stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan
tun uinput iTCO_wdt iTCO_vendor_support kvm_intel kvm crc32c_intel
microcode pcspkr joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg
ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb
hwmon dca ptp pps_core button dm_mod ext3 jbd sd_mod ata_piix libata
uhci_hcd megaraid_sas scsi_mod
CPU 6
Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
X8DTH-i/6/iF/6F/X8DTH
RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
rdma_read_xdr+0x8bb/0xd40 [svcrdma]
RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task ffff880330550000)
Stack:
ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
Call Trace:
[<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
[<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
[<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
[<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
[<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
[<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
[<ffffffff81071df6>] kthread+0xd6/0xe0
[<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
[<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
[<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00 00
48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00 <48> c7
40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
RSP <ffff880324c3dbf8>
CR2: ffff880324dc7ff8
---[ end trace 06d0384754e9609a ]---


It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4:
cleanup: replace rq_resused count by rq_next_page pointer"
is responsible for the crash (it seems to be crashing in
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527).
It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
CONFIG_DEBUG_RODATA enabled; I have not tried disabling them yet.
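
For context, that commit removed the rq_resused counter from struct svc_rqst
and replaced it with an rq_next_page pointer into the rq_pages[] array. Below
is a minimal userspace model of the two accounting styles (only the field
names rq_pages/rq_respages/rq_next_page/rq_resused come from the kernel; the
struct, array size and index values are invented for illustration). The final
comment is my reading of the trace above, not an established conclusion.

/*
 * Userspace model only - not the kernel's struct svc_rqst.
 */
#include <stdio.h>

#define MODEL_MAXPAGES 8

struct page;                            /* opaque, as in the kernel */

struct svc_rqst_model {
        struct page *rq_pages[MODEL_MAXPAGES];
        struct page **rq_respages;      /* first response page */
        struct page **rq_next_page;     /* new scheme: one past the last page in use */
        int rq_resused;                 /* old scheme: count of response pages used */
};

int main(void)
{
        struct svc_rqst_model rq = { .rq_respages = &rq.rq_pages[2] };

        /* Old scheme: a count relative to rq_respages. */
        rq.rq_resused = 3;

        /* New scheme: the same information as a pointer into rq_pages[]. */
        rq.rq_next_page = rq.rq_respages + 3;

        printf("response pages occupy rq_pages[%td..%td]\n",
               rq.rq_respages - rq.rq_pages,
               rq.rq_next_page - rq.rq_pages - 1);

        /*
         * svcrdma rewrites rq_pages/rq_respages itself.  If rq_next_page is
         * left stale, a cleanup loop of the form
         *     while (rq.rq_next_page != rq.rq_respages)
         *             *(--rq.rq_next_page) = NULL;
         * can step outside the array and store NULL into whatever memory is
         * there - which would look much like the faulting store in the
         * rdma_read_xdr oops above (a write of 0 at "pointer - 8").
         */
        return 0;
}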

When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was no
longer getting the server crashes,
so the rest of my tests were done using that point (it is somewhere in
the middle of 3.7.0-rc2).


Now the client was getting stuck, printing the following messages over and
over again:
rpcrdma: connection to 192.168.20.210:2050 on mlx4_0, memreg 6 slots 32
ird 16
rpcrdma: connection to 192.168.20.210:2050 closed (-103)

The next step was to change the memory registration policy to
RPCRDMA_ALLPHYSICAL, since there were IB_WC_LOC_ACCESS_ERR errors.
I found that doing "echo 6 > /proc/sys/sunrpc/rdma_memreg_strategy" was
not enough (the value-to-policy mapping is sketched after the diff below).
We had to change the code a little bit to make it work. Here's the change
used for the test:

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 62e4f9b..c660f61 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -970,6 +970,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	 * NB: iWARP requires remote write access for the data sink
 	 * of an RDMA_READ. IB does not.
 	 */
+	devattr.device_cap_flags &= ~(IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY);
+
 	if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
 		newxprt->sc_frmr_pg_list_len =
 			devattr.max_fast_reg_page_list_len;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 745973b..0e3da28 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -488,10 +488,12 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
 		goto out2;
 	}
 
+#if 0
 	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
 		ia->ri_have_dma_lkey = 1;
 		ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
 	}
+#endif
 
 	switch (memreg) {
 	case RPCRDMA_MEMWINDOWS:

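For reference, the value 6 written above selects RPCRDMA_ALLPHYSICAL in the
client's memory-registration enum. The sketch below is a reconstruction from
memory of that enum as it looked around 3.7/3.8 - treat the exact names and
ordering as an assumption, not as a quote of include/linux/sunrpc/xprtrdma.h:

#include <stdio.h>

enum rpcrdma_memreg {
        RPCRDMA_BOUNCEBUFFERS = 0,
        RPCRDMA_REGISTER,               /* 1 */
        RPCRDMA_MEMWINDOWS,             /* 2 */
        RPCRDMA_MEMWINDOWS_ASYNC,       /* 3 */
        RPCRDMA_MTHCAFMR,               /* 4 */
        RPCRDMA_FRMR,                   /* 5 - matches the "memreg 5" default seen later in this thread */
        RPCRDMA_ALLPHYSICAL,            /* 6 - what "echo 6" asks for */
        RPCRDMA_LAST
};

int main(void)
{
        printf("FRMR = %d, ALLPHYSICAL = %d\n",
               RPCRDMA_FRMR, RPCRDMA_ALLPHYSICAL);
        return 0;
}

Running it just prints 5 and 6, matching the "memreg 5"/"memreg 6" values in
the rpcrdma connection messages quoted in this thread.
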

Once both server and client were changed to the RPCRDMA_ALLPHYSICAL memory
policy, I was able to mount successfully from the client, but when I ran
"dd if=/dev/zero of=/mnt/tmp/test bs=1M count=100" I got crashes on the
server side (it worked fine with a regular non-RDMA mount).

Here's what I'm getting on the server with rpcdebug enabled:

svcrdma: rqstp=ffff88007bd76000
svcrdma: processing ctxt=ffff88007a3b9900 on xprt=ffff880079922000,
rqstp=ffff88007bd76000, status=0
svc: transport ffff880079922000 busy, not enqueued
svc: got len=0
svc: transport ffff880079922000 served by daemon ffff88007bd74000
svc: transport ffff880079922000 busy, not enqueued
svc: server ffff88007bd76000 waiting for data (to = 900000)
svc: server ffff88007bd74000, pool 0, transport ffff880079922000, inuse=55
svcrdma: rqstp=ffff88007bd74000
svcrdma: deferred read ret=262312, rq_arg.len =263900,
rq_arg.head[0].iov_base=ffff88007a3cb634, rq_arg.head[0].iov_len = 152
svc: got len=262312
svc: transport ffff880079922000 served by daemon ffff88007bd76000
svc: transport ffff880079922000 busy, not enqueued
general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: md5 xprtrdma svcrdma netconsole configfs nfsv3 nfsv4
nfs ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
ib_addr nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc 8021q
ipv6 dm_mirror dm_region_hash dm_log uinput microcode joydev pcspkr
mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core virtio_balloon cirrus ttm
drm_kms_helper sysimgblt sysfillrect syscopyarea i2c_piix4 button dm_mod
ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd
CPU 1
Pid: 2479, comm: nfsd Not tainted 3.7.0-rc2+ #1 Red Hat KVM
RIP: 0010:[<ffffffffa024015f>] [<ffffffffa024015f>]
svc_process_common+0x6f/0x690 [sunrpc]
RSP: 0018:ffff88007b93fd98 EFLAGS: 00010212
RAX: 0000000000000000 RBX: ffff88007bd741d8 RCX: 000000003fbd438d
RDX: 0005080000000000 RSI: ffff88007bd74198 RDI: ffff88007bd74000
RBP: ffff88007b93fe08 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007bd74000
R13: ffff880037af2c70 R14: ffff88007b949000 R15: ffff88007bd74198
FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8c86040000 CR3: 00000000799d7000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 2479, threadinfo ffff88007b93e000, task ffff880037af2c70)
Stack:
ffff88007b949000 ffff880079922000 0000000000000000 ffff880079922038
ffff88007b93fdd8 ffffffffa02524f4 0000000000000001 ffff88007bd74000
ffff88007b93fe28 ffff88007bd74000 ffff88007b949000 ffff880037af2c70
Call Trace:
[<ffffffffa02524f4>] ? svc_xprt_received+0x34/0x60 [sunrpc]
[<ffffffffa029adb0>] ? nfsd_svc+0x740/0x740 [nfsd]
[<ffffffffa0240abd>] svc_process+0xfd/0x150 [sunrpc]
[<ffffffffa029ae6f>] nfsd+0xbf/0x130 [nfsd]
[<ffffffffa029adb0>] ? nfsd_svc+0x740/0x740 [nfsd]
[<ffffffff810724f6>] kthread+0xd6/0xe0
[<ffffffff81072420>] ? __init_kthread_worker+0x70/0x70
[<ffffffff814a9b6c>] ret_from_fork+0x7c/0xb0
[<ffffffff81072420>] ? __init_kthread_worker+0x70/0x70
Code: 01 00 00 00 c7 87 88 01 00 00 01 00 00 00 c6 87 b8 1a 00 00 00 48
8b 40 08 ff 50 20 48 8b 43 08 41 8b 8c 24 70 1a 00 00 48 8b 13 <89> 0c
02 48 83 43 08 04 49 8b 17 8b 02 48 83 c2 04 49 83 6f 08
RIP [<ffffffffa024015f>] svc_process_common+0x6f/0x690 [sunrpc]
RSP <ffff88007b93fd98>
svc: transport ffff880079922000 busy, not enqueued
svc: transport ffff880079922000 busy, not enqueued
svc: server ffff88007bd76000, pool 0, transport ffff880079922000, inuse=4
svcrdma: rqstp=ffff88007bd76000
svcrdma: deferred read ret=262312, rq_arg.len =263900,
rq_arg.head[0].iov_base=ffff88007a23a634, rq_arg.head[0].iov_len = 152
---[ end trace 46c56fc306f2fb0b ]---
svc: got len=262312
svc: transport ffff880079922000 served by daemon ffff88007bd72000
svc: server ffff88007bd72000, pool 0, transport ffff880079922000, inuse=5
svcrdma: rqstp=ffff88007bd72000
svcrdma: processing ctxt=ffff88007a3a4b00 on xprt=ffff880079922000,
rqstp=ffff88007bd72000, status=0
svc: got len=0
svc: transport ffff880079922000 served by daemon ffff88007bd70000
svc: transport ffff880079922000 busy, not enqueued
svc: server ffff88007bd72000 waiting for data (to = 900000)
svc: server ffff88007bd70000, pool 0, transport ffff880079922000, inuse=70
svcrdma: rqstp=ffff88007bd70000
svcrdma: processing ctxt=ffff88007a3b8000 on xprt=ffff880079922000,
rqstp=ffff88007bd70000, status=0

And the same crash continues...

Please advise.

Yan



2013-02-08 06:07:25

by Tom Tucker

Subject: Re: NFS over RDMA crashing

On 2/6/13 3:28 PM, Steve Wise wrote:
> On 2/6/2013 4:24 PM, J. Bruce Fields wrote:
>> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
>>> When killing mount command that got stuck:
>>> -------------------------------------------
>>>
>>> BUG: unable to handle kernel paging request at ffff880324dc7ff8
>>> IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>>> PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
>>> Oops: 0003 [#1] PREEMPT SMP
>>> Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
>>> ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
>>> iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
>>> nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
>>> target_core_file target_core_pscsi target_core_mod configfs 8021q
>>> bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
>>> macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
>>> kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
>>> ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
>>> mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd
>>> sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
>>> CPU 6
>>> Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
>>> X8DTH-i/6/iF/6F/X8DTH
>>> RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
>>> rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>>> RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
>>> RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
>>> RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
>>> RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
>>> R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
>>> R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
>>> FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
>>> knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task
>>> ffff880330550000)
>>> Stack:
>>> ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
>>> ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
>>> ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
>>> Call Trace:
>>> [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
>>> [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
>>> [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
>>> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
>>> [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
>>> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
>>> [<ffffffff81071df6>] kthread+0xd6/0xe0
>>> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
>>> [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
>>> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
>>> Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
>>> 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
>>> <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
>>> RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>>> RSP <ffff880324c3dbf8>
>>> CR2: ffff880324dc7ff8
>>> ---[ end trace 06d0384754e9609a ]---
>>>
>>>
>>> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
>>> "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
>>> is responsible for the crash (it seems to be crashing in
>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
>>> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
>>> CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>>>
>>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
>>> was no longer getting the server crashes,
>>> so the reset of my tests were done using that point (it is somewhere
>>> in the middle of 3.7.0-rc2).
>> OK, so this part's clearly my fault--I'll work on a patch, but the
>> rdma's use of the ->rq_pages array is pretty confusing.
>
> Maybe Tom can shed some light?

Yes, the RDMA transport has two confusing tweaks on rq_pages. Most
transports (UDP/TCP) use the rq_pages allocated by SVC. For RDMA,
however, the RQ already contains pre-allocated memory that will contain
inbound NFS requests from the client. Instead of copying this data from
the pre-registered receive buffer into the buffer in rq_pages, I just
replace the page in rq_pages with the one that already contains the data.

The second somewhat strange thing is that the NFS request contains an
NFSRDMA header. This is just like TCP (i.e. the 4-byte record length);
the difference is that (unlike TCP) this header is needed for the
response, because it maps out where in the client the response data will
be written.
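
A minimal sketch of the page swap described above, with a toy page type
standing in for struct page (this is an illustration of the idea, not the
svcrdma code; page_model, MODEL_PAGE_SIZE and the sample string are invented):

#include <stdio.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

struct page_model {
        char data[MODEL_PAGE_SIZE];
};

int main(void)
{
        /* Page the generic SVC layer put into rq_pages[0]. */
        static struct page_model svc_page;
        /* Pre-registered RQ receive buffer, already filled by the HCA with
         * the NFSRDMA header followed by the RPC call. */
        static struct page_model rdma_recv_page;
        struct page_model *rq_pages[1] = { &svc_page };

        snprintf(rdma_recv_page.data, sizeof(rdma_recv_page.data),
                 "NFSRDMA header + RPC call");

        /* A copying transport would do:
         *     memcpy(svc_page.data, rdma_recv_page.data, len);
         * svcrdma instead swaps the page pointer - no payload copy. */
        rq_pages[0] = &rdma_recv_page;

        printf("rq_pages[0] now holds: %s\n", rq_pages[0]->data);
        return 0;
}

As I read it, this pointer swap is also why the rq_pages/rq_respages/
rq_next_page bookkeeping is so easy to get out of sync on the RDMA path.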

Tom



2013-02-11 18:13:41

by J. Bruce Fields

Subject: Re: NFS over RDMA crashing

On Mon, Feb 11, 2013 at 03:19:42PM +0000, Yan Burman wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:[email protected]]
> > Sent: Thursday, February 07, 2013 18:42
> > To: Yan Burman
> > Cc: [email protected]; [email protected]; linux-
> > [email protected]; Or Gerlitz
> > Subject: Re: NFS over RDMA crashing
> >
> > On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > > When killing mount command that got stuck:
> > > > -------------------------------------------
> > > >
> > > > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > > > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > > > Oops: 0003 [#1] PREEMPT SMP
> > > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> > iw_cm
> > > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> > jbd
> > > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > > X8DTH-i/6/iF/6F/X8DTH
> > > > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > > > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> > > > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > > > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> > > > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > > > R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> > > > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
> > > > knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task
> > > > ffff880330550000)
> > > > Stack:
> > > > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> > > > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> > > > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> > > > Call Trace:
> > > > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > > > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd] [<ffffffffa0571db0>] ?
> > > > nfsd_svc+0x740/0x740 [nfsd] [<ffffffff81071df6>] kthread+0xd6/0xe0
> > > > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > > > [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0 [<ffffffff81071d20>] ?
> > > > __init_kthread_worker+0x70/0x70
> > > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP
> > > > [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP
> > > > <ffff880324c3dbf8>
> > > > CR2: ffff880324dc7ff8
> > > > ---[ end trace 06d0384754e9609a ]---
> > > >
> > > >
> > > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > > is responsible for the crash (it seems to be crashing in
> > > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > > >
> > > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > > was no longer getting the server crashes, so the reset of my tests
> > > > were done using that point (it is somewhere in the middle of
> > > > 3.7.0-rc2).
> > >
> > > OK, so this part's clearly my fault--I'll work on a patch, but the
> > > rdma's use of the ->rq_pages array is pretty confusing.
> >
> > Does this help?
> >
> > They must have added this for some reason, but I'm not seeing how it could
> > have ever done anything....
> >
> > --b.
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..e8f25ec 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -520,13 +520,6 @@ next_sge:
> > for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> > ch_no++)
> > rqstp->rq_pages[ch_no] = NULL;
> >
> > - /*
> > - * Detach res pages. If svc_release sees any it will attempt to
> > - * put them.
> > - */
> > - while (rqstp->rq_next_page != rqstp->rq_respages)
> > - *(--rqstp->rq_next_page) = NULL;
> > -
> > return err;
> > }
> >
>
> I've been trying to reproduce the problem, but for some reason it does not happen anymore.
> The crash is not happening even without the patch now, but NFS over RDMA in 3.8.0-rc5 from net-next is not working.
> When running server and client in VM with SRIOV, it times out when trying to mount and oopses on the client when mount command is interrupted.
> When running two physical hosts, I get to mount the remote directory, but reading or writing fails with IO error.
>
> I am still doing some checks - I will post my findings when I will have more information.

OK, thanks for keeping us updated.

--b.

2013-02-11 15:23:10

by Yan Burman

Subject: RE: NFS over RDMA crashing

> -----Original Message-----
> From: J. Bruce Fields [mailto:[email protected]]
> Sent: Thursday, February 07, 2013 18:42
> To: Yan Burman
> Cc: [email protected]; [email protected]; linux-
> [email protected]; Or Gerlitz
> Subject: Re: NFS over RDMA crashing
>
> On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > When killing mount command that got stuck:
> > > -------------------------------------------
> > >
> > > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > > Oops: 0003 [#1] PREEMPT SMP
> > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> iw_cm
> > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> jbd
> > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > X8DTH-i/6/iF/6F/X8DTH
> > > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> > > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> > > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > > R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> > > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
> > > knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task
> > > ffff880330550000)
> > > Stack:
> > > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> > > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> > > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> > > Call Trace:
> > > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd] [<ffffffffa0571db0>] ?
> > > nfsd_svc+0x740/0x740 [nfsd] [<ffffffff81071df6>] kthread+0xd6/0xe0
> > > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > > [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0 [<ffffffff81071d20>] ?
> > > __init_kthread_worker+0x70/0x70
> > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP
> > > [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP
> > > <ffff880324c3dbf8>
> > > CR2: ffff880324dc7ff8
> > > ---[ end trace 06d0384754e9609a ]---
> > >
> > >
> > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > is responsible for the crash (it seems to be crashing in
> > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > >
> > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > was no longer getting the server crashes, so the reset of my tests
> > > were done using that point (it is somewhere in the middle of
> > > 3.7.0-rc2).
> >
> > OK, so this part's clearly my fault--I'll work on a patch, but the
> > rdma's use of the ->rq_pages array is pretty confusing.
>
> Does this help?
>
> They must have added this for some reason, but I'm not seeing how it could
> have ever done anything....
>
> --b.
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 0ce7552..e8f25ec 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -520,13 +520,6 @@ next_sge:
> for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> ch_no++)
> rqstp->rq_pages[ch_no] = NULL;
>
> - /*
> - * Detach res pages. If svc_release sees any it will attempt to
> - * put them.
> - */
> - while (rqstp->rq_next_page != rqstp->rq_respages)
> - *(--rqstp->rq_next_page) = NULL;
> -
> return err;
> }
>

I've been trying to reproduce the problem, but for some reason it does not happen anymore.
The crash is not happening even without the patch now, but NFS over RDMA in 3.8.0-rc5 from net-next is not working.
When running the server and client in VMs with SRIOV, it times out when trying to mount, and the client oopses when the mount command is interrupted.
When running on two physical hosts, I am able to mount the remote directory, but reading or writing fails with an I/O error.

I am still doing some checks - I will post my findings when I have more information.


2013-02-06 16:20:46

by Steve Wise

Subject: Re: NFS over RDMA crashing

On 2/6/2013 9:48 AM, Yan Burman wrote:
> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was
> no longer getting the server crashes,
> so the reset of my tests were done using that point (it is somewhere
> in the middle of 3.7.0-rc2).
>

+tom tucker

I'd try going back a few kernels, like to 3.5.x and see if things are
more stable. If you find a point that works, then git bisect might help
identify the regression.

2013-02-18 11:44:52

by Yan Burman

Subject: RE: NFS over RDMA crashing


> -----Original Message-----
> From: J. Bruce Fields [mailto:[email protected]]
> Sent: Friday, February 15, 2013 17:28
> To: Yan Burman
> Cc: [email protected]; [email protected]; linux-
> [email protected]; Or Gerlitz
> Subject: Re: NFS over RDMA crashing
>
> On Mon, Feb 11, 2013 at 03:19:42PM +0000, Yan Burman wrote:
> > > -----Original Message-----
> > > From: J. Bruce Fields [mailto:[email protected]]
> > > Sent: Thursday, February 07, 2013 18:42
> > > To: Yan Burman
> > > Cc: [email protected]; [email protected]; linux-
> > > [email protected]; Or Gerlitz
> > > Subject: Re: NFS over RDMA crashing
> > >
> > > On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > > > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > > > When killing mount command that got stuck:
> > > > > -------------------------------------------
> > > > >
> > > > > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > > > > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > > > > Oops: 0003 [#1] PREEMPT SMP
> > > > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> > > iw_cm
> > > > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables
> > > > > x_tables
> > > > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > > > target_core_file target_core_pscsi target_core_mod configfs
> > > > > 8021q bridge stp llc ipv6 dm_mirror dm_region_hash dm_log
> > > > > vhost_net macvtap macvlan tun uinput iTCO_wdt
> > > > > iTCO_vendor_support kvm_intel kvm crc32c_intel microcode pcspkr
> > > > > joydev i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd sg ioatdma
> > > > > ixgbe mdio mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core igb
> > > > > hwmon dca ptp pps_core button dm_mod ext3
> > > jbd
> > > > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > > > X8DTH-i/6/iF/6F/X8DTH
> > > > > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > > > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > > > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > > > > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX:
> > > > > ffff880324dd8428
> > > > > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI:
> > > > > ffffffff81149618
> > > > > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09:
> > > > > 0000000000000001
> > > > > R10: ffff880324dd8000 R11: 0000000000000001 R12:
> > > > > ffff8806299dcb10
> > > > > R13: 0000000000000003 R14: 0000000000000001 R15:
> > > > > 0000000000000010
> > > > > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
> > > > > knlGS:0000000000000000
> > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4:
> > > > > 00000000000007e0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > > > > 0000000000000400 Process nfsd (pid: 4744, threadinfo
> > > > > ffff880324c3c000, task
> > > > > ffff880330550000)
> > > > > Stack:
> > > > > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282
> > > > > ffff880631cec000
> > > > > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48
> > > > > ffff880324dd8000
> > > > > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000
> > > > > 0000000000000003 Call Trace:
> > > > > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > > > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > > > > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > > > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd] [<ffffffffa0571db0>] ?
> > > > > nfsd_svc+0x740/0x740 [nfsd] [<ffffffff81071df6>]
> > > > > kthread+0xd6/0xe0 [<ffffffff81071d20>] ?
> > > > > __init_kthread_worker+0x70/0x70 [<ffffffff814b462c>]
> ret_from_fork+0x7c/0xb0 [<ffffffff81071d20>] ?
> > > > > __init_kthread_worker+0x70/0x70
> > > > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40
> > > > > 0a 00
> > > > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00
> > > > > 00 <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a
> > > > > 00 RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > > > RSP <ffff880324c3dbf8>
> > > > > CR2: ffff880324dc7ff8
> > > > > ---[ end trace 06d0384754e9609a ]---
> > > > >
> > > > >
> > > > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > > > is responsible for the crash (it seems to be crashing in
> > > > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > > > >
> > > > > When I moved to commit
> 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e
> > > > > I was no longer getting the server crashes, so the reset of my
> > > > > tests were done using that point (it is somewhere in the middle
> > > > > of 3.7.0-rc2).
> > > >
> > > > OK, so this part's clearly my fault--I'll work on a patch, but the
> > > > rdma's use of the ->rq_pages array is pretty confusing.
> > >
> > > Does this help?
> > >
> > > They must have added this for some reason, but I'm not seeing how it
> > > could have ever done anything....
> > >
> > > --b.
> > >
> > > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > index 0ce7552..e8f25ec 100644
> > > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > > @@ -520,13 +520,6 @@ next_sge:
> > > for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> > > ch_no++)
> > > rqstp->rq_pages[ch_no] = NULL;
> > >
> > > - /*
> > > - * Detach res pages. If svc_release sees any it will attempt to
> > > - * put them.
> > > - */
> > > - while (rqstp->rq_next_page != rqstp->rq_respages)
> > > - *(--rqstp->rq_next_page) = NULL;
> > > -
> > > return err;
> > > }
> > >
> >
> > I've been trying to reproduce the problem, but for some reason it does not
> happen anymore.
> > The crash is not happening even without the patch now, but NFS over RDMA
> in 3.8.0-rc5 from net-next is not working.
> > When running server and client in VM with SRIOV, it times out when trying
> to mount and oopses on the client when mount command is interrupted.
> > When running two physical hosts, I get to mount the remote directory, but
> reading or writing fails with IO error.
> >
> > I am still doing some checks - I will post my findings when I will have more
> information.
> >
>
> Any luck reproducing the problem or any results running with the above
> patch?
>
> --b.

Right now I am not able to reproduce the error - I am starting to suspect that it was a compilation issue.
I do get a crash in a VM, but in a different place.

RPC: Registered rdma transport module.
rpcrdma: connection to 192.168.20.210:2050 on mlx4_0, memreg 5 slots 32 ird 16
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffff88007ae98998
IP: [<ffff88007ae98998>] 0xffff88007ae98997
PGD 180c063 PUD 1fffc067 PMD 7bd7c063 PTE 800000007ae98163
Oops: 0011 [#1] PREEMPT SMP
Modules linked in: xprtrdma netconsole configfs nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr autofs4 sunrpc 8021q ipv6 dm_mirror dm_region_hash dm_log uinput joydev microcode pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core virtio_balloon cirrus ttm drm_kms_helper sysimgblt sysfillrect syscopyarea i2c_piix4 button dm_mod ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd
CPU 1
Pid: 2885, comm: mount.nfs Tainted: G W 3.7.6 #2 Red Hat KVM
RIP: 0010:[<ffff88007ae98998>] [<ffff88007ae98998>] 0xffff88007ae98997
RSP: 0018:ffff88007fd03e38 EFLAGS: 00010282
RAX: 0000000000000004 RBX: ffff88007ae98998 RCX: 0000000000000002
RDX: 0000000000000002 RSI: ffff8800715b8610 RDI: ffff88007a5d41b0
RBP: ffff88007fd03e60 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a5d41b0
R13: ffff88007a5d41d0 R14: 0000000000000282 R15: ffff88007126ba10
FS: 00007f02ac5da700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88007ae98998 CR3: 0000000079aa8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.nfs (pid: 2885, threadinfo ffff880071452000, task ffff8800715b8000)
Stack:
ffffffffa037afe0 ffffffffa03813e0 ffffffffa03813e8 0000000000000000
0000000000000030 ffff88007fd03e90 ffffffff81052685 0000000000000040
0000000000000001 ffffffff818040b0 0000000000000006 ffff88007fd03f30
Call Trace:
<IRQ>
[<ffffffffa037afe0>] ? rpcrdma_run_tasklet+0x60/0x90 [xprtrdma]
[<ffffffff81052685>] tasklet_action+0xd5/0xe0
[<ffffffff810530b1>] __do_softirq+0xf1/0x3b0
[<ffffffff814a6c0c>] call_softirq+0x1c/0x30
[<ffffffff81004345>] do_softirq+0x85/0xc0
[<ffffffff81052cee>] irq_exit+0x9e/0xc0
[<ffffffff81003ad1>] do_IRQ+0x61/0xd0
[<ffffffff8149e26f>] common_interrupt+0x6f/0x6f
<EOI>
[<ffffffff8123a090>] ? delay_loop+0x20/0x30
[<ffffffff8123a20c>] __const_udelay+0x2c/0x30
[<ffffffff8106f2f4>] __rcu_read_unlock+0x54/0xa0
[<ffffffff8116f23d>] __d_lookup+0x16d/0x320
[<ffffffff8116f0d0>] ? d_delete+0x190/0x190
[<ffffffff8149b1cb>] ? mutex_lock_nested+0x2db/0x3a0
[<ffffffff8116f420>] d_lookup+0x30/0x50
[<ffffffffa024d5cc>] ? rpc_depopulate.clone.3+0x3c/0x70 [sunrpc]
[<ffffffffa024d350>] __rpc_depopulate.clone.1+0x50/0xd0 [sunrpc]
[<ffffffffa024d600>] ? rpc_depopulate.clone.3+0x70/0x70 [sunrpc]
[<ffffffffa024d5da>] rpc_depopulate.clone.3+0x4a/0x70 [sunrpc]
[<ffffffffa024d600>] ? rpc_depopulate.clone.3+0x70/0x70 [sunrpc]
[<ffffffffa024d615>] rpc_clntdir_depopulate+0x15/0x20 [sunrpc]
[<ffffffffa024c41d>] rpc_rmdir_depopulate+0x4d/0x90 [sunrpc]
[<ffffffffa024c490>] rpc_remove_client_dir+0x10/0x20 [sunrpc]
[<ffffffffa022fb02>] __rpc_clnt_remove_pipedir+0x42/0x60 [sunrpc]
[<ffffffffa022fb51>] rpc_clnt_remove_pipedir+0x31/0x50 [sunrpc]
[<ffffffffa022fc8d>] rpc_free_client+0x11d/0x3f0 [sunrpc]
[<ffffffffa022fb9e>] ? rpc_free_client+0x2e/0x3f0 [sunrpc]
[<ffffffffa022ffc8>] rpc_release_client+0x68/0xa0 [sunrpc]
[<ffffffffa0230512>] rpc_shutdown_client+0x52/0x240 [sunrpc]
[<ffffffffa0230f60>] ? rpc_new_client+0x3a0/0x550 [sunrpc]
[<ffffffffa0230468>] ? rpc_ping+0x58/0x70 [sunrpc]
[<ffffffffa0231646>] rpc_create+0x186/0x1f0 [sunrpc]
[<ffffffff810aaf79>] ? __module_address+0x119/0x160
[<ffffffffa02eb314>] nfs_create_rpc_client+0xc4/0x100 [nfs]
[<ffffffffa035a5c7>] nfs4_init_client+0x77/0x310 [nfsv4]
[<ffffffffa02ec060>] ? nfs_get_client+0x110/0x640 [nfs]
[<ffffffffa02ec424>] nfs_get_client+0x4d4/0x640 [nfs]
[<ffffffffa02ec060>] ? nfs_get_client+0x110/0x640 [nfs]
[<ffffffff810a0475>] ? lockdep_init_map+0x65/0x540
[<ffffffff810a0475>] ? lockdep_init_map+0x65/0x540
[<ffffffffa0358df5>] nfs4_set_client+0x75/0xf0 [nfsv4]
[<ffffffffa023add8>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
[<ffffffffa02ea916>] ? nfs_alloc_server+0xf6/0x130 [nfs]
[<ffffffffa03592ab>] nfs4_create_server+0xdb/0x360 [nfsv4]
[<ffffffffa0350623>] nfs4_remote_mount+0x33/0x60 [nfsv4]
[<ffffffff8115a11e>] mount_fs+0x3e/0x1a0
[<ffffffff8111f08b>] ? __alloc_percpu+0xb/0x10
[<ffffffff8117b12d>] vfs_kern_mount+0x6d/0x100
[<ffffffffa0350270>] nfs_do_root_mount+0x90/0xe0 [nfsv4]
[<ffffffffa035056f>] nfs4_try_mount+0x3f/0xc0 [nfsv4]
[<ffffffffa02ecefc>] ? get_nfs_version+0x2c/0x80 [nfs]
[<ffffffffa02f5d2c>] nfs_fs_mount+0x19c/0xc10 [nfs]
[<ffffffffa02f6e60>] ? nfs_clone_super+0x140/0x140 [nfs]
[<ffffffffa02f6c20>] ? nfs_clone_sb_security+0x60/0x60 [nfs]
[<ffffffff8115a11e>] mount_fs+0x3e/0x1a0
[<ffffffff8111f08b>] ? __alloc_percpu+0xb/0x10
[<ffffffff8117b12d>] vfs_kern_mount+0x6d/0x100
[<ffffffff8117b23d>] do_kern_mount+0x4d/0x110
[<ffffffff8105623f>] ? ns_capable+0x3f/0x80
[<ffffffff8117b54c>] do_mount+0x24c/0x800
[<ffffffff81179c7d>] ? copy_mount_options+0xfd/0x1b0
[<ffffffff8117bb8b>] sys_mount+0x8b/0xe0
[<ffffffff814a5a52>] system_call_fastpath+0x16/0x1b
Code: ff ff ff 00 00 00 00 00 00 00 00 00 02 38 a0 ff ff ff ff 01 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 20 00 00 00 00 00 00 00 <a0> 41 5d 7a 00 88 ff ff c8 41 5d 7a 00 88 ff ff 00 00 00 00 00
RIP [<ffff88007ae98998>] 0xffff88007ae98997
RSP <ffff88007fd03e38>
CR2: ffff88007ae98998
---[ end trace 5ff8c4860160ebd8 ]---
Kernel panic - not syncing: Fatal exception in interrupt
panic occurred, switching back to text console

Sorry for the delayed answers - I had to switch to something with a higher priority right now.
I plan to get back to this issue in a week or two.

Yan


2013-02-07 15:54:35

by Yan Burman

Subject: RE: NFS over RDMA crashing

> -----Original Message-----
> From: Jeff Becker [mailto:[email protected]]
> Sent: Wednesday, February 06, 2013 19:07
> To: Steve Wise
> Cc: Yan Burman; [email protected]; [email protected]; linux-
> [email protected]; Or Gerlitz; Tom Tucker
> Subject: Re: NFS over RDMA crashing
>
> Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I
> tested that NFS/RDMA in OFED 3.5 works on kernel 3.5, and also the RHEL
> 6.3 kernel. However, I did not test it with SRIOV. If you test it
> (OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks.
>
> Jeff Becker
>
> On 02/06/2013 07:58 AM, Steve Wise wrote:
> > On 2/6/2013 9:48 AM, Yan Burman wrote:
> >> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> was
> >> no longer getting the server crashes, so the reset of my tests were
> >> done using that point (it is somewhere in the middle of 3.7.0-rc2)
> >>
> > +tom tucker
> >
> > I'd try going back a few kernels, like to 3.5.x and see if things are
> > more stable. If you find a point that works, then git bisect might
> > help identify the regression.
> > --

Vanilla 3.5.7 seems to work OK out of the box.
I will try 3.6 next week, plus performance comparisons.

Yan

2013-02-06 22:24:36

by J. Bruce Fields

Subject: Re: NFS over RDMA crashing

On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> When killing mount command that got stuck:
> -------------------------------------------
>
> BUG: unable to handle kernel paging request at ffff880324dc7ff8
> IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> Oops: 0003 [#1] PREEMPT SMP
> Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
> ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> target_core_file target_core_pscsi target_core_mod configfs 8021q
> bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd
> sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
> CPU 6
> Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> X8DTH-i/6/iF/6F/X8DTH
> RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task ffff880330550000)
> Stack:
> ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> Call Trace:
> [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> [<ffffffff81071df6>] kthread+0xd6/0xe0
> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
> RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> RSP <ffff880324c3dbf8>
> CR2: ffff880324dc7ff8
> ---[ end trace 06d0384754e9609a ]---
>
>
> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> is responsible for the crash (it seems to be crashing in
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>
> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> was no longer getting the server crashes,
> so the reset of my tests were done using that point (it is somewhere
> in the middle of 3.7.0-rc2).

OK, so this part's clearly my fault--I'll work on a patch, but the
rdma's use of the ->rq_pages array is pretty confusing.

--b.

2013-02-07 16:41:35

by J. Bruce Fields

Subject: Re: NFS over RDMA crashing

On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > When killing mount command that got stuck:
> > -------------------------------------------
> >
> > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > Oops: 0003 [#1] PREEMPT SMP
> > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
> > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd
> > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
> > CPU 6
> > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > X8DTH-i/6/iF/6F/X8DTH
> > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task ffff880330550000)
> > Stack:
> > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> > Call Trace:
> > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
> > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > [<ffffffff81071df6>] kthread+0xd6/0xe0
> > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
> > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
> > RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > RSP <ffff880324c3dbf8>
> > CR2: ffff880324dc7ff8
> > ---[ end trace 06d0384754e9609a ]---
> >
> >
> > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > is responsible for the crash (it seems to be crashing in
> > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> >
> > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > was no longer getting the server crashes,
> > so the reset of my tests were done using that point (it is somewhere
> > in the middle of 3.7.0-rc2).
>
> OK, so this part's clearly my fault--I'll work on a patch, but the
> rdma's use of the ->rq_pages array is pretty confusing.

Does this help?

They must have added this for some reason, but I'm not seeing how it
could have ever done anything....

--b.

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;
 
-	/*
-	 * Detach res pages. If svc_release sees any it will attempt to
-	 * put them.
-	 */
-	while (rqstp->rq_next_page != rqstp->rq_respages)
-		*(--rqstp->rq_next_page) = NULL;
-
 	return err;
 }


2013-02-15 15:27:48

by J. Bruce Fields

Subject: Re: NFS over RDMA crashing

On Mon, Feb 11, 2013 at 03:19:42PM +0000, Yan Burman wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:[email protected]]
> > Sent: Thursday, February 07, 2013 18:42
> > To: Yan Burman
> > Cc: [email protected]; [email protected]; linux-
> > [email protected]; Or Gerlitz
> > Subject: Re: NFS over RDMA crashing
> >
> > On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > > When killing mount command that got stuck:
> > > > -------------------------------------------
> > > >
> > > > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > > > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] PGD
> > > > 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > > > Oops: 0003 [#1] PREEMPT SMP
> > > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> > iw_cm
> > > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> > jbd
> > > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod CPU 6
> > > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > > X8DTH-i/6/iF/6F/X8DTH
> > > > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > > > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> > > > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > > > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> > > > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > > > R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> > > > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
> > > > knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task
> > > > ffff880330550000)
> > > > Stack:
> > > > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> > > > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> > > > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> > > > Call Trace:
> > > > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > > > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd] [<ffffffffa0571db0>] ?
> > > > nfsd_svc+0x740/0x740 [nfsd] [<ffffffff81071df6>] kthread+0xd6/0xe0
> > > > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > > > [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0 [<ffffffff81071d20>] ?
> > > > __init_kthread_worker+0x70/0x70
> > > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00 RIP
> > > > [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma] RSP
> > > > <ffff880324c3dbf8>
> > > > CR2: ffff880324dc7ff8
> > > > ---[ end trace 06d0384754e9609a ]---
> > > >
> > > >
> > > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > > is responsible for the crash (it seems to be crashing in
> > > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > > >
> > > > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > > was no longer getting the server crashes, so the rest of my tests
> > > > were done using that point (it is somewhere in the middle of
> > > > 3.7.0-rc2).
> > >
> > > OK, so this part's clearly my fault--I'll work on a patch, but the
> > > rdma's use of the ->rq_pages array is pretty confusing.
> >
> > Does this help?
> >
> > They must have added this for some reason, but I'm not seeing how it could
> > have ever done anything....
> >
> > --b.
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..e8f25ec 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -520,13 +520,6 @@ next_sge:
> > for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> > ch_no++)
> > rqstp->rq_pages[ch_no] = NULL;
> >
> > - /*
> > - * Detach res pages. If svc_release sees any it will attempt to
> > - * put them.
> > - */
> > - while (rqstp->rq_next_page != rqstp->rq_respages)
> > - *(--rqstp->rq_next_page) = NULL;
> > -
> > return err;
> > }
> >
>
> I've been trying to reproduce the problem, but for some reason it does not happen anymore.
> The crash no longer happens even without the patch, but NFS over RDMA in 3.8.0-rc5 from net-next is still not working.
> When running the server and client in VMs with SRIOV, the mount times out and the client oopses when the mount command is interrupted.
> When running on two physical hosts, I can mount the remote directory, but reading or writing fails with an I/O error.
>
> I am still doing some checks - I will post my findings when I have more information.
>

Any luck reproducing the problem or any results running with the above
patch?

--b.

Subject: Re: NFS over RDMA crashing

Hi. In case you're interested, I did the NFS/RDMA backports for OFED. I
tested that NFS/RDMA in OFED 3.5 works on kernel 3.5 and also on the RHEL
6.3 kernel. However, I did not test it with SRIOV. If you do test it
(OFED-3.5-rc6 was released last week), I'd like to know how it goes. Thanks.

Jeff Becker

On 02/06/2013 07:58 AM, Steve Wise wrote:
> On 2/6/2013 9:48 AM, Yan Burman wrote:
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I was
>> no longer getting the server crashes,
>> so the rest of my tests were done using that point (it is somewhere
>> in the middle of 3.7.0-rc2)
>>
> +tom tucker
>
> I'd try going back a few kernels, like to 3.5.x and see if things are
> more stable. If you find a point that works, then git bisect might help
> identify the regression.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-02-06 22:28:09

by Steve Wise

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On 2/6/2013 4:24 PM, J. Bruce Fields wrote:
> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
>> When killing mount command that got stuck:
>> -------------------------------------------
>>
>> BUG: unable to handle kernel paging request at ffff880324dc7ff8
>> IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
>> Oops: 0003 [#1] PREEMPT SMP
>> Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
>> ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
>> iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
>> nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
>> target_core_file target_core_pscsi target_core_mod configfs 8021q
>> bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
>> macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
>> kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
>> ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
>> mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd
>> sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
>> CPU 6
>> Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
>> X8DTH-i/6/iF/6F/X8DTH
>> RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
>> rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
>> RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
>> RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
>> RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
>> R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
>> R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
>> FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task ffff880330550000)
>> Stack:
>> ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
>> ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
>> ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
>> Call Trace:
>> [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
>> [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
>> [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
>> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
>> [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
>> [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
>> [<ffffffff81071df6>] kthread+0xd6/0xe0
>> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
>> [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
>> Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
>> 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
>> <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
>> RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
>> RSP <ffff880324c3dbf8>
>> CR2: ffff880324dc7ff8
>> ---[ end trace 06d0384754e9609a ]---
>>
>>
>> It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
>> "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
>> is responsible for the crash (it seems to be crashing in
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
>> It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
>> CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
>>
>> When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
>> was no longer getting the server crashes,
>> so the rest of my tests were done using that point (it is somewhere
>> in the middle of 3.7.0-rc2).
> OK, so this part's clearly my fault--I'll work on a patch, but the
> rdma's use of the ->rq_pages array is pretty confusing.

Maybe Tom can shed some light?


2014-03-07 17:05:22

by Steve Wise

[permalink] [raw]
Subject: RE: NFS over RDMA crashing

Resurrecting an old issue :)

More inline below...

> -----Original Message-----
> From: [email protected] [mailto:linux-nfs-
> [email protected]] On Behalf Of J. Bruce Fields
> Sent: Thursday, February 07, 2013 10:42 AM
> To: Yan Burman
> Cc: [email protected]; [email protected]; linux-
> [email protected]; Or Gerlitz
> Subject: Re: NFS over RDMA crashing
>
> On Wed, Feb 06, 2013 at 05:24:35PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > > When killing mount command that got stuck:
> > > -------------------------------------------
> > >
> > > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > > IP: [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > > Oops: 0003 [#1] PREEMPT SMP
> > > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm
> iw_cm
> > > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables
x_tables
> > > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad
> ib_core
> > > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3
> jbd
> > > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
> > > CPU 6
> > > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > > X8DTH-i/6/iF/6F/X8DTH
> > > RIP: 0010:[<ffffffffa05f3dfb>] [<ffffffffa05f3dfb>]
> > > rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX:
> ffff880324dd8428
> > > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09:
> 0000000000000001
> > > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > > R13: 0000000000000003 R14: 0000000000000001 R15:
> 0000000000000010
> > > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000)
> knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4:
> 00000000000007e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> > > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task
> ffff880330550000)
> > > Stack:
> > > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282
> ffff880631cec000
> > > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48
> ffff880324dd8000
> > > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000
> 0000000000000003
> > > Call Trace:
> > > [<ffffffffa05f466e>] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > > [<ffffffff81086540>] ? try_to_wake_up+0x2f0/0x2f0
> > > [<ffffffffa045963f>] svc_recv+0x3ef/0x4b0 [sunrpc]
> > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > [<ffffffffa0571e5d>] nfsd+0xad/0x130 [nfsd]
> > > [<ffffffffa0571db0>] ? nfsd_svc+0x740/0x740 [nfsd]
> > > [<ffffffff81071df6>] kthread+0xd6/0xe0
> > > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > > [<ffffffff814b462c>] ret_from_fork+0x7c/0xb0
> > > [<ffffffff81071d20>] ? __init_kthread_worker+0x70/0x70
> > > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a
00
> > > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
> > > RIP [<ffffffffa05f3dfb>] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > > RSP <ffff880324c3dbf8>
> > > CR2: ffff880324dc7ff8
> > > ---[ end trace 06d0384754e9609a ]---
> > >
> > >
> > > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > > is responsible for the crash (it seems to be crashing in
> > > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527)
> > > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> > >
> > > When I moved to commit
> 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > > was no longer getting the server crashes,
> > > so the rest of my tests were done using that point (it is somewhere
> > > in the middle of 3.7.0-rc2).
> >
> > OK, so this part's clearly my fault--I'll work on a patch, but the
> > rdma's use of the ->rq_pages array is pretty confusing.
>
> Does this help?
>
> They must have added this for some reason, but I'm not seeing how it
> could have ever done anything....
>
> --b.
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 0ce7552..e8f25ec 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -520,13 +520,6 @@ next_sge:
> for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> ch_no++)
> rqstp->rq_pages[ch_no] = NULL;
>
> - /*
> - * Detach res pages. If svc_release sees any it will attempt to
> - * put them.
> - */
> - while (rqstp->rq_next_page != rqstp->rq_respages)
> - *(--rqstp->rq_next_page) = NULL;
> -
> return err;
> }
>

I can reproduce this server crash readily on a recent net-next tree. I
added the above change, and see a different crash:

[ 192.764773] BUG: unable to handle kernel paging request at
0000100000000000
[ 192.765688] IP: [<ffffffff8113c159>] put_page+0x9/0x50
[ 192.765688] PGD 0
[ 192.765688] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 192.765688] Modules linked in: nfsd lockd nfs_acl exportfs
auth_rpcgss oid_registry svcrdma tg3 ip6table_filter ip6_tables
ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
ip_tables bridge stp llc autofs4 sunrpc rdma_ucm rdma_cm iw_cm ib_ipoib
ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 iw_cxgb3 cxgb3 mdio
ib_qib dca mlx4_en ib_mthca vhost_net macvtap macvlan vhost tun
kvm_intel kvm uinput ipmi_si ipmi_msghandler iTCO_wdt
iTCO_vendor_support dcdbas sg microcode pcspkr mlx4_ib ib_sa serio_raw
ib_mad ib_core ib_addr ipv6 ptp pps_core lpc_ich mfd_core i5100_edac
edac_core mlx4_core cxgb4 ext4 jbd2 mbcache sd_mod crc_t10dif
crct10dif_common sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm
drm_kms_helper drm i2c_algo_bit
[ 192.765688] i2c_core dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: tg3]
[ 192.765688] CPU: 1 PID: 6590 Comm: nfsd Not tainted
3.14.0-rc3-pending+ #5
[ 192.765688] Hardware name: Dell Inc. PowerEdge R300/0TY179, BIOS
1.3.0 08/15/2008
[ 192.765688] task: ffff8800b75c62c0 ti: ffff8801faa4a000 task.ti:
ffff8801faa4a000
[ 192.765688] RIP: 0010:[<ffffffff8113c159>] [<ffffffff8113c159>]
put_page+0x9/0x50
[ 192.765688] RSP: 0018:ffff8801faa4be28 EFLAGS: 00010206
[ 192.765688] RAX: ffff8801fa9542a8 RBX: ffff8801fa954000 RCX:
0000000000000001
[ 192.765688] RDX: ffff8801fa953e10 RSI: 0000000000000200 RDI:
0000100000000000
[ 192.765688] RBP: ffff8801faa4be28 R08: 000000009b8d39b9 R09:
0000000000000017
[ 192.765688] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8800cb2e7c00
[ 192.765688] R13: ffff8801fa954210 R14: 0000000000000000 R15:
0000000000000000
[ 192.765688] FS: 0000000000000000(0000) GS:ffff88022ec80000(0000)
knlGS:0000000000000000
[ 192.765688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 192.765688] CR2: 0000100000000000 CR3: 00000000b9a5a000 CR4:
00000000000007e0
[ 192.765688] Stack:
[ 192.765688] ffff8801faa4be58 ffffffffa0881f4e ffff880204dd0e00
ffff8801fa954000
[ 192.765688] ffff880204dd0e00 ffff8800cb2e7c00 ffff8801faa4be88
ffffffffa08825f5
[ 192.765688] ffff8801fa954000 ffff8800b75c62c0 ffffffff81ae5ac0
ffffffffa08cf930
[ 192.765688] Call Trace:
[ 192.765688] [<ffffffffa0881f4e>] svc_xprt_release+0x6e/0xf0 [sunrpc]
[ 192.765688] [<ffffffffa08825f5>] svc_recv+0x165/0x190 [sunrpc]
[ 192.765688] [<ffffffffa08cf930>] ? nfsd_pool_stats_release+0x60/0x60
[nfsd]
[ 192.765688] [<ffffffffa08cf9e5>] nfsd+0xb5/0x160 [nfsd]
[ 192.765688] [<ffffffffa08cf930>] ? nfsd_pool_stats_release+0x60/0x60
[nfsd]
[ 192.765688] [<ffffffff8107471e>] kthread+0xce/0xf0
[ 192.765688] [<ffffffff81074650>] ?
kthread_freezable_should_stop+0x70/0x70
[ 192.765688] [<ffffffff81584e2c>] ret_from_fork+0x7c/0xb0
[ 192.765688] [<ffffffff81074650>] ?
kthread_freezable_should_stop+0x70/0x70
[ 192.765688] Code: 8d 7b 10 e8 ea fa ff ff 48 c7 03 00 00 00 00 48 83
c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66
66 90 <66> f7 07 00 c0 75 32 8b 47 1c 48 8d 57 1c 85 c0 74 1c f0 ff 0a
[ 192.765688] RIP [<ffffffff8113c159>] put_page+0x9/0x50
[ 192.765688] RSP <ffff8801faa4be28>
[ 192.765688] CR2: 0000100000000000
crash>


2014-03-12 13:33:08

by Jeff Layton

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On Sat, 08 Mar 2014 14:13:44 -0600
Steve Wise <[email protected]> wrote:

> On 3/8/2014 1:20 PM, Steve Wise wrote:
> >
> >> I removed your change and started debugging original crash that
> >> happens on top-o-tree. Seems like rq_next_pages is screwed up. It
> >> should always be >= rq_respages, yes? I added a BUG_ON() to assert
> >> this in rdma_read_xdr() we hit the BUG_ON(). Look
> >>
> >> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
> >> rq_next_page = 0xffff8800b84e6228
> >> crash> svc_rqst.rq_respages 0xffff8800b84e6000
> >> rq_respages = 0xffff8800b84e62a8
> >>
> >> Any ideas Bruce/Tom?
> >>
> >
> > Guys, the patch below seems to fix the problem. Dunno if it is
> > correct though. What do you think?
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..6d62411 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
> > sge_no++;
> > }
> > rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> > + rqstp->rq_next_page = rqstp->rq_respages;
> >
> > /* We should never run out of SGE because the limit is defined to
> > * support the max allowed RPC data length
> > @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
> > svcxprt_rdma *xprt,
> >
> > /* rq_respages points one past arg pages */
> > rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> > + rqstp->rq_next_page = rqstp->rq_respages;
> >
> > /* Create the reply and chunk maps */
> > offset = 0;
> >
> >
>
> While this patch avoids the crashing, it apparently isn't correct...I'm
> getting IO errors reading files over the mount. :)
>

I hit the same oops and tested your patch and it seems to have fixed
that particular panic, but I still see a bunch of other mem corruption
oopses even with it. I'll look more closely at that when I get some
time.

FWIW, I can easily reproduce that by simply doing something like:

$ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1

I'm not sure why you're not seeing any panics with your patch in place.
Perhaps it's due to hw differences between our test rigs.

The EIO problem that you're seeing is likely the same client bug that
Chuck recently fixed in this patch:

[PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA

AIUI, Trond is merging that set for 3.15, so I'd make sure your client
has those patches when testing.

Finally, I also have a forthcoming patch to fix non-page aligned NFS
READs as well. I'm hesitant to send that out though until I can at
least run the connectathon testsuite against this server. The WRITE
oopses sort of prevent that for now...

--
Jeff Layton <[email protected]>

2014-03-12 14:05:34

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS over RDMA crashing


On Mar 12, 2014, at 9:33, Jeff Layton <[email protected]> wrote:

> On Sat, 08 Mar 2014 14:13:44 -0600
> Steve Wise <[email protected]> wrote:
>
>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>
>>>> I removed your change and started debugging original crash that
>>>> happens on top-o-tree. Seems like rq_next_pages is screwed up. It
>>>> should always be >= rq_respages, yes? I added a BUG_ON() to assert
>>>> this in rdma_read_xdr() we hit the BUG_ON(). Look
>>>>
>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>> rq_next_page = 0xffff8800b84e6228
>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>> rq_respages = 0xffff8800b84e62a8
>>>>
>>>> Any ideas Bruce/Tom?
>>>>
>>>
>>> Guys, the patch below seems to fix the problem. Dunno if it is
>>> correct though. What do you think?
>>>
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> index 0ce7552..6d62411 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>> sge_no++;
>>> }
>>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>
>>> /* We should never run out of SGE because the limit is defined to
>>> * support the max allowed RPC data length
>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
>>> svcxprt_rdma *xprt,
>>>
>>> /* rq_respages points one past arg pages */
>>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>
>>> /* Create the reply and chunk maps */
>>> offset = 0;
>>>
>>>
>>
>> While this patch avoids the crashing, it apparently isn't correct...I'm
>> getting IO errors reading files over the mount. :)
>>
>
> I hit the same oops and tested your patch and it seems to have fixed
> that particular panic, but I still see a bunch of other mem corruption
> oopses even with it. I'll look more closely at that when I get some
> time.
>
> FWIW, I can easily reproduce that by simply doing something like:
>
> $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>
> I'm not sure why you're not seeing any panics with your patch in place.
> Perhaps it's due to hw differences between our test rigs.
>
> The EIO problem that you're seeing is likely the same client bug that
> Chuck recently fixed in this patch:
>
> [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>
> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
> has those patches when testing.
>

Nothing is in my queue yet.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-07 20:41:07

by Steve Wise

[permalink] [raw]
Subject: RE: NFS over RDMA crashing

> >
> > Does this help?
> >
> > They must have added this for some reason, but I'm not seeing how it
> > could have ever done anything....
> >
> > --b.
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index 0ce7552..e8f25ec 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -520,13 +520,6 @@ next_sge:
> > for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
> > ch_no++)
> > rqstp->rq_pages[ch_no] = NULL;
> >
> > - /*
> > - * Detach res pages. If svc_release sees any it will attempt to
> > - * put them.
> > - */
> > - while (rqstp->rq_next_page != rqstp->rq_respages)
> > - *(--rqstp->rq_next_page) = NULL;
> > -
> > return err;
> > }
> >
>
> I can reproduce this server crash readily on a recent net-next tree. I
> added the above change, and see a different crash:
>
> [ 192.764773] BUG: unable to handle kernel paging request at
> 0000100000000000
> [ 192.765688] IP: [<ffffffff8113c159>] put_page+0x9/0x50
> [ 192.765688] PGD 0
> [ 192.765688] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [ 192.765688] Modules linked in: nfsd lockd nfs_acl exportfs
> auth_rpcgss oid_registry svcrdma tg3 ip6table_filter ip6_tables
> ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
> nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
> ip_tables bridge stp llc autofs4 sunrpc rdma_ucm rdma_cm iw_cm
ib_ipoib
> ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 iw_cxgb3 cxgb3 mdio
> ib_qib dca mlx4_en ib_mthca vhost_net macvtap macvlan vhost tun
> kvm_intel kvm uinput ipmi_si ipmi_msghandler iTCO_wdt
> iTCO_vendor_support dcdbas sg microcode pcspkr mlx4_ib ib_sa serio_raw
> ib_mad ib_core ib_addr ipv6 ptp pps_core lpc_ich mfd_core i5100_edac
> edac_core mlx4_core cxgb4 ext4 jbd2 mbcache sd_mod crc_t10dif
> crct10dif_common sr_mod cdrom pata_acpi ata_generic ata_piix radeon
> ttm
> drm_kms_helper drm i2c_algo_bit
> [ 192.765688] i2c_core dm_mirror dm_region_hash dm_log dm_mod
> [last
> unloaded: tg3]
> [ 192.765688] CPU: 1 PID: 6590 Comm: nfsd Not tainted
> 3.14.0-rc3-pending+ #5
> [ 192.765688] Hardware name: Dell Inc. PowerEdge R300/0TY179, BIOS
> 1.3.0 08/15/2008
> [ 192.765688] task: ffff8800b75c62c0 ti: ffff8801faa4a000 task.ti:
> ffff8801faa4a000
> [ 192.765688] RIP: 0010:[<ffffffff8113c159>] [<ffffffff8113c159>]
> put_page+0x9/0x50
> [ 192.765688] RSP: 0018:ffff8801faa4be28 EFLAGS: 00010206
> [ 192.765688] RAX: ffff8801fa9542a8 RBX: ffff8801fa954000 RCX:
> 0000000000000001
> [ 192.765688] RDX: ffff8801fa953e10 RSI: 0000000000000200 RDI:
> 0000100000000000
> [ 192.765688] RBP: ffff8801faa4be28 R08: 000000009b8d39b9 R09:
> 0000000000000017
> [ 192.765688] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffff8800cb2e7c00
> [ 192.765688] R13: ffff8801fa954210 R14: 0000000000000000 R15:
> 0000000000000000
> [ 192.765688] FS: 0000000000000000(0000) GS:ffff88022ec80000(0000)
> knlGS:0000000000000000
> [ 192.765688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 192.765688] CR2: 0000100000000000 CR3: 00000000b9a5a000 CR4:
> 00000000000007e0
> [ 192.765688] Stack:
> [ 192.765688] ffff8801faa4be58 ffffffffa0881f4e ffff880204dd0e00
> ffff8801fa954000
> [ 192.765688] ffff880204dd0e00 ffff8800cb2e7c00 ffff8801faa4be88
> ffffffffa08825f5
> [ 192.765688] ffff8801fa954000 ffff8800b75c62c0 ffffffff81ae5ac0
> ffffffffa08cf930
> [ 192.765688] Call Trace:
> [ 192.765688] [<ffffffffa0881f4e>] svc_xprt_release+0x6e/0xf0
[sunrpc]
> [ 192.765688] [<ffffffffa08825f5>] svc_recv+0x165/0x190 [sunrpc]
> [ 192.765688] [<ffffffffa08cf930>] ?
nfsd_pool_stats_release+0x60/0x60
> [nfsd]
> [ 192.765688] [<ffffffffa08cf9e5>] nfsd+0xb5/0x160 [nfsd]
> [ 192.765688] [<ffffffffa08cf930>] ?
nfsd_pool_stats_release+0x60/0x60
> [nfsd]
> [ 192.765688] [<ffffffff8107471e>] kthread+0xce/0xf0
> [ 192.765688] [<ffffffff81074650>] ?
> kthread_freezable_should_stop+0x70/0x70
> [ 192.765688] [<ffffffff81584e2c>] ret_from_fork+0x7c/0xb0
> [ 192.765688] [<ffffffff81074650>] ?
> kthread_freezable_should_stop+0x70/0x70
> [ 192.765688] Code: 8d 7b 10 e8 ea fa ff ff 48 c7 03 00 00 00 00 48
83
> c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66
66
> 66 90 <66> f7 07 00 c0 75 32 8b 47 1c 48 8d 57 1c 85 c0 74 1c f0 ff 0a
> [ 192.765688] RIP [<ffffffff8113c159>] put_page+0x9/0x50
> [ 192.765688] RSP <ffff8801faa4be28>
> [ 192.765688] CR2: 0000100000000000
> crash>

This new crash is here, calling put_page() on garbage, I guess:

static inline void svc_free_res_pages(struct svc_rqst *rqstp)
{
while (rqstp->rq_next_page != rqstp->rq_respages) {
struct page **pp = --rqstp->rq_next_page;
if (*pp) {
put_page(*pp);
*pp = NULL;
}
}
}
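
To spell out the failure mode: if a transport hands back an rqstp whose
rq_next_page already sits below rq_respages, that "!=" condition never
becomes false and the pre-decrement keeps walking downward until
put_page() is fed an unmapped address. A tiny stand-alone illustration
using plain indices (a sketch of the loop shape only, not kernel code;
the bad starting state is an assumption, though it matches the crash>
output shown later in the thread):

#include <stdio.h>

int main(void)
{
	long respages  = 6;	/* index of the first reply page slot */
	long next_page = 4;	/* bogus: already below respages */
	int steps = 0;

	/* same shape as the while loop in svc_free_res_pages() above */
	while (next_page != respages && steps < 10) {
		--next_page;	/* moves further away; would never terminate */
		steps++;
	}
	printf("gave up after %d steps, now %ld slots below respages\n",
	       steps, respages - next_page);
	return 0;
}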



2014-03-08 17:11:13

by Steve Wise

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On 3/7/2014 2:41 PM, Steve Wise wrote:
>>> Does this help?
>>>
>>> They must have added this for some reason, but I'm not seeing how it
>>> could have ever done anything....
>>>
>>> --b.
>>>
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> index 0ce7552..e8f25ec 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> @@ -520,13 +520,6 @@ next_sge:
>>> for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages;
>>> ch_no++)
>>> rqstp->rq_pages[ch_no] = NULL;
>>>
>>> - /*
>>> - * Detach res pages. If svc_release sees any it will attempt to
>>> - * put them.
>>> - */
>>> - while (rqstp->rq_next_page != rqstp->rq_respages)
>>> - *(--rqstp->rq_next_page) = NULL;
>>> -
>>> return err;
>>> }
>>>
>> I can reproduce this server crash readily on a recent net-next tree. I
>> added the above change, and see a different crash:
>>
>> [ 192.764773] BUG: unable to handle kernel paging request at
>> 0000100000000000
>> [ 192.765688] IP: [<ffffffff8113c159>] put_page+0x9/0x50
>> [ 192.765688] PGD 0
>> [ 192.765688] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [ 192.765688] Modules linked in: nfsd lockd nfs_acl exportfs
>> auth_rpcgss oid_registry svcrdma tg3 ip6table_filter ip6_tables
>> ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
>> nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter
>> ip_tables bridge stp llc autofs4 sunrpc rdma_ucm rdma_cm iw_cm
> ib_ipoib
>> ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb4 iw_cxgb3 cxgb3 mdio
>> ib_qib dca mlx4_en ib_mthca vhost_net macvtap macvlan vhost tun
>> kvm_intel kvm uinput ipmi_si ipmi_msghandler iTCO_wdt
>> iTCO_vendor_support dcdbas sg microcode pcspkr mlx4_ib ib_sa serio_raw
>> ib_mad ib_core ib_addr ipv6 ptp pps_core lpc_ich mfd_core i5100_edac
>> edac_core mlx4_core cxgb4 ext4 jbd2 mbcache sd_mod crc_t10dif
>> crct10dif_common sr_mod cdrom pata_acpi ata_generic ata_piix radeon
>> ttm
>> drm_kms_helper drm i2c_algo_bit
>> [ 192.765688] i2c_core dm_mirror dm_region_hash dm_log dm_mod
>> [last
>> unloaded: tg3]
>> [ 192.765688] CPU: 1 PID: 6590 Comm: nfsd Not tainted
>> 3.14.0-rc3-pending+ #5
>> [ 192.765688] Hardware name: Dell Inc. PowerEdge R300/0TY179, BIOS
>> 1.3.0 08/15/2008
>> [ 192.765688] task: ffff8800b75c62c0 ti: ffff8801faa4a000 task.ti:
>> ffff8801faa4a000
>> [ 192.765688] RIP: 0010:[<ffffffff8113c159>] [<ffffffff8113c159>]
>> put_page+0x9/0x50
>> [ 192.765688] RSP: 0018:ffff8801faa4be28 EFLAGS: 00010206
>> [ 192.765688] RAX: ffff8801fa9542a8 RBX: ffff8801fa954000 RCX:
>> 0000000000000001
>> [ 192.765688] RDX: ffff8801fa953e10 RSI: 0000000000000200 RDI:
>> 0000100000000000
>> [ 192.765688] RBP: ffff8801faa4be28 R08: 000000009b8d39b9 R09:
>> 0000000000000017
>> [ 192.765688] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffff8800cb2e7c00
>> [ 192.765688] R13: ffff8801fa954210 R14: 0000000000000000 R15:
>> 0000000000000000
>> [ 192.765688] FS: 0000000000000000(0000) GS:ffff88022ec80000(0000)
>> knlGS:0000000000000000
>> [ 192.765688] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 192.765688] CR2: 0000100000000000 CR3: 00000000b9a5a000 CR4:
>> 00000000000007e0
>> [ 192.765688] Stack:
>> [ 192.765688] ffff8801faa4be58 ffffffffa0881f4e ffff880204dd0e00
>> ffff8801fa954000
>> [ 192.765688] ffff880204dd0e00 ffff8800cb2e7c00 ffff8801faa4be88
>> ffffffffa08825f5
>> [ 192.765688] ffff8801fa954000 ffff8800b75c62c0 ffffffff81ae5ac0
>> ffffffffa08cf930
>> [ 192.765688] Call Trace:
>> [ 192.765688] [<ffffffffa0881f4e>] svc_xprt_release+0x6e/0xf0
> [sunrpc]
>> [ 192.765688] [<ffffffffa08825f5>] svc_recv+0x165/0x190 [sunrpc]
>> [ 192.765688] [<ffffffffa08cf930>] ?
> nfsd_pool_stats_release+0x60/0x60
>> [nfsd]
>> [ 192.765688] [<ffffffffa08cf9e5>] nfsd+0xb5/0x160 [nfsd]
>> [ 192.765688] [<ffffffffa08cf930>] ?
> nfsd_pool_stats_release+0x60/0x60
>> [nfsd]
>> [ 192.765688] [<ffffffff8107471e>] kthread+0xce/0xf0
>> [ 192.765688] [<ffffffff81074650>] ?
>> kthread_freezable_should_stop+0x70/0x70
>> [ 192.765688] [<ffffffff81584e2c>] ret_from_fork+0x7c/0xb0
>> [ 192.765688] [<ffffffff81074650>] ?
>> kthread_freezable_should_stop+0x70/0x70
>> [ 192.765688] Code: 8d 7b 10 e8 ea fa ff ff 48 c7 03 00 00 00 00 48
> 83
>> c4 08 5b c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66
> 66
>> 66 90 <66> f7 07 00 c0 75 32 8b 47 1c 48 8d 57 1c 85 c0 74 1c f0 ff 0a
>> [ 192.765688] RIP [<ffffffff8113c159>] put_page+0x9/0x50
>> [ 192.765688] RSP <ffff8801faa4be28>
>> [ 192.765688] CR2: 0000100000000000
>> crash>
> This new crash is here calling put_page() on garbage I guess:
>
> static inline void svc_free_res_pages(struct svc_rqst *rqstp)
> {
> while (rqstp->rq_next_page != rqstp->rq_respages) {
> struct page **pp = --rqstp->rq_next_page;
> if (*pp) {
> put_page(*pp);
> *pp = NULL;
> }
> }
> }
>

I removed your change and started debugging the original crash that happens
on top-of-tree. Seems like rq_next_page is screwed up. It should
always be >= rq_respages, yes? I added a BUG_ON() to assert this in
rdma_read_xdr() and we hit the BUG_ON(). Look:

crash> svc_rqst.rq_next_page 0xffff8800b84e6000
rq_next_page = 0xffff8800b84e6228
crash> svc_rqst.rq_respages 0xffff8800b84e6000
rq_respages = 0xffff8800b84e62a8

Any ideas Bruce/Tom?

Here are the BUG_ON()s I added:

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 04e7632..ab91905 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -339,6 +339,7 @@ xdr_ressize_check(struct svc_rqst *rqstp, __be32 *p)

static inline void svc_free_res_pages(struct svc_rqst *rqstp)
{
+ BUG_ON((unsigned long)rqstp->rq_next_page < (unsigned long)rqstp->rq_respages);
while (rqstp->rq_next_page != rqstp->rq_respages) {
struct page **pp = --rqstp->rq_next_page;
if (*pp) {
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..fa49d40 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -524,6 +524,7 @@ next_sge:
* Detach res pages. If svc_release sees any it will attempt to
* put them.
*/
+ BUG_ON((unsigned long)rqstp->rq_next_page < (unsigned long)rqstp->rq_respages);
while (rqstp->rq_next_page != rqstp->rq_respages)
*(--rqstp->rq_next_page) = NULL;


Here's the stack:

Backtrace:
# 0: [RSP: 0xffff88020540d970, RIP: 0xffffffff8103c994] machine_kexec
(struct kimage * arg = 0xffff880223b26c00)
# 1: [RSP: 0xffff88020540da40, RIP: 0xffffffff810d1e98] crash_kexec
(struct pt_regs * arg = 0xffff88020540dba8)
# 2: [RSP: 0xffff88020540da70, RIP: 0xffffffff8157d650] oops_end
(unsigned long arg = 0x296, struct pt_regs * arg = 0xffff88020540dba8,
int arg = 0x1)
# 3: [RSP: 0xffff88020540daa0, RIP: 0xffffffff810072fb] die (const
char * arg = 0xffffffff817d2d0a, struct pt_regs * arg =
0xffff88020540dba8, long arg = 0x0)
# 4: [RSP: 0xffff88020540db00, RIP: 0xffffffff8157d19b] do_trap (int
arg = 0x6, int arg = 0x4, char * arg = 0xffffffff817d2d0a, struct
pt_regs * arg = 0xffff88020540dba8, long arg = 0x0, siginfo_t * arg =
0xffff88020540db08)
# 5: [RSP: 0xffff88020540dba0, RIP: 0xffffffff81004555] do_invalid_op
(struct pt_regs * arg = 0xffff88020540dba8, long arg = 0x0)
# 6: [RSP: 0xffff88020540dc50, RIP: 0xffffffff815863c8] invalid_op (void)
# 7: [RSP: 0xffff88020540ddc0, RIP: 0xffffffffa0521b93] rdma_read_xdr
(struct svcxprt_rdma * arg = 0xffff88022273bc00, struct rpcrdma_msg *
arg = 0xffff88020b6a6000, struct svc_rqst * arg = 0xffff8800b84e6000,
struct svc_rdma_op_ctxt * arg = 0xffff88022397e300)
# 8: [RSP: 0xffff88020540de20, RIP: 0xffffffffa0521da4]
svc_rdma_recvfrom (struct svc_rqst * arg = 0xffff8800b84e6000)
# 9: [RSP: 0xffff88020540de60, RIP: 0xffffffffa0881bca]
svc_handle_xprt (struct svc_rqst * arg = 0xffff8800b84e6000, struct
svc_xprt * arg = unknown)
# 10: [RSP: 0xffff88020540de90, RIP: 0xffffffffa0882577] svc_recv
(struct svc_rqst * arg = 0xffff8800b84e6000, long arg = 0x36ee80)
crash> svc_rqst.rq_next_page 0xffff8800b84e6000
rq_next_page = 0xffff8800b84e6228
crash> svc_rqst.rq_respages 0xffff8800b84e6000
rq_respages = 0xffff8800b84e62a8
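
For what it's worth, those two values are exactly 16 page-pointer slots
apart, i.e. rq_next_page ended up 16 entries below rq_respages. A quick
arithmetic check (assuming 8-byte pointers, which matches this x86_64
box):

#include <stdio.h>

int main(void)
{
	unsigned long rq_next_page = 0xffff8800b84e6228UL;
	unsigned long rq_respages  = 0xffff8800b84e62a8UL;

	/* 0x80 bytes / 8 bytes per pointer = 16 slots */
	printf("%lu slots\n",
	       (unsigned long)((rq_respages - rq_next_page) / sizeof(void *)));
	return 0;
}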



2014-03-08 20:13:45

by Steve Wise

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On 3/8/2014 1:20 PM, Steve Wise wrote:
>
>> I removed your change and started debugging original crash that
>> happens on top-o-tree. Seems like rq_next_pages is screwed up. It
>> should always be >= rq_respages, yes? I added a BUG_ON() to assert
>> this in rdma_read_xdr() we hit the BUG_ON(). Look
>>
>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>> rq_next_page = 0xffff8800b84e6228
>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>> rq_respages = 0xffff8800b84e62a8
>>
>> Any ideas Bruce/Tom?
>>
>
> Guys, the patch below seems to fix the problem. Dunno if it is
> correct though. What do you think?
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index 0ce7552..6d62411 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
> sge_no++;
> }
> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> + rqstp->rq_next_page = rqstp->rq_respages;
>
> /* We should never run out of SGE because the limit is defined to
> * support the max allowed RPC data length
> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
> svcxprt_rdma *xprt,
>
> /* rq_respages points one past arg pages */
> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> + rqstp->rq_next_page = rqstp->rq_respages;
>
> /* Create the reply and chunk maps */
> offset = 0;
>
>

While this patch avoids the crashing, it apparently isn't correct...I'm
getting IO errors reading files over the mount. :)


2014-03-12 14:28:23

by Jeff Layton

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On Wed, 12 Mar 2014 10:05:24 -0400
Trond Myklebust <[email protected]> wrote:

>
> On Mar 12, 2014, at 9:33, Jeff Layton <[email protected]> wrote:
>
> > On Sat, 08 Mar 2014 14:13:44 -0600
> > Steve Wise <[email protected]> wrote:
> >
> >> On 3/8/2014 1:20 PM, Steve Wise wrote:
> >>>
> >>>> I removed your change and started debugging original crash that
> >>>> happens on top-o-tree. Seems like rq_next_pages is screwed
> >>>> up. It should always be >= rq_respages, yes? I added a
> >>>> BUG_ON() to assert this in rdma_read_xdr() we hit the BUG_ON().
> >>>> Look
> >>>>
> >>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
> >>>> rq_next_page = 0xffff8800b84e6228
> >>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
> >>>> rq_respages = 0xffff8800b84e62a8
> >>>>
> >>>> Any ideas Bruce/Tom?
> >>>>
> >>>
> >>> Guys, the patch below seems to fix the problem. Dunno if it is
> >>> correct though. What do you think?
> >>>
> >>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>> index 0ce7552..6d62411 100644
> >>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst
> >>> *rqstp, sge_no++;
> >>> }
> >>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> >>> + rqstp->rq_next_page = rqstp->rq_respages;
> >>>
> >>> /* We should never run out of SGE because the limit is
> >>> defined to
> >>> * support the max allowed RPC data length
> >>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
> >>> svcxprt_rdma *xprt,
> >>>
> >>> /* rq_respages points one past arg pages */
> >>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> >>> + rqstp->rq_next_page = rqstp->rq_respages;
> >>>
> >>> /* Create the reply and chunk maps */
> >>> offset = 0;
> >>>
> >>>
> >>
> >> While this patch avoids the crashing, it apparently isn't
> >> correct...I'm getting IO errors reading files over the mount. :)
> >>
> >
> > I hit the same oops and tested your patch and it seems to have fixed
> > that particular panic, but I still see a bunch of other mem
> > corruption oopses even with it. I'll look more closely at that when
> > I get some time.
> >
> > FWIW, I can easily reproduce that by simply doing something like:
> >
> > $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
> >
> > I'm not sure why you're not seeing any panics with your patch in
> > place. Perhaps it's due to hw differences between our test rigs.
> >
> > The EIO problem that you're seeing is likely the same client bug
> > that Chuck recently fixed in this patch:
> >
> > [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
> >
> > AIUI, Trond is merging that set for 3.15, so I'd make sure your
> > client has those patches when testing.
> >
>
> Nothing is in my queue yet.
>

Doh! Any reason not to merge that set from Chuck? They do fix a couple
of nasty client bugs...

--
Jeff Layton <[email protected]>

2014-03-12 14:22:05

by Tom Tucker

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

Hi Trond,

I think this patch is still 'off-by-one'. We'll take a look at this today.

Thanks,
Tom

On 3/12/14 9:05 AM, Trond Myklebust wrote:
> On Mar 12, 2014, at 9:33, Jeff Layton <[email protected]> wrote:
>
>> On Sat, 08 Mar 2014 14:13:44 -0600
>> Steve Wise <[email protected]> wrote:
>>
>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>>> I removed your change and started debugging original crash that
>>>>> happens on top-o-tree. Seems like rq_next_pages is screwed up. It
>>>>> should always be >= rq_respages, yes? I added a BUG_ON() to assert
>>>>> this in rdma_read_xdr() we hit the BUG_ON(). Look
>>>>>
>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>>> rq_next_page = 0xffff8800b84e6228
>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>>> rq_respages = 0xffff8800b84e62a8
>>>>>
>>>>> Any ideas Bruce/Tom?
>>>>>
>>>> Guys, the patch below seems to fix the problem. Dunno if it is
>>>> correct though. What do you think?
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> index 0ce7552..6d62411 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>>> sge_no++;
>>>> }
>>>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>> /* We should never run out of SGE because the limit is defined to
>>>> * support the max allowed RPC data length
>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
>>>> svcxprt_rdma *xprt,
>>>>
>>>> /* rq_respages points one past arg pages */
>>>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>> /* Create the reply and chunk maps */
>>>> offset = 0;
>>>>
>>>>
>>> While this patch avoids the crashing, it apparently isn't correct...I'm
>>> getting IO errors reading files over the mount. :)
>>>
>> I hit the same oops and tested your patch and it seems to have fixed
>> that particular panic, but I still see a bunch of other mem corruption
>> oopses even with it. I'll look more closely at that when I get some
>> time.
>>
>> FWIW, I can easily reproduce that by simply doing something like:
>>
>> $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>>
>> I'm not sure why you're not seeing any panics with your patch in place.
>> Perhaps it's due to hw differences between our test rigs.
>>
>> The EIO problem that you're seeing is likely the same client bug that
>> Chuck recently fixed in this patch:
>>
>> [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>>
>> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
>> has those patches when testing.
>>
> Nothing is in my queue yet.
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2014-03-12 15:04:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS over RDMA crashing


On Mar 12, 2014, at 10:28, Jeffrey Layton <[email protected]> wrote:

> On Wed, 12 Mar 2014 10:05:24 -0400
> Trond Myklebust <[email protected]> wrote:
>
>>
>> On Mar 12, 2014, at 9:33, Jeff Layton <[email protected]> wrote:
>>
>>> On Sat, 08 Mar 2014 14:13:44 -0600
>>> Steve Wise <[email protected]> wrote:
>>>
>>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>>>
>>>>>> I removed your change and started debugging original crash that
>>>>>> happens on top-o-tree. Seems like rq_next_pages is screwed
>>>>>> up. It should always be >= rq_respages, yes? I added a
>>>>>> BUG_ON() to assert this in rdma_read_xdr() we hit the BUG_ON().
>>>>>> Look
>>>>>>
>>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>>>> rq_next_page = 0xffff8800b84e6228
>>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>>>> rq_respages = 0xffff8800b84e62a8
>>>>>>
>>>>>> Any ideas Bruce/Tom?
>>>>>>
>>>>>
>>>>> Guys, the patch below seems to fix the problem. Dunno if it is
>>>>> correct though. What do you think?
>>>>>
>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> index 0ce7552..6d62411 100644
>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst
>>>>> *rqstp, sge_no++;
>>>>> }
>>>>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>>
>>>>> /* We should never run out of SGE because the limit is
>>>>> defined to
>>>>> * support the max allowed RPC data length
>>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
>>>>> svcxprt_rdma *xprt,
>>>>>
>>>>> /* rq_respages points one past arg pages */
>>>>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>>
>>>>> /* Create the reply and chunk maps */
>>>>> offset = 0;
>>>>>
>>>>>
>>>>
>>>> While this patch avoids the crashing, it apparently isn't
>>>> correct...I'm getting IO errors reading files over the mount. :)
>>>>
>>>
>>> I hit the same oops and tested your patch and it seems to have fixed
>>> that particular panic, but I still see a bunch of other mem
>>> corruption oopses even with it. I'll look more closely at that when
>>> I get some time.
>>>
>>> FWIW, I can easily reproduce that by simply doing something like:
>>>
>>> $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>>>
>>> I'm not sure why you're not seeing any panics with your patch in
>>> place. Perhaps it's due to hw differences between our test rigs.
>>>
>>> The EIO problem that you're seeing is likely the same client bug
>>> that Chuck recently fixed in this patch:
>>>
>>> [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>>>
>>> AIUI, Trond is merging that set for 3.15, so I'd make sure your
>>> client has those patches when testing.
>>>
>>
>> Nothing is in my queue yet.
>>
>
> Doh! Any reason not to merge that set from Chuck? They do fix a couple
> of nasty client bugs...
>

Most of them are one-line debugging dprintks which I do not intend to apply.

One of them confuses a readdir optimisation with a bugfix; at the very least the patch comments need changing.
That leaves 2 that can go in; however, as they are clearly insufficient to make RDMA safe for general use, they certainly do not warrant a stable@ label. The workaround for the Oopses is simple: use TCP.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-12 15:29:24

by Jeff Layton

[permalink] [raw]
Subject: Re: NFS over RDMA crashing

On Wed, 12 Mar 2014 11:03:52 -0400
Trond Myklebust <[email protected]> wrote:

>
> On Mar 12, 2014, at 10:28, Jeffrey Layton <[email protected]> wrote:
>
> > On Wed, 12 Mar 2014 10:05:24 -0400
> > Trond Myklebust <[email protected]> wrote:
> >
> >>
> >> On Mar 12, 2014, at 9:33, Jeff Layton <[email protected]> wrote:
> >>
> >>> On Sat, 08 Mar 2014 14:13:44 -0600
> >>> Steve Wise <[email protected]> wrote:
> >>>
> >>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
> >>>>>
> >>>>>> I removed your change and started debugging original crash
> >>>>>> that happens on top-o-tree. Seems like rq_next_pages is
> >>>>>> screwed up. It should always be >= rq_respages, yes? I added
> >>>>>> a BUG_ON() to assert this in rdma_read_xdr() we hit the
> >>>>>> BUG_ON(). Look
> >>>>>>
> >>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
> >>>>>> rq_next_page = 0xffff8800b84e6228
> >>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
> >>>>>> rq_respages = 0xffff8800b84e62a8
> >>>>>>
> >>>>>> Any ideas Bruce/Tom?
> >>>>>>
> >>>>>
> >>>>> Guys, the patch below seems to fix the problem. Dunno if it is
> >>>>> correct though. What do you think?
> >>>>>
> >>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>>> index 0ce7552..6d62411 100644
> >>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst
> >>>>> *rqstp, sge_no++;
> >>>>> }
> >>>>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> >>>>> + rqstp->rq_next_page = rqstp->rq_respages;
> >>>>>
> >>>>> /* We should never run out of SGE because the limit is
> >>>>> defined to
> >>>>> * support the max allowed RPC data length
> >>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
> >>>>> svcxprt_rdma *xprt,
> >>>>>
> >>>>> /* rq_respages points one past arg pages */
> >>>>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> >>>>> + rqstp->rq_next_page = rqstp->rq_respages;
> >>>>>
> >>>>> /* Create the reply and chunk maps */
> >>>>> offset = 0;
> >>>>>
> >>>>>
> >>>>
> >>>> While this patch avoids the crashing, it apparently isn't
> >>>> correct...I'm getting IO errors reading files over the mount. :)
> >>>>
> >>>
> >>> I hit the same oops and tested your patch and it seems to have
> >>> fixed that particular panic, but I still see a bunch of other mem
> >>> corruption oopses even with it. I'll look more closely at that
> >>> when I get some time.
> >>>
> >>> FWIW, I can easily reproduce that by simply doing something like:
> >>>
> >>> $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
> >>>
> >>> I'm not sure why you're not seeing any panics with your patch in
> >>> place. Perhaps it's due to hw differences between our test rigs.
> >>>
> >>> The EIO problem that you're seeing is likely the same client bug
> >>> that Chuck recently fixed in this patch:
> >>>
> >>> [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
> >>>
> >>> AIUI, Trond is merging that set for 3.15, so I'd make sure your
> >>> client has those patches when testing.
> >>>
> >>
> >> Nothing is in my queue yet.
> >>
> >
> > Doh! Any reason not to merge that set from Chuck? They do fix a
> > couple of nasty client bugs...
> >
>
> Most of them are one-line debugging dprintks which I do not intend to
> apply.
>

Fair enough. Those are certainly not necessary, but some of them clean
up existing printks and probably do need to go in. That said, debugging
this stuff is *really* difficult so having extra debug printks in place
seems like a good thing (unless you're arguing for moving wholesale to
tracepoints instead).

> One of them confuses a readdir optimisation with a bugfix; at the
> very least the patch comments need changing.

I'll leave that to Chuck to comment on. I had the impression that it
was a bugfix, but maybe there's some better way to handle that bug.

> That leaves 2 that can
> go in, however as they are clearly insufficient to make RDMA safe for
> general use, they certainly do not warrant a stable@ label. The
> workaround for the Oopses is simple: use TCP.
>

Yeah, it's definitely rickety, but it's in and we do need to get fixes
merged to this code. I'm ok with dropping the stable labels on those
patches, but if we're going to declare this stuff "not stable enough
for general use" then I think that we should take an aggressive approach
on merging fixes to it.

FWIW, I also notice that Kconfig doesn't show the option to actually
enable/disable RDMA transports. I'll post a patch to fix that soon.
Since this stuff is not very safe to use, then we should make it
reasonably simple to disable it.

--
Jeff Layton <[email protected]>

2014-03-08 19:20:50

by Steve Wise

[permalink] [raw]
Subject: Re: NFS over RDMA crashing


> I removed your change and started debugging original crash that
> happens on top-o-tree. Seems like rq_next_pages is screwed up. It
> should always be >= rq_respages, yes? I added a BUG_ON() to assert
> this in rdma_read_xdr() we hit the BUG_ON(). Look
>
> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
> rq_next_page = 0xffff8800b84e6228
> crash> svc_rqst.rq_respages 0xffff8800b84e6000
> rq_respages = 0xffff8800b84e62a8
>
> Any ideas Bruce/Tom?
>

Guys, the patch below seems to fix the problem. Dunno if it is correct
though. What do you think?

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..6d62411 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
sge_no++;
}
rqstp->rq_respages = &rqstp->rq_pages[sge_no];
+ rqstp->rq_next_page = rqstp->rq_respages;

/* We should never run out of SGE because the limit is defined to
* support the max allowed RPC data length
@@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,

/* rq_respages points one past arg pages */
rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
+ rqstp->rq_next_page = rqstp->rq_respages;

/* Create the reply and chunk maps */
offset = 0;