Message-ID: <1525876995.11756.376.camel@redhat.com>
Subject: Re: [PATCH v4] nvmet,rxe: defer ip datagram sending to tasklet
From: Doug Ledford <dledford@redhat.com>
To: Alexandru Moise <00moses.alexander00@gmail.com>, monis@mellanox.com, jgg@ziepe.ca, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, yanjun.zhu@oracle.com
Date: Wed, 09 May 2018 10:43:15 -0400
In-Reply-To: <20180508090202.GA1690@gmail.com>
References: <20180508090202.GA1690@gmail.com>
Organization: Red Hat, Inc.
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2018-05-08 at 11:02 +0200, Alexandru Moise wrote:
> This addresses 3 separate problems:
>
> 1. When using NVME over Fabrics we may end up sending IP
> packets in interrupt context, we should defer this work
> to a tasklet.
>
> [   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
> [   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G W 4.17.0-rc3-ARCH+ #104
> [   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
> [   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
> [   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
> [   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
> [   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
> [   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
> [   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
> [   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
> [   50.959402] FS: 0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
> [   50.961552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
> [   50.966121] Call Trace:
> [   50.966845]  <IRQ>
> [   50.967497]  __dev_queue_xmit+0x62d/0x690
> [   50.968722]  dev_queue_xmit+0x10/0x20
> [   50.969894]  neigh_resolve_output+0x173/0x190
> [   50.971244]  ip_finish_output2+0x2b8/0x370
> [   50.972527]  ip_finish_output+0x1d2/0x220
> [   50.973785]  ? ip_finish_output+0x1d2/0x220
> [   50.975010]  ip_output+0xd4/0x100
> [   50.975903]  ip_local_out+0x3b/0x50
> [   50.976823]  rxe_send+0x74/0x120
> [   50.977702]  rxe_requester+0xe3b/0x10b0
> [   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
> [   50.980260]  rxe_do_task+0x85/0x100
> [   50.981386]  rxe_run_task+0x2f/0x40
> [   50.982470]  rxe_post_send+0x51a/0x550
> [   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
> [   50.985024]  __nvmet_req_complete+0x95/0xa0
> [   50.986287]  nvmet_req_complete+0x15/0x60
> [   50.987469]  nvmet_bio_done+0x2d/0x40
> [   50.988564]  bio_endio+0x12c/0x140
> [   50.989654]  blk_update_request+0x185/0x2a0
> [   50.990947]  blk_mq_end_request+0x1e/0x80
> [   50.991997]  nvme_complete_rq+0x1cc/0x1e0
> [   50.993171]  nvme_pci_complete_rq+0x117/0x120
> [   50.994355]  __blk_mq_complete_request+0x15e/0x180
> [   50.995988]  blk_mq_complete_request+0x6f/0xa0
> [   50.997304]  nvme_process_cq+0xe0/0x1b0
> [   50.998494]  nvme_irq+0x28/0x50
> [   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
> [   51.000986]  handle_irq_event_percpu+0x32/0x80
> [   51.002356]  handle_irq_event+0x3c/0x60
> [   51.003463]  handle_edge_irq+0x1c9/0x200
> [   51.004473]  handle_irq+0x23/0x30
> [   51.005363]  do_IRQ+0x46/0xd0
> [   51.006182]  common_interrupt+0xf/0xf
> [   51.007129]  </IRQ>
>
> 2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
> when using NVMEoF in order to solve lock ordering between neigh->ha_lock
> seqlock and the nvme queue lock:
>
> [   77.833783]  Possible interrupt unsafe locking scenario:
> [   77.833783]
> [   77.835831]        CPU0                    CPU1
> [   77.837129]        ----                    ----
> [   77.838313]   lock(&(&n->ha_lock)->seqcount);
> [   77.839550]                                local_irq_disable();
> [   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
> [   77.843222]                                lock(&(&n->ha_lock)->seqcount);
> [   77.845178]   <Interrupt>
> [   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
> [   77.847986]
> [   77.847986]  *** DEADLOCK ***
>
> 3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:
>
> [   47.634271]  Possible interrupt unsafe locking scenario:
> [   47.634271]
> [   47.636452]        CPU0                    CPU1
> [   47.637861]        ----                    ----
> [   47.639285]   lock(&(&sch->q.lock)->rlock);
> [   47.640654]                                local_irq_disable();
> [   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
> [   47.644521]                                lock(&(&sch->q.lock)->rlock);
> [   47.646480]   <Interrupt>
> [   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
> [   47.648492]
> [   47.648492]  *** DEADLOCK ***
>
> Using NVMEoF after this patch seems to finally be stable, without it,
> rxe eventually deadlocks the whole system and causes RCU stalls.
>
> Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>

Thanks, applied to for-rc.

-- 
Doug Ledford
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD