From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Kaike Wan, Mike Marciniszyn, Dennis Dalessandro, Jason Gunthorpe
Subject: [PATCH 5.5 39/46] IB/hfi1: Ensure pq is not left on waitlist
Date: Tue, 7 Apr 2020 12:22:10 +0200
Message-Id: <20200407101503.617028799@linuxfoundation.org>
X-Mailer: git-send-email 2.26.0
In-Reply-To:
 <20200407101459.502593074@linuxfoundation.org>
References: <20200407101459.502593074@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Mike Marciniszyn

commit 9a293d1e21a6461a11b4217b155bf445e57f4131 upstream.

The following warning can occur when a pq is left on the
dmawait list and the pq is then freed:

  WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
  list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp
   ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic
   opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE)
   bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp
   intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul
   crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw
   gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper
   syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev
   drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si
   ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad
   hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm
   iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3
   nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit
   dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
  [] dump_stack+0x19/0x1b
  [] __warn+0xd8/0x100
  [] warn_slowpath_fmt+0x5f/0x80
  [] ? ___slab_alloc+0x24e/0x4f0
  [] __list_add+0x65/0xc0
  [] defer_packet_queue+0x145/0x1a0 [hfi1]
  [] sdma_check_progress+0x67/0xa0 [hfi1]
  [] sdma_send_txlist+0x432/0x550 [hfi1]
  [] ? kmem_cache_alloc+0x179/0x1f0
  [] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
  [] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
  [] ? try_to_del_timer_sync+0x5e/0x90
  [] ? __check_object_size+0x1ca/0x250
  [] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
  [] hfi1_aio_write+0xca/0x120 [hfi1]
  [] do_sync_readv_writev+0x7b/0xd0
  [] do_readv_writev+0xce/0x260
  [] ? pick_next_task_fair+0x5f/0x1b0
  [] ? sched_clock_cpu+0x85/0xc0
  [] ? __schedule+0x13a/0x860
  [] vfs_writev+0x35/0x60
  [] SyS_writev+0x7f/0x110
  [] system_call_fastpath+0x22/0x27

The issue happens when wait_event_interruptible_timeout() returns a value
<= 0.

In that case, the pq is left on the list. The code continues sending
packets and potentially can complete the current request with the pq still
on the dmawait list provided no descriptor shortage is seen.

If the pq is torn down in that state, the sdma interrupt handler could
find the now freed pq on the list with list corruption or memory
corruption resulting.

Fix by adding a flush routine to ensure that the pq is never on a list
after processing a request.

A follow-up patch series will address issues with seqlock surfaced in:
https://lore.kernel.org/r/20200320003129.GP20941@ziepe.ca

The seqlock use for sdma will then be converted to a spin lock since the
list_empty() doesn't need the protection afforded by the sequence lock
currently in use.

Fixes: a0d406934a46 ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/20200320200200.23203.37777.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan
Signed-off-by: Mike Marciniszyn
Signed-off-by: Dennis Dalessandro
Signed-off-by: Jason Gunthorpe
Signed-off-by: Greg Kroah-Hartman
---
 drivers/infiniband/hw/hfi1/user_sdma.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -141,6 +141,7 @@ static int defer_packet_queue(
 	 */
 	xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
 	if (list_empty(&pq->busy.list)) {
+		pq->busy.lock = &sde->waitlock;
 		iowait_get_priority(&pq->busy);
 		iowait_queue(pkts_sent, &pq->busy, &sde->dmawait);
 	}
@@ -155,6 +156,7 @@ static void activate_packet_queue(struct
 {
 	struct hfi1_user_sdma_pkt_q *pq =
 		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+	pq->busy.lock = NULL;
 	xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
 	wake_up(&wait->wait_dma);
 };
@@ -256,6 +258,21 @@ pq_reqs_nomem:
 	return ret;
 }
 
+static void flush_pq_iowait(struct hfi1_user_sdma_pkt_q *pq)
+{
+	unsigned long flags;
+	seqlock_t *lock = pq->busy.lock;
+
+	if (!lock)
+		return;
+	write_seqlock_irqsave(lock, flags);
+	if (!list_empty(&pq->busy.list)) {
+		list_del_init(&pq->busy.list);
+		pq->busy.lock = NULL;
+	}
+	write_sequnlock_irqrestore(lock, flags);
+}
+
 int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 			       struct hfi1_ctxtdata *uctxt)
 {
@@ -281,6 +298,7 @@ int hfi1_user_sdma_free_queues(struct hf
 		kfree(pq->reqs);
 		kfree(pq->req_in_use);
 		kmem_cache_destroy(pq->txreq_cache);
+		flush_pq_iowait(pq);
 		kfree(pq);
 	} else {
 		spin_unlock(&fd->pq_rcu_lock);
@@ -587,11 +605,12 @@ int hfi1_user_sdma_process_request(struc
 		if (ret < 0) {
 			if (ret != -EBUSY)
 				goto free_req;
-			wait_event_interruptible_timeout(
+			if (wait_event_interruptible_timeout(
 				pq->busy.wait_dma,
-				(pq->state == SDMA_PKT_Q_ACTIVE),
+				pq->state == SDMA_PKT_Q_ACTIVE,
 				msecs_to_jiffies(
-					SDMA_IOWAIT_TIMEOUT));
+					SDMA_IOWAIT_TIMEOUT)) <= 0)
+				flush_pq_iowait(pq);
 		}
 	}
 	*count += idx;
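
For readers who want to see the failure mode outside the driver, the following is a minimal userspace sketch in C of the pattern the patch applies. It is not the hfi1 code. `struct engine`, `struct pkt_q`, `defer()`, `flush()` and `after_wait()` are hypothetical stand-ins, the list helpers are simplified re-implementations of the kernel's `list_head` primitives, and the seqlock taken by `flush_pq_iowait()` is left out on the assumption that the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry and reinitialize it so list_empty(entry) is true again. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-ins for the sdma engine and the user packet queue. */
struct engine {
	struct list_head dmawait;
};

struct pkt_q {
	struct list_head busy;	/* plays the role of pq->busy.list */
	struct engine *waiting;	/* plays the role of pq->busy.lock */
};

/* Queue pq on the engine's wait list, remembering where it was queued. */
static void defer(struct pkt_q *pq, struct engine *e)
{
	if (list_empty(&pq->busy)) {
		pq->waiting = e;
		list_add_tail(&pq->busy, &e->dmawait);
	}
}

/* Model of flush_pq_iowait(): safe to call whether or not pq is queued. */
static void flush(struct pkt_q *pq)
{
	if (!pq->waiting)
		return;
	if (!list_empty(&pq->busy)) {
		list_del_init(&pq->busy);
		pq->waiting = NULL;
	}
}

/* Model of the patched wait site: flush on timeout (0) or signal (< 0). */
static void after_wait(struct pkt_q *pq, long wait_ret)
{
	if (wait_ret <= 0)
		flush(pq);
}
```

In this model, calling `defer(&pq, &e)` followed by `after_wait(&pq, 0)` leaves the engine's wait list empty, so releasing `pq` afterwards cannot leave a dangling entry behind. A second `flush(&pq)` is a no-op because `waiting` is already NULL, which mirrors why the patch can call the flush routine unconditionally in the teardown path.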