Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1498818ybt; Thu, 9 Jul 2020 08:26:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxZxuKZ2mTzOhtyrRQyf4xyYunPWcQmkAqR8+YRLZBnpcsmTJnU3mRJXjFqrR7s4dHmbigZ X-Received: by 2002:aa7:c583:: with SMTP id g3mr73898323edq.228.1594308414629; Thu, 09 Jul 2020 08:26:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594308414; cv=none; d=google.com; s=arc-20160816; b=x/74DDVb0kGH0pWqwMCxACMn9kSGrDRk4l0k7Spx5y+NraoVGeP4mCzaYT4xfTh42k Pzk6QDiC9a/sDqBUNpNfaY41DTS1V8LAEdHyW1bKpJIVwxhBA3SHjkGNoHFD5DsOW2Y/ XMmkODJ3c+OKeeLHmPrk+LsU2r6pvTNVnevZ9gqBktaQl9vTX+hkVt6gvVy/benOGrTF JUkLeFM0oEyDL2rI2jTGlNPfg0NkULhQa+Aw4mCOMc071HeRBQ62UwZpGpgn5XkIHo0X YBgTz+cVVKKa7Q5NaboNcpSKbw5oO5uPm6zx4X1wT+sJAcZdQOxKD8zJjohNERTzfVJy giEg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:ironport-sdr:ironport-sdr; bh=6wUXhb2ToC1J4DwjNaSdqVeOrf/oQQ2YCtdqpYDzRFY=; b=dyjSKSVgrTKCQGVTApXJglm/HkQWCf2OyWt4bN68I8amTbmJ5hG7y/6CFCProG/wQy pSsBCKQ3w7MJ8HlIBbBy2JuhHzJXdbtzE4iDDOzpHaak2zwh5dcD/T+DC5IYP7EwalWL B7jqguCaAo/mqRr0thcFkNvDB5qswi833csGMNkPliPG1lxl9jyx1DzouY69FDCtdrea mO1xxYcBJFEILAOxhHT+uIZCKWZbWloI9ynPAqPh8MYHm4rHWvy3LwgX6wpYxpBxgL1e zYf9uh76JArzA3CWucgbajDtFuIIzV0SpKIGX7rNdCnbIhp2h60uwntIJlpvkZTdY514 ljrQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id gh11si2179896ejb.442.2020.07.09.08.26.31; Thu, 09 Jul 2020 08:26:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728104AbgGIPX5 (ORCPT + 99 others); Thu, 9 Jul 2020 11:23:57 -0400 Received: from mga07.intel.com ([134.134.136.100]:63085 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727925AbgGIPX4 (ORCPT ); Thu, 9 Jul 2020 11:23:56 -0400 IronPort-SDR: VwTqOj2jPLcowky/2JJ09tQO6hG1v9cFL/rs2Wl/m4iXIagaqsypRDWkUIYwSaGMrNpTEHt9xJ GOeZBFAnKerA== X-IronPort-AV: E=McAfee;i="6000,8403,9677"; a="212926219" X-IronPort-AV: E=Sophos;i="5.75,331,1589266800"; d="scan'208";a="212926219" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jul 2020 08:23:42 -0700 IronPort-SDR: o7gO3hg17ryKJug7bQpOb35jPcYNb2P308g6e0B8Bqy4Yi0NV3vsp6mEkAAEJIbLnhpdHPGjng 5VAC4qaBV8Vg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,331,1589266800"; d="scan'208";a="358475675" Received: from djiang5-mobl1.amr.corp.intel.com (HELO [10.209.57.174]) ([10.209.57.174]) by orsmga001.jf.intel.com with ESMTP; 09 Jul 2020 08:23:41 -0700 Subject: Re: [PATCH v2] dmaengine: check device and channel list for empty To: Jiri Slaby , vkoul@kernel.org Cc: Swathi Kovvuri , peter.ujfalusi@ti.com, Linux kernel mailing list References: <159319496403.69045.16298280729899651363.stgit@djiang5-desk3.ch.intel.com> <852318ec-9e18-3dee-a91d-1cf4dddb8906@kernel.org> From: Dave Jiang Message-ID: <83932426-d52a-2e62-9d4b-5abb134a64df@intel.com> Date: Thu, 9 Jul 2020 08:23:41 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <852318ec-9e18-3dee-a91d-1cf4dddb8906@kernel.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 7/8/2020 10:35 PM, Jiri Slaby wrote: > On 07. 07. 20, 17:42, Dave Jiang wrote: >> On 7/6/2020 11:05 PM, Jiri Slaby wrote: >>> On 26. 06. 20, 20:09, Dave Jiang wrote: >>>> Check dma device list and channel list for empty before iterate as the >>>> iteration function assume the list to be not empty. With devices and >>>> channels now being hot pluggable this is a condition that needs to be >>>> checked. Otherwise it can cause the iterator to spin forever. >>> >>> Could you be a little bit more specific how this can spin forever? I.e. >>> can you attach a stacktrace of such a behaviour? >> >> I can't seem to find the original splat that lead me to the conclusion >> of it's spinning forever. As I recall, the issue seems to produce >> different splats and not always consistent in being reproduced. Here's a >> partial splat that was tracked by the internal bug database. Since with >> the dma device and channel list being are hot added and removed, the >> device and channel lists can be empty. The list_entry() and friends >> expect the list to not be empty (according to header comment), I added >> the check to ensure that isn't the case before using them in dmaengine. > > Yes, the comment states that as it is true: you receive a > wild/non-checkable pointer if you do list_entry on an empty list. BUT > have you actually read what I wrote: > >>> As in the empty case, "&pos->member" is "head" (look into >>> list_for_each_entry) and the for loop should loop exactly zero times. > > HERE ^^^^ > >> With the fix, we can no longer produce any of the splats. So maybe the >> above was a bad description of the issue. > > No, not only the description, worse, the patch proper looks wrong. > >> [ 4216.048375]  ? dma_channel_rebalance+0x7b/0x250 >> [ 4216.056360]  dma_async_device_register+0x349/0x3a0 >> [ 4216.064604]  idxd_register_dma_device+0x90/0xc0 [idxd] >> [ 4216.073175]  idxd_config_bus_probe.cold+0x7d/0x1fc [idxd] > > So, the good part in the patch is the fixed locking in > dma_async_device_register. Otherwise it adds nonsense checks. So you > fixed the issue only by a chance, by a side effect as Peter pointed out. > Leaving aside that you broke dma_request_chan -- that could happen to > anybody. > > Vinod, please drop/revert this patch. Then start over only with > dma_async_device_register fixed locking. I'll start on the proper fix. > > thanks, >