Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp1133205ybt; Wed, 8 Jul 2020 22:38:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxOC5P58pPOQOn2yjZWklKF1Xhe8ge3qV8hshWdzRF93uP2HXGAKD3xQJ5steQ0RffCNxPc X-Received: by 2002:a17:906:d9c4:: with SMTP id qk4mr58186762ejb.100.1594273093259; Wed, 08 Jul 2020 22:38:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594273093; cv=none; d=google.com; s=arc-20160816; b=aixhnK3tCk9xHWseai4qsf2Tmhf9zevFhNPKWy/SN4IUSRW6m22l0h1MU5bCQ/Ddrc ZWskJtahieH2cgKvM3dDnluFtttUiU7wS82QIFGG8wopYhnAgecO0VUqutX6lYZpcz1u Jos1SjGoomH0WdddzYRBpzuP6oeyAcRofQKuIPX2CnHvVs5d8MadZ6Iur+X/rQ3Kz/4O jIOypFsvLT9aq8VmK7qNnOP0Id5w+w+xiV3A8oisX4xh8Od0fy3qriAc+YdI3MbVJEMd LoBYIIHf+gv1+k6BF6dkHgzsxEAaZSS1Y71RST0Cmt0ifYymd7NIHbFF4p8lDc2ByXMC /NBQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=x/YkVRh01cwCdEzkOpoy/DTTQYnq+R76yuGpXCDqef4=; b=Rd5Ccj3a/Pgda6YoUJ+TCsjq+Nt4RBqk3Ubg1SNHcBUkWvfPygKAXLLj47T40+b36a aibBghoUwsTRpJauLJ/ecR/5U9UKzn/AnGqWJh6Pn8cYgOuv0QdZemnBTg+b6bhqjqoY zUfVgmy4sj5iWPtJ0A//O7tQNu5R8y8G06KCIuwR5T/zlqmtSHWG+OXSMQ4YWoCjSkJ6 m7a7UXKBohlNHHbWksNQgJjbYAkFlV5KGc+WvWZ8/ZNoagyXiTtcRvLG+bHw9pj7rCkf rrML0+DwUSwX11B4EO7dCMqbFFPz0QCgJOonF95jlaSR7gH5daN6EohZkvhJuSL1MzKu dzuw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f3si1395290edy.2.2020.07.08.22.37.50; Wed, 08 Jul 2020 22:38:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726343AbgGIFfk (ORCPT + 99 others); Thu, 9 Jul 2020 01:35:40 -0400 Received: from mail-ej1-f68.google.com ([209.85.218.68]:38468 "EHLO mail-ej1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726129AbgGIFfj (ORCPT ); Thu, 9 Jul 2020 01:35:39 -0400 Received: by mail-ej1-f68.google.com with SMTP id w16so938023ejj.5 for ; Wed, 08 Jul 2020 22:35:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=x/YkVRh01cwCdEzkOpoy/DTTQYnq+R76yuGpXCDqef4=; b=WVKjXKFXcnH7RXPN/I0zplDdRu59a/VIguAa3P8ZOXT6CPNG/ewlIW8tNejiTqGx1p C5bVC/bcbNr21c1uBMdqJy2YncBHZIa9pH75BAmdawbDEKBy0fTw5TF8gshJUHvRlF4w mXL2LOyeFOIGJphOev44xTs6YqmnQfbbpi7cQs62rLDIo3F3hHalm8nF+iYlHGcGhBXT F7itQ6zWF+JNKWL+WDvypdrF4QT3AcgnjA3a7OqroEGqXQkd6vPVt9dRU5aa44+sVMrP mNiy1U8EPNTbVLmx+YPN122Ah6q9+/Ag7GWinSgraMS8cagj51HszeF3uqgBjZpUc8uM M6sQ== X-Gm-Message-State: AOAM5318dNS+oJL+8UWXYyxQ2Hd+vAzY5qQoj5Oz2GS0nHpIH/xDddqC cD7HCjYb91VYMajZrutxVWjKepI1 X-Received: by 2002:a17:907:2654:: with SMTP id ar20mr51976160ejc.62.1594272937511; Wed, 08 Jul 2020 22:35:37 -0700 (PDT) Received: from ?IPv6:2a0b:e7c0:0:107::49? ([2a0b:e7c0:0:107::49]) by smtp.gmail.com with ESMTPSA id u18sm1174665edx.34.2020.07.08.22.35.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 08 Jul 2020 22:35:36 -0700 (PDT) Subject: Re: [PATCH v2] dmaengine: check device and channel list for empty To: Dave Jiang , vkoul@kernel.org Cc: Swathi Kovvuri , peter.ujfalusi@ti.com, Linux kernel mailing list References: <159319496403.69045.16298280729899651363.stgit@djiang5-desk3.ch.intel.com> From: Jiri Slaby Message-ID: <852318ec-9e18-3dee-a91d-1cf4dddb8906@kernel.org> Date: Thu, 9 Jul 2020 07:35:36 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07. 07. 20, 17:42, Dave Jiang wrote: > On 7/6/2020 11:05 PM, Jiri Slaby wrote: >> On 26. 06. 20, 20:09, Dave Jiang wrote: >>> Check dma device list and channel list for empty before iterate as the >>> iteration function assume the list to be not empty. With devices and >>> channels now being hot pluggable this is a condition that needs to be >>> checked. Otherwise it can cause the iterator to spin forever. >> >> Could you be a little bit more specific how this can spin forever? I.e. >> can you attach a stacktrace of such a behaviour? > > I can't seem to find the original splat that lead me to the conclusion > of it's spinning forever. As I recall, the issue seems to produce > different splats and not always consistent in being reproduced. Here's a > partial splat that was tracked by the internal bug database. Since with > the dma device and channel list being are hot added and removed, the > device and channel lists can be empty. The list_entry() and friends > expect the list to not be empty (according to header comment), I added > the check to ensure that isn't the case before using them in dmaengine. Yes, the comment states that as it is true: you receive a wild/non-checkable pointer if you do list_entry on an empty list. BUT have you actually read what I wrote: >> As in the empty case, "&pos->member" is "head" (look into >> list_for_each_entry) and the for loop should loop exactly zero times. HERE ^^^^ > With the fix, we can no longer produce any of the splats. So maybe the > above was a bad description of the issue. No, not only the description, worse, the patch proper looks wrong. > [ 4216.048375]  ? dma_channel_rebalance+0x7b/0x250 > [ 4216.056360]  dma_async_device_register+0x349/0x3a0 > [ 4216.064604]  idxd_register_dma_device+0x90/0xc0 [idxd] > [ 4216.073175]  idxd_config_bus_probe.cold+0x7d/0x1fc [idxd] So, the good part in the patch is the fixed locking in dma_async_device_register. Otherwise it adds nonsense checks. So you fixed the issue only by a chance, by a side effect as Peter pointed out. Leaving aside that you broke dma_request_chan -- that could happen to anybody. Vinod, please drop/revert this patch. Then start over only with dma_async_device_register fixed locking. thanks, -- js