Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp1245584pxa; Thu, 13 Aug 2020 04:31:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxzHYDHk71BBkgLWPcSIOFWsadrHWm+lfuCBx+BvuC3NFsoVopk4fuA9HveGX70KDcJvJyL X-Received: by 2002:a17:907:1191:: with SMTP id uz17mr4483984ejb.184.1597318274785; Thu, 13 Aug 2020 04:31:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1597318274; cv=none; d=google.com; s=arc-20160816; b=MhnonYUObFw0ZqarrLILCz605tKcqVRfTDmVWyD249vyakWE9jpnOhHZYGgVxClC91 +MI/lRu4H1fuyPFeIyagGNZLHqv2wd4ypL3UWzfimSCH4EaPmXj/1b1hcDZkuSU+KVlr bco0bk8WSqtiAydMn9/8dEXvS1txKZGfQj+shDhKnPgCmSg7+4tf/76B72ZtjLPYrMb1 JckVHPKjjlNftZI0NICVRZPAs4G5tOx0QLvfiGbuWjT1m6ZeDTpFWwkMAxQp9FBqyVtS pr6Cm44IcW6qN4U3M2bbsG3hOJZ7qAWknxPB+fadn2W1kuYg58Y+8mnpEeVX70betP8e Dqbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-disposition:mime-version :message-id:subject:cc:to:from:date:ironport-sdr:dkim-signature; bh=kJ+5rHf/C3tP8r4ujGU+sgc6c79/X1OwxaWHoV8nvXk=; b=MX9XmQoatjIVg15gnwk/xPSIhVLPilJb0ifbvSV9nKiXBcH2ihT7q2ErX4GBiK4Ien N1dlsAZCrA6qyi/Y9hB1F+Dk7BMPh1M276f1aVo185C+JgO+d9AsHlj2M/go0KldS+x4 sBZimmqOkmLpfY4OWk9qGB5SZR3XtGzhIRXq1AQI/CNmTqtjfMaVS/W3gN89tDD/mefV xVHEVKCnRWpCQEMKoJetR5QvBjuA0rR1CbX+l2LQTCr0Paam97hbltyBduPTrXzB9cOI dt4+OhvozUc4+LDs//JZekuF5XU2adXQ1sH4Dcnio6JoM5Bwc4k86nUlj+V5xZb6XWmB IKAw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@skidata.com header.s=selector1 header.b=gmiIs9xB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=skidata.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id jp21si3126968ejb.136.2020.08.13.04.30.49; Thu, 13 Aug 2020 04:31:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@skidata.com header.s=selector1 header.b=gmiIs9xB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=skidata.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726615AbgHMLaP (ORCPT + 99 others); Thu, 13 Aug 2020 07:30:15 -0400 Received: from mail2.skidata.com ([91.230.2.91]:34726 "EHLO mail2.skidata.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726100AbgHMLaO (ORCPT ); Thu, 13 Aug 2020 07:30:14 -0400 X-Greylist: delayed 430 seconds by postgrey-1.27 at vger.kernel.org; Thu, 13 Aug 2020 07:30:12 EDT DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=skidata.com; i=@skidata.com; q=dns/txt; s=selector1; t=1597318212; x=1628854212; h=date:from:to:cc:subject:message-id:mime-version; bh=89bAGdQq58IWsLtFPLh9wpw9ICJ6Mc83yC/h2CpVF7Y=; b=gmiIs9xB40mH5A5EZR2XIW6s9njJ+NmW1Jbq8TKZrHpuAX42r+Qo2ETa MEkO42ZuX6gv+G+6Dt7Zk68YEc836Z4GbWXRS8+7emfWPQKZ54ThPsLBL cNORA9GoL4kI1YtlLLMEl8KS4d82vdXrwZiRSt6uOJSgAknvzTBHDgzeQ GlXOrV2WmuYbi+UNI/BrybRyWROEvHqKzGIjdfCJRe9vPHiPvgVhtqwYQ xS64s8kIFaZZOAmNcXl01Ewkci2pB5eZcsSYk/KOsJ/1eRuxK4iKQyfr1 H/XLIFePptK55urI2+7zpSuugKYtVo0GjQpw9Cx0zzyVhu1Xd9caC+SMn w==; IronPort-SDR: g2o/GJDhU0h+zjPmQsuKI9ODAT/nVkeCiipCpG2uwpWm3Ae9iIIdcYGq0fMra8Ta3DFaRKw9EN QHBiww9oRQd1JARQBabv+Mtm/liFgUXSvJNk1lVA4Sn+BgZlk68O6Yp7I/qdZbSH89QuUn4G8p bHRgOEaC8nffSuA5+pzf4rHSs7uX7mvkRpbaWzW3tNAD61xzP6sUICpmgLgiQ3BHpmpR4MGc0n Z3a5yxwqqljYiuMXgJWpi4RtTZw/LJi7D9uHi++qB7ptZ6mu0pQJA9I0ZlmGWFaIjYC/4aIRDr TJ4= X-IronPort-AV: E=Sophos;i="5.76,308,1592863200"; d="scan'208";a="2640424" Date: Thu, 13 Aug 2020 13:22:58 +0200 From: Richard Leitner To: , , , CC: , , , , , , , Benjamin Bara Subject: pcm|dmaengine|imx-sdma race condition on i.MX6 Message-ID: <20200813112258.GA327172@pcleri> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline X-Originating-IP: [192.168.111.252] X-ClientProxiedBy: sdex6srv.skidata.net (192.168.111.84) To sdex5srv.skidata.net (192.168.111.83) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, we've found a race condition with the PCM on the i.MX6 which results in an -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN). A possible reproduction may look like the following reduced call graph during a PCM capture: us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) - wait_for_avail() - schedule_timeout() -> snd_pcm_update_hw_ptr0() - snd_pcm_update_state: EPIPE (XRUN) - sdma_disable_channel_async() # get's scheduled away due to sleep us <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE us -> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames - sdma_prep_dma_cyclic() - sdma_load_context() # not loaded as context_loaded is 1 - wait_for_avail() - schedule_timeout() # now the sdma_channel_terminate_work() comes back and sets # context_loaded = false and frees in vchan_dma_desc_free_list(). us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?)) What we have found out, based on our understanding: The dmaengine docu states that a dmaengine_terminate_async() must be followed by a dmaengine_synchronize(). However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is called (for performance reasons and because it might be called from an interrupt handler). In our tests, we saw that the user-space immediately calls ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun (previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In our case (imx-sdma.c), the terminate really happens asynchronously with a worker thread which is not awaited/synchronized by the ioctl(SNDRV_PCM_IOCTL_PREPARE) call. Since the syscall immediately enters an atomic context (snd_pcm_stream_lock_irq()), we are not able to flush the work of the termination worker from within the DMA context. This leads to an unterminated DMA getting re-initialized and then terminated. On the i.MX6 platform the problem is (if I got it correctly) that the sdma_channel_terminate_work() called after the -EPIPE gets scheduled away (for the 1-2ms sleep [1]). During that time the userspace already sends in the ioctl(SNDRV_PCM_IOCTL_PREPARE) and ioctl(SNDRV_PCM_IOCTL_READI_FRAMES). As none of them are anyhow synchronized to the terminate_worker the vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are executed during the wait_for_avail() [4] of the ioctl(SNDRV_PCM_IOCTL_READI_FRAMES). To make sure we identified the problem correctly we've tested to add a "dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This fixed the race condition in all our tests. (Before we were able to reproduce it in 100% of the test runs). Based on our understanding, there are two different points to ensure the termination: Either ensure that the termination is finished within the previous SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or finishing it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioclts) before entering the atomic context (from the PCM context). We initially thought about implementing the first approach, basically splitting up the dma_device terminate_all operation into a sync (busy-wait) and a async one. This would align the operations with the DMAengine interface and would enable a sync termination variant from atomic contexts. However, we saw that the dma_free_attrs() function has a WARN_ON on irqs disabled, which would be the case for the sync variant. Side note: We found this issue on the current v5.4.y LTS branch, but it also affects v5.8.y. Any feedback or pointers how we may fix the problem are warmly welcome! If anything is unclear please just ask :-) regards; Richard Leitner Benjamin Bara [1]https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1066 [2]https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1071 [3]https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1072 [4]https://elixir.bootlin.com/linux/v5.8/source/sound/core/pcm_lib.c#L1825 [5]https://elixir.bootlin.com/linux/v5.8/source/sound/core/pcm_native.c#L3226