Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp2112716pxa; Mon, 17 Aug 2020 00:38:42 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwRSf4bb7N9aV5rC6/cvxgFay6RyO78l4WmluPWT6ot3cILA8VDW12ZgUFlBdQAg+pWKv3z X-Received: by 2002:a17:906:35ce:: with SMTP id p14mr14493738ejb.276.1597649922328; Mon, 17 Aug 2020 00:38:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1597649922; cv=none; d=google.com; s=arc-20160816; b=wssGPJtH2aBOMOY0vsasMxHE2aINGMttZ7QJI2pWj+b3GL+bHP6GYiALUUXMtBUyFM 4oAsBN2UwC/4w/qVOJzTLYQ5l8vtkAsqQUT5Z6I027AGBHpnkaxzt6U8c5NCksb4IbIL bXvSW2i0cFmUJDj4i6Lj5ym0ArgHq5XBgHMUaygrHW3TqiTVCI1FDeZMlG4bIG8cICwf 3mwqsOeegJ1Z82Qak0laSxbB4n3PVoAJ7snLr+IdvPnD3IWPj/P7Td8p2WWO5irrJ+gr WV8LRLeTVsjVacHcl0dLxjcGjohSdV48T5peU7jm012y3FXmHjCgkP9/sRlwOnUY37Xw +2QQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-language:accept-language:in-reply-to:references:message-id :date:thread-index:thread-topic:subject:cc:to:from:ironport-sdr :dkim-signature; bh=dfltYVbOwdZmovdwWcIT4gKlyuQ6yqtmhZVArzXWgEc=; b=XPEBKAnrIQaD55H+28hNMqeHp99pJtYRARjTGz81vpPjfVEFlS+e2ztkRJqZxauKpl BkZQZXr2lJrnYe81bpXamG78307MSdkHbTdM5ptrqKvCJIu2fS0w1JhfVbctpbYySPwg GiKIgawdAX3nUjiLeWHTT663ZWpvP83jQaeXsJyZbZVGiby0x0vH1mVhVgTgeG1M219g Fz0S3Ej8eJB7wsp5UOe7BWpH3bvGXbCEZTx2jN8JSVKQKKjvLGHXUjnH6ume4yEMW+Wk RStf39UiJhpSj9Z2BiemHBZydYGgFDQ0gyI1DXAPdf5JQfEO2NIhJYdpzGA8yyHOhY+9 DLTw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@skidata.com header.s=selector1 header.b=DzfAMctk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=skidata.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w13si10175318ede.238.2020.08.17.00.38.19; Mon, 17 Aug 2020 00:38:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@skidata.com header.s=selector1 header.b=DzfAMctk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=skidata.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726366AbgHQHfr (ORCPT + 99 others); Mon, 17 Aug 2020 03:35:47 -0400 Received: from mail1.skidata.com ([91.230.2.99]:52668 "EHLO mail1.skidata.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726171AbgHQHfr (ORCPT ); Mon, 17 Aug 2020 03:35:47 -0400 X-Greylist: delayed 428 seconds by postgrey-1.27 at vger.kernel.org; Mon, 17 Aug 2020 03:35:43 EDT DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=skidata.com; i=@skidata.com; q=dns/txt; s=selector1; t=1597649744; x=1629185744; h=from:to:cc:subject:date:message-id:references: in-reply-to:content-transfer-encoding:mime-version; bh=+RD2yN5N1q3un9IT1M50pdc25CeJUXzpLiptrNEMp4I=; b=DzfAMctkhSF9NTcFvvu0VFB8aR9syhGL+H0Hx2MLIzNAfLn4YIgPrcJo FUWQ8NyL3bfPsgi6Z4BqlwBq50rSNSowGNCIXmtD+QtCbJlrh5xDY8q9i f3VCTxb8JAzQ+D9xpJWG/4MYFxA7jzaSMh8u8SUwBrlPh1uhKpOysVTfU hX5/myX6Tagvxl/ldf+yF++kCPlzsHhlC2AYmxOVJ5zID+MNFo3WXt+5g ptwal9D4byfDpWzstp9E7RnKJm+JzXlxVdW9DPLPXIg6gt5jlQrni/9F6 0yoFmA9QNsPD3tXeES+/4cnfO36EyFsQy/BYsdVFJQOCpi2C8BJW5Nvbb g==; IronPort-SDR: jiDMR+V544j3gOce62tFfWKGXrrLqEe/pdjucy+gw24PjZnbDl329ICrdflz8SEFksEsf2k0jV 54mCnrEMkI6c/MniViVITnAcoy8eKx0GutP6stpU9hnISpJWeJftEEWEXrvQXzZYTtGuqYkRas Fmah0qd/0n5Y2jgBxCKMGHuyBpjyuZmFt51gQ/cJ8OiiBPV6+KZ+nXMjJZdUj8jTvq/siUl/Yz MyGnkKLE6fCmbZ/eASzZD0PGpr6AyxoACu573PAA0QSL/YljSVQR4qC4u13NB1Y2l0K37WHGff NIU= X-IronPort-AV: E=Sophos;i="5.76,322,1592863200"; d="scan'208";a="26167241" From: Benjamin Bara - SKIDATA To: Robin Gong CC: "timur@kernel.org" , "nicoleotsuka@gmail.com" , "vkoul@kernel.org" , "dan.j.williams@intel.com" , dl-linux-imx , "shawnguo@kernel.org" , "kernel@pengutronix.de" , "dmaengine@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "linux-kernel@vger.kernel.org" , "alsa-devel@alsa-project.org" , Richard Leitner - SKIDATA Subject: RE: pcm|dmaengine|imx-sdma race condition on i.MX6 Thread-Topic: pcm|dmaengine|imx-sdma race condition on i.MX6 Thread-Index: AQHWcWQZKYOChL0mPkuCFeZyDJy6mKk3KiiAgABS1/A= Date: Mon, 17 Aug 2020 07:28:31 +0000 Message-ID: <61498763c60e488a825e8dd270732b62@skidata.com> References: <20200813112258.GA327172@pcleri> In-Reply-To: Accept-Language: en-US, de-AT Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [192.168.111.252] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We think this is not an i.MX6-specific problem, but a problem of the DMAeng= ine usage from the PCM. In case of a XRUN, the DMA channel is never closed but first a SNDRV_PCM_TR= IGGER_STOP next a SNDRV_PCM_TRIGGER_START is triggered. The SNDRV_PCM_TRIGGER_STOP simply executes a dmaengine_terminate_async() [1= ] but does not await the termination by calling dmaengine_synchronize(), which is required as stated by the docu [2]. Anyways, we are not able to fix it in the pcm_dmaengine layer either at the= end of SNDRV_PCM_TRIGGER_STOP (called from the DMA on complete interrupt handler) or at the beginning of SNDRV_PCM_TRIGGER_START (called from a PCM ioctl), since the dmaengine_synchronize() requires a non-atomic context. Based on my understanding, most of the DMA implementations don't even imple= ment device_synchronize and if they do, it might not really be necessary since the terminate_all op= eration is synchron. With the i.MX6, it looks a bit different: Since [4], the terminate_all operation really schedules a worker which wait= s the required ~1ms and then does the context freeing. Now, the ioctl(SNDRV_PCM_IOCTL_PREPARE) and the following ioctl(SNDRV_PCM_I= OCTL_READI_FRAMES), which are called from US to handle/recover from a XRUN, are in a race with = the terminate_worker. If the terminate_worker finishes earlier, everything is fine. Otherwise, the sdma_prep_dma_cyclic() is called, sets up everything and as soon as it is scheduled out to wait for data, the terminate_worker is sc= heduled and kills it. In this case, we wait in [5] until the timeout is reached and return with -= EIO. Based on our understanding, there exist two different fixing approaches: We thought that the pcm_dmaengine should handle this by either synchronizin= g the DMA on a trigger or terminating it synchronously. However, as we are in an atomic context, we either have to give up the atom= ic context of the PCM to finish the termination or we have to design a synchronous terminate vari= ant which is callable from an atomic context. For the first option, which is potentially more performant, we have to leav= e the atomic PCM context and we are not sure if we are allowed to. For the second option, we would have to divide the dma_device terminate_all= into an atomic sync and an async one, which would align with the dmaengine API, giving it the optio= n to ensure termination in an atomic context. Based on my understanding, most of them are synchronous anyways, for the cu= rrently async ones we=20 would have to implement busy waits. However, with this approach, we reach the WARN_ON [6] inside of an atomic c= ontext, indicating we might not do the right thing. Ad Failure Log (at bottom): I haven't added the ioctl syscalls, but this is basically the output with a= dditional prints to be able to follow the code execution path. A XRUN (buffer size is 480 but 960 available) leads to a SNDRV_PCM_TRIGGER_= STOP. This leads to terminate_async, starting the terminate_worker. Next, the XRUN recovery triggers SNDRV_PCM_TRIGGER_START, calling sdma_prep= _dma_cyclic and then waits for the DMA in wait_for_avail(). Next we see the two freeings, first the old, then the newly added one;=20 so the terminate_worker is back at work. Now the DMA is terminated, while we are still waiting on data from it. What do you think about it? Is any of the provided solutions practicable? If you need further information or additional logging, feel free to ask. Best regards Benjamin [1] https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_dmaengine= .c#L209 [2] https://www.kernel.org/doc/html/latest/driver-api/dmaengine/client.html= #further-apis [3] https://elixir.bootlin.com/linux/latest/source/sound/core/pcm_dmaengine= .c#L189 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm= it/?id=3Db8603d2a5795c42f78998e70dc792336e0dc20c9 [5] https://elixir.bootlin.com/linux/v5.8/source/sound/core/pcm_lib.c#L1875 [6] https://elixir.bootlin.com/linux/latest/source/kernel/dma/mapping.c#L30= 6 *Failure Log from latest 5.4 LTS kernel:* [ 535.201598] imx-sgtl5000 sound: snd_pcm_period_elapsed() [ 535.201610] imx-sgtl5000 sound: snd_pcm_period_elapsed: calls snd_pcm_up= date_hw_ptr0() [ 535.201626] imx-sdma 20ec000.sdma: sdma_tx_status channel: 2 [ 535.201640] snd_pcm_capture_avail: hw_ptr: 960, appl_ptr: 0, avail: 960,= boundary: 2013265920 [ 535.201655] imx-sgtl5000 sound: snd_dmaengine_pcm_trigger command: 0 [ 535.201664] imx-sdma 20ec000.sdma: sdma_disable_channel_async channel: 2 [ 535.201672] imx-sdma 20ec000.sdma: sdma_disable_channel channel: 2 [ 535.201752] imx-sgtl5000 sound: wait_for_avail: tout=3D999, state=3D4 [ 535.201760] imx-sdma 20ec000.sdma: sdma_channel_terminate_work channel: = 2 [ 535.201877] imx-sgtl5000 sound: snd_pcm_do_reset: ioctl SNDRV_PCM_IOCTL1= _RESET [ 535.201888] imx-sgtl5000 sound: snd_pcm_lib_ioctl_reset: calls snd_pcm_u= pdate_hw_ptr() [ 535.201912] imx-sgtl5000 sound: snd_dmaengine_pcm_trigger command: 1 [ 535.201922] imx-sdma 20ec000.sdma: sdma_prep_dma_cyclic channel: 2 [ 535.201931] imx-sdma 20ec000.sdma: sdma_config_write channel: 1 [ 535.201939] imx-sdma 20ec000.sdma: sdma_config_channel channel: 2 [ 535.201948] imx-sdma 20ec000.sdma: sdma_disable_channel channel: 2 [ 535.201959] imx-sdma 20ec000.sdma: sdma_load_context channel: 2 [ 535.201967] imx-sdma 20ec000.sdma: sdma_transfer_init channel: 2 [ 535.201983] imx-sdma 20ec000.sdma: sdma_load_context channel: 2 [ 535.201995] imx-sdma 20ec000.sdma: entry 0: count: 256 dma: 0x4a300000 = intr [ 535.202005] imx-sdma 20ec000.sdma: entry 1: count: 256 dma: 0x4a300100 = intr [ 535.202014] imx-sdma 20ec000.sdma: entry 2: count: 256 dma: 0x4a300200 = intr [ 535.202023] imx-sdma 20ec000.sdma: entry 3: count: 256 dma: 0x4a300300 = intr [ 535.202033] imx-sdma 20ec000.sdma: entry 4: count: 256 dma: 0x4a300400 = intr [ 535.202042] imx-sdma 20ec000.sdma: entry 5: count: 256 dma: 0x4a300500 = intr [ 535.202050] imx-sdma 20ec000.sdma: entry 6: count: 256 dma: 0x4a300600 = intr [ 535.202059] imx-sdma 20ec000.sdma: entry 7: count: 256 dma: 0x4a300700 = intr [ 535.202067] imx-sdma 20ec000.sdma: entry 8: count: 256 dma: 0x4a300800 = intr [ 535.202077] imx-sdma 20ec000.sdma: entry 9: count: 256 dma: 0x4a300900 = intr [ 535.202086] imx-sdma 20ec000.sdma: entry 10: count: 256 dma: 0x4a300a00 = intr [ 535.202094] imx-sdma 20ec000.sdma: entry 11: count: 256 dma: 0x4a300b00 = intr [ 535.202103] imx-sdma 20ec000.sdma: entry 12: count: 256 dma: 0x4a300c00 = intr [ 535.202111] imx-sdma 20ec000.sdma: entry 13: count: 256 dma: 0x4a300d00 = intr [ 535.202120] imx-sdma 20ec000.sdma: entry 14: count: 256 dma: 0x4a300e00 = wrap intr [ 535.202135] imx-sdma 20ec000.sdma: vchan 8aa58994: txd 0a262722[8]: subm= itted [ 535.202145] imx-sdma 20ec000.sdma: sdma_issue_pending channel: 2 [ 535.202181] snd_pcm_capture_avail: hw_ptr: 0, appl_ptr: 0, avail: 0, bou= ndary: 2013265920 [ 535.202192] snd_pcm_capture_avail: hw_ptr: 0, appl_ptr: 0, avail: 0, bou= ndary: 2013265920 [ 535.202202] imx-sgtl5000 sound: wait_for_avail: avail=3D0, state=3D3, tw= ake=3D64 [ 535.203182] imx-sdma 20ec000.sdma: txd 19499aa8: freeing [ 535.203207] imx-sdma 20ec000.sdma: txd 0a262722: freeing [ 545.766059] imx-sgtl5000 sound: wait_for_avail: tout=3D0, state=3D3 [ 545.766075] imx-sgtl5000 sound: capture write error (DMA or IRQ trouble?= )