From: Mike Snitzer Subject: Re: [PATCH 3.14 73/92] dm crypt: fix deadlock when async crypto algorithm returns -EBUSY Date: Mon, 4 May 2015 23:22:13 -0400 Message-ID: <20150505032213.GB4954@redhat.com> References: <20150502190109.683061482@linuxfoundation.org> <20150502190117.069180534@linuxfoundation.org> <20150504213254.GA27445@debian> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Greg Kroah-Hartman , Ben Collins , linux-kernel@vger.kernel.org, stable@vger.kernel.org, dm-devel@redhat.com, linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au To: Rabin Vincent Return-path: Content-Disposition: inline In-Reply-To: <20150504213254.GA27445@debian> Sender: stable-owner@vger.kernel.org List-Id: linux-crypto.vger.kernel.org On Mon, May 04 2015 at 5:32pm -0400, Rabin Vincent wrote: > On Sat, May 02, 2015 at 09:03:28PM +0200, Greg Kroah-Hartman wrote: > > 3.14-stable review patch. If anyone has any objections, please let me know. > > > > ------------------ > > > > From: Ben Collins > > > > commit 0618764cb25f6fa9fb31152995de42a8a0496475 upstream. > > The commit in question was recently merged to mainline and is now being > sent to various stable kernels. But I think the patch is wrong and the > real problem lies in the Freescale CAAM driver which is described as > having being tested with. > > commit 0618764cb25f6fa9fb31152995de42a8a0496475 > Author: Ben Collins > Date: Fri Apr 3 16:09:46 2015 +0000 > > dm crypt: fix deadlock when async crypto algorithm returns -EBUSY > > I suspect this doesn't show up for most anyone because software > algorithms typically don't have a sense of being too busy. However, > when working with the Freescale CAAM driver it will return -EBUSY on > occasion under heavy -- which resulted in dm-crypt deadlock. > > After checking the logic in some other drivers, the scheme for > crypt_convert() and it's callback, kcryptd_async_done(), were not > correctly laid out to properly handle -EBUSY or -EINPROGRESS. > > Fix this by using the completion for both -EBUSY and -EINPROGRESS. Now > crypt_convert()'s use of completion is comparable to > af_alg_wait_for_completion(). Similarly, kcryptd_async_done() follows > the pattern used in af_alg_complete(). > > Before this fix dm-crypt would lockup within 1-2 minutes running with > the CAAM driver. Fix was regression tested against software algorithms > on PPC32 and x86_64, and things seem perfectly happy there as well. > > Signed-off-by: Ben Collins > Signed-off-by: Mike Snitzer > Cc: stable@vger.kernel.org > > dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto > driver should internally backlog requests which arrive when the queue is > full and process them later. Until the crypto hw's queue becomes full, > the driver returns -EINPROGRESS. When the crypto hw's queue if full, > the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is > expected to backlog the request and process it when the hardware has > queue space. At the point when the driver takes the request from the > backlog and starts processing it, it calls the completion function with > a status of -EINPROGRESS. The completion function is called (for a > second time, in the case of backlogged requests) with a status/err of 0 > when a request is done. > > Crypto drivers for hardware without hardware queueing use the helpers, > crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request() > and crypto_get_backlog() helpers to implement this behaviour correctly, > while others implement this behaviour without these helpers (ccp, for > example). > > dm-crypt (before this patch) uses this API correctly. It queues up as > many requests as the hw queues will allow (i.e. as long as it gets back > -EINPROGRESS from the request function). Then, when it sees at least > one backlogged request (gets -EBUSY), it waits till that backlogged > request is handled (completion gets called with -EINPROGRESS), and then > continues. The references to af_alg_wait_for_completion() and > af_alg_complete() in the commit message are irrelevant because those > functions only handle one request at a time, unlink dm-crypt. OK, I clearly missed the subtleties of the API. dm-crypt.c definitely needs more documentation related to CRYPTO_TFM_REQ_MAY_BACKLOG and the implications it has on completions, etc. Any chance you'd be willing to provide in-code comments in the appropriate places in dm-crypt.c (after having reverted this patch in question)? I'd be happy to take the combination of the revert and documentation in a single patch and get it to Linus for 4.0-rc3. > The problem is that the Freescale CAAM driver, which this problem is > described as being seen on, fails to implement the backlogging behaviour > correctly. In cam_jr_enqueue(), if the hardware queue is full, it > simply returns -EBUSY without backlogging the request. What the > observed deadlock was is not described in the commit message but it is > obviously the wait_for_completion() in crypto_convert() where dm-crypto > would wait for the completion being called with -EINPROGRESS in the case > of backlogged requests. This completion will never be completed due to > the bug in the CAAM driver. > > What this patch does is that it makes dm-crypt wait for every request, > even when the driver/hardware queues are not full, which means that > dm-crypt will never see -EBUSY. This means that this patch will cause a > performance regression on all crypto drivers which implement the API > correctly. > > So, I think this patch should be reverted in mainline and the stable > kernels where it has been merged, and correct backlog handling should be > implemented in the CAAM driver instead. Hopefully it hasn't been merged, Greg? If it has been merged to stable@ then we'll have to split the revert patch out from any documentation improvements.