Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964998AbcLTTOU (ORCPT ); Tue, 20 Dec 2016 14:14:20 -0500
Received: from mail-yb0-f181.google.com ([209.85.213.181]:33143 "EHLO
	mail-yb0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S938709AbcLTTNy (ORCPT ); Tue, 20 Dec 2016 14:13:54 -0500
Date: Tue, 20 Dec 2016 14:13:52 -0500
From: Justin Bronder
To: Mark Greer
Cc: Geoff Lansberry, linux-wireless@vger.kernel.org,
	lauro.venancio@openbossa.org, aloisio.almeida@openbossa.org,
	sameo@linux.intel.com, robh+dt@kernel.org, mark.rutland@arm.com,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, Jaret Cantu
Subject: Re: nfc: trf7970a: Prevent repeated polling from crashing the kernel
Message-ID: <20161220191352.GB23496@lasswell.members.linode.com>
References: <1482250592-4268-1-git-send-email-glansberry@gmail.com>
	<1482250592-4268-3-git-send-email-glansberry@gmail.com>
	<20161220185905.GA5867@animalcreek.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161220185905.GA5867@animalcreek.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1979
Lines: 54

On 20/12/16 11:59 -0700, Mark Greer wrote:
> On Tue, Dec 20, 2016 at 11:16:32AM -0500, Geoff Lansberry wrote:
> > From: Jaret Cantu
> >
> > Repeated polling attempts cause a NULL dereference error to occur.
> > This is because the state of the trf7970a is currently reading but
> > another request has been made to send a command before it has finished.
>
> How is this happening?  Was trf7970a_abort_cmd() called and it didn't
> work right?  Was it not called at all and there is a bug in the digital
> layer?  More details please.
>
> > The solution is to properly kill the waiting reading (workqueue)
> > before failing on the send.
>
> If the bug is in the calling code, then that is what should get fixed.
> This seems to be a hack to work-around a digital layer bug.

One of our uses of NFC is to begin polling to read a tag and then stop
polling (in order to save power) until we know via user interaction that
we need to poll again.  This is typically many minutes later, so the power
saving is pretty significant.  However, it's possible that a user will
remove the tag before reading has completed.  We also detect this case and
stop polling.  I can go into more detail if necessary, but that is what
exposed the panic.

You can reproduce this using neard and Python; in our testing it was very
likely to occur within 10-100 iterations of the following:

#!/usr/bin/python
import time
import dbus

bus = dbus.SystemBus()
nfc0 = bus.get_object('org.neard', '/org/neard/nfc0')
props = dbus.Interface(nfc0, 'org.freedesktop.DBus.Properties')

# Power on the adapter; ignore the D-Bus error raised if it is already on.
try:
    props.Set('org.neard.Adapter', 'Powered', dbus.Boolean(1))
except dbus.DBusException:
    pass

adapter = dbus.Interface(nfc0, 'org.neard.Adapter')

# Start and stop polling in quick succession; print the iteration count so
# the last value shown indicates roughly when the panic occurred.
for i in range(1000):
    adapter.StartPollLoop('Initiator')
    time.sleep(0.1)
    adapter.StopPollLoop()
    print(i)

I believe the last time we tested this was around the 4.1 release.

-- 
Justin Bronder