From: David Herrmann
Date: Wed, 14 Jun 2017 09:45:27 +0200
Subject: Re: [PATCH v2] HID: Replace semaphore driver_lock with mutex
To: Arnd Bergmann
Cc: Binoy Jayan, Benjamin Tissoires, "open list:HID CORE LAYER",
    Linux Kernel Mailing List, Rajendra, Mark Brown, Jiri Kosina,
    David Herrmann, Andrew de los Reyes
References: <1497345926-3262-1-git-send-email-binoy.jayan@linaro.org>
    <20170613095618.GB29589@mail.corp.redhat.com>

Hey

On Wed, Jun 14, 2017 at 9:20 AM, Arnd Bergmann wrote:
> On Wed, Jun 14, 2017 at 7:22 AM, Binoy Jayan wrote:
>> Hi,
>>
>> On 14 June 2017 at 01:55, Arnd Bergmann wrote:
>>
>>>> The mutex code clearly states mutex_trylock() must not be used in
>>>> interrupt context (see kernel/locking/mutex.c), hence we used a
>>>> semaphore here. Unless the mutex code is changed to allow this, we
>>>> cannot switch away from semaphores.
>>>
>>> Right, that makes a lot of sense. I don't think changing the mutex
>>> code is an option here, but I wonder if we can replace the semaphore
>>> with something simpler anyway.
>>>
>>> From what I can tell, it currently does two things:
>>>
>>> 1. it acts as a simple flag to prevent hid_input_report from dereferencing
>>> the hid->driver pointer during initialization and exit. I think this could
>>> be done equally well using a simple atomic set_bit()/test_bit() or similar.
>>>
>>> 2. it prevents the hid->driver pointer from becoming invalid while an
>>> asynchronous hid_input_report() is in progress. This actually seems to
>>> be a reference counting problem rather than a locking problem.
>>> I don't immediately see how to better address it, or how exactly this
>>> could go wrong in practice, but I would naively expect that either
>>> hdev->driver->remove() needs to wait for the last user of hdev->driver
>>> to complete, or we need kref_get/kref_put in hid_input_report()
>>> to trigger the actual release function.

The HID design is explained in detail in
./Documentation/hid/hid-transport.txt, in case you want some background
information.

The problem here is that the low-level transport driver has a lifetime
that is independent of the hid device-driver. So the transport driver
needs to be able to tell the HID layer about coming/going devices, as
well as incoming traffic. At the same time, the HID layer can bind
upper-layer hid device drivers *anytime* (since it is exposed via the
driver core interfaces in /sys to bind drivers).

The locking architecture is very similar to 's_active' on super-blocks,
or 'active' on kernfs-nodes. However, the big difference here is that
drivers can be rebound. Hence, we're not limited to just one driver
lifetime, which is why we went with a semaphore instead.

Also note that hid_input_report() might be called from interrupt
context, hence it can never call into kref_put() or similar (unless we
want to guarantee that unbinding can run in interrupt context).

If we really want to get rid of the semaphore, a spinlock might do fine
as well. Then again, some hid device drivers might expect their
transport driver to *not* run in irq context, and thus break under a
spinlock. So if you want to fix this, we need to audit the hid device
drivers.

David
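
For readers following the thread, here is a minimal userspace sketch of the
pattern discussed above: the input path only *tries* the semaphore and bails
out if it is held, while the bind/unbind path waits for it. This is not the
kernel code. It uses POSIX semaphores in place of the kernel's semaphore API,
and every toy_* name (toy_device, toy_driver, toy_input_report,
toy_rebind_begin, toy_rebind_end, toy_doubler) is hypothetical and made up
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for an upper-layer device driver. */
struct toy_driver {
    int (*raw_event)(int value);
};

/* Hypothetical stand-in for a HID device. */
struct toy_device {
    sem_t driver_lock;               /* binary semaphore: 1 = free, 0 = held */
    const struct toy_driver *driver; /* NULL while no driver is bound */
};

static void toy_device_init(struct toy_device *dev)
{
    int err = sem_init(&dev->driver_lock, 0, 1);
    assert(err == 0);
    (void)err;
    dev->driver = NULL;
}

/* Input path: models a caller that must not sleep, so it only tries the lock. */
static int toy_input_report(struct toy_device *dev, int value)
{
    int ret;

    if (sem_trywait(&dev->driver_lock) != 0)
        return -EBUSY;               /* lock is held, drop this report */

    ret = dev->driver ? dev->driver->raw_event(value) : -ENODEV;
    sem_post(&dev->driver_lock);
    return ret;
}

/* Bind/unbind path: allowed to sleep, so it waits for the lock. */
static void toy_rebind_begin(struct toy_device *dev)
{
    while (sem_wait(&dev->driver_lock) == -1 && errno == EINTR)
        ;                            /* retry if interrupted by a signal */
}

/* Swap the driver pointer (NULL unbinds) and release the lock. */
static void toy_rebind_end(struct toy_device *dev, const struct toy_driver *drv)
{
    dev->driver = drv;
    sem_post(&dev->driver_lock);
}

/* Example driver for the sketch: doubles the reported value. */
static int toy_double_event(int value) { return 2 * value; }
static const struct toy_driver toy_doubler = { .raw_event = toy_double_event };
```

In this model a freshly initialized device answers toy_input_report() with
-ENODEV, a report arriving between toy_rebind_begin() and toy_rebind_end()
gets -EBUSY, and after binding toy_doubler a report of 5 returns 10. Because
the driver can be bound and unbound repeatedly, the same lock is reused
across driver lifetimes.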