From: Andrew Worsley
Date: Tue, 18 Dec 2018 11:34:46 +1100
Subject: Re: [PATCH] Prevent race condition between USB authorisation and USB discovery events
To: Alan Stern
Cc: Greg Kroah-Hartman, Mathias Nyman, Nicolas Boichat, Jon Flatley, Kai-Heng Feng, Bin Liu, Benson Leung, "open list:USB SUBSYSTEM", open list
References: <20181217061036.24143-1-amworsley@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 18 Dec 2018 at 03:21, Alan Stern wrote:
>
> On Mon, 17 Dec 2018, Andrew Worsley wrote:
>
> > A sysfs driven USB authorisation change can trigger a usb_set_configuration
> > while a hub_event worker thread is running. This can result in a USB device
> > being disabled just after it was configured and bringing down all the
> > devices and impacting hardware and user processes that were established on
> > top of this these interfaces. In some cases the USB disable never completed
> > and the whole system hung.
>
> Can you be more specific about this?  Disabling a USB device shouldn't
> cause these kinds of problems, regardless of whether or not the device
> was just configured.

I can't say too much about the hardware - but there was an SPI bus and an
i2c bus built on top of USB devices on a bus.
It appears the SPI bus shutdown was the item that was prone to problems
when being torn down. I understand it would be better if the dependent
systems shut down cleanly, which they did about 75% of the time, but even
then it was a rather messy sequence. So the current system is far from
ideal.

The crux of the issue is that the usb_disable_device() call in
usb_set_configuration() (line 1874 of drivers/usb/core/message.c) is
applied to an already configured device. It triggers the teardown of all
the devices built on that USB device and is very disruptive.

> > At my work I had an occasional hang due to this race condition. Roughly 1
> > in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
> > This patch fixed the problem and I had no problems (spurious disables
> > or hangs) in 750+ boots.
>
> usb_authorize_device, usb_deauthorize_device, and hub_event all acquire
> the device mutex.  Why should adding another mutex make any difference?

I believe the new usb_authorize_mutex prevents the cascade of USB bus
scans, discoveries and probes from colliding with those caused by a change
in USB bus authorisation. The individual device / hub locks do not span the
whole system, only an individual hub or device, so any logic that gives an
orderly USB device scanning and probing is undermined. The idea of the
mutex is to keep the discovery work done when a device is added, removed or
powered up separate from the work caused by authorisation changes.

I don't pretend to understand all the logic and sequencing of the current
code, which works faultlessly when these threads are serialised by the
mutex (750+ times), both in boot-ups which build up the system and in
shutdowns which bring it down. Likewise, if I endlessly toggle the
authorisation off / on by itself, it is faultless over hundreds of
iterations.

> In fact there's an actual disadvantage: Making hub_event acquire a
> global mutex will prevent us from handling multiple hubs concurrently.
> Although we don't do this now, we might want to in the future.
>
> Alan Stern

This is true - I am not trying to serialise the hub_event() threads, only
to prevent authorisation changes from happening while hub_event threads are
running. So perhaps there would be a way of allowing multiple concurrent
hub_events but only one authorisation thread at a time, assuming multiple
concurrent hub_event()s are deemed OK. Something like a lock which allows
multiple readers but only a single exclusive writer?

Andrew