Date: Tue, 18 Aug 2020 15:50:45 +0200
From: Greg KH
To: Hugh Dickins
Cc: Linus Torvalds, Oleg Nesterov, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
Message-ID: <20200818135045.GA495837@kroah.com>
References: <20200724152424.GC17209@redhat.com>
 <20200725101445.GB3870@redhat.com>
 <20200727193512.GA236164@kroah.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 05, 2020 at 10:46:12PM -0700, Hugh Dickins wrote:
> On Mon, 27 Jul 2020, Greg KH wrote:
> >
> > Linus just pointed me at this thread.
> >
> > If you could run:
> > echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> > and run the same workload to see if anything shows up in the log when
> > xhci crashes, that would be great.
>
> Thanks, I tried that, and indeed it did have a story to tell:
>
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> a very large number of lines like the above, then
> Cancel URB 00000000d81602f7, dev 4, ep 0x0, starting at offset 0xfffd42c0
> // Ding dong!
> ep 0x81 - asked for 16 bytes, 10 bytes untransferred
> Stopped on No-op or Link TRB for slot 1 ep 0
> xhci_drop_endpoint called for udev 000000005bc07fa6
> drop ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x0
> add ep 0x81, slot id 1, new drop flags = 0x8, new add flags = 0x8
> xhci_check_bandwidth called for udev 000000005bc07fa6
> // Ding dong!
> Successful Endpoint Configure command
> Cancel URB 000000006b77d490, dev 4, ep 0x81, starting at offset 0x0
> // Ding dong!
> Stopped on No-op or Link TRB for slot 1 ep 2
> Removing canceled TD starting at 0x0 (dma).
> list_del corruption: prev(ffff8fdb4de7a130)->next should be ffff8fdb41697f88,
> but is 6b6b6b6b6b6b6b6b; next(ffff8fdb4de7a130)->prev is 6b6b6b6b6b6b6b6b.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:53!
> RIP: 0010:__list_del_entry_valid+0x8e/0xb0
> Call Trace:
>
> handle_cmd_completion+0x7d4/0x14f0 [xhci_hcd]
> xhci_irq+0x242/0x1ea0 [xhci_hcd]
> xhci_msi_irq+0x11/0x20 [xhci_hcd]
> __handle_irq_event_percpu+0x48/0x2c0
> handle_irq_event_percpu+0x32/0x80
> handle_irq_event+0x4a/0x80
> handle_edge_irq+0xd8/0x1b0
> handle_irq+0x2b/0x50
> do_IRQ+0xb6/0x1c0
> common_interrupt+0x90/0x90
>
>
> Info provided for your interest, not expecting any response.
> The list_del info in there is non-standard, from a patch of mine:
> I find hashed addresses in debug output less than helpful.

Thanks for this, that is really odd.

>
> >
> > Although if you are using an "older version" of the driver, there's not
> > much I can suggest except update to a newer one :)
>
> Yes, I was reluctant to post any info, since really the ball is at our
> end of the court, not yours. I did have a go at bringing in the latest
> xhci driver instead, but quickly saw that was not a sensible task for
> me. And I did scan the git log of xhci changes (especially xhci-ring.c
> changes): thought I saw a likely relevant and easily applied fix commit,
> but in fact it made no difference here.
>
> I suspect it's in part a hardware problem, but driver not recovering
> correctly. I've replaced the machine (but also noticed that the same
> crash has occasionally been seen on other machines). I'm sure it has
> no relevance to this unlock_page() thread, though it's quite possible
> that it's triggered under stress, and Linus's changes allowed greater
> stress.

I will be willing to blame hardware problems for this as well, but will
save this report in case something else shows up in the future, thanks!

greg k-h