Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp1027115yba; Fri, 26 Apr 2019 12:47:11 -0700 (PDT) X-Google-Smtp-Source: APXvYqyZ6K9Hmyw/JkIOIwR/BVlR4uH+MMJohfvclMdxmrQWQorqx74WQPcPGR1zgcOqmQF7sBY7 X-Received: by 2002:a63:cc47:: with SMTP id q7mr19494000pgi.171.1556308031815; Fri, 26 Apr 2019 12:47:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1556308031; cv=none; d=google.com; s=arc-20160816; b=eJ6nUsgOchtXAdf+pVm74eDHXmtOso+HVWKiYagOBb/L5TuSVLnWt8UnsxfrIEgddF RgUgk7yL773oJoC8EHTPyO3bncPT63qS4XbYJzyk/XOlr9Rs9a75reChGzyCV7gPdSMv kg3ulKW+0xWqUXPVmx5Rilv28be7/yRZxy7c0a0jskoD/nRtnvwx0FbgdEbNDBapHre8 XDy8WWCADRt/b3g9b6+M1qBXp4Us4BJ3DRHMOykOfV15fsg0d9xccWvMgIdX/5+MtKpx +VNoNP1r4v6AB8k7KdKyNVFQuKF8BTS5yLgLyhfBmmNuf7YwDF8x6+rzrKjj9SLj0+EG mgnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:from:subject; bh=UCSb0ymVPA+H4aJkPxvW+oCUlWp8t6MxeYjR02NmstI=; b=IHQc1FFphIepdgao+6ra4Rxltn0YjCoIcFxjR+JgY3lBdfNAExPIZbykK9uvmY3Ocf KnBUnGRpDDHgN0jilGxAsOEAGK6ervYYAvlZSvnejFzS1HcfNTNuwiirQF+2lgvPeAH0 oX5MXQTb0Q6F56IOH7Lp8RjzBJknpkfIguvc5sKkELie5Ll/+b/+I0VppsSJf01l9QCF 53jCpYD2NLv/KV25Z9n/sS5DRqqPn5sPdITqCu4G9bH4f+r6U6s8Id+kzXfk/3UDDFYz gcUi0z1Ho9qW2+LpZkf+MdNGx0Ighd8aiVdzXGv0wRcnLBup85vFxXKtoW/Z+Ei81Yh/ DEtw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g70si12553414pgc.87.2019.04.26.12.46.54; Fri, 26 Apr 2019 12:47:11 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726584AbfDZTp6 (ORCPT + 99 others); Fri, 26 Apr 2019 15:45:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58694 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726310AbfDZTp6 (ORCPT ); Fri, 26 Apr 2019 15:45:58 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id F1D80772CC; Fri, 26 Apr 2019 19:45:56 +0000 (UTC) Received: from prarit.bos.redhat.com (prarit-guest.khw1.lab.eng.bos.redhat.com [10.16.200.63]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4A0AF60FD9; Fri, 26 Apr 2019 19:45:56 +0000 (UTC) Subject: Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading" From: Prarit Bhargava To: Jessica Yu , Heiko Carstens Cc: linux-next@vger.kernel.org, linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org, Cathy Avery References: <20190426130736.GB8646@osiris> <20190426150741.GD8646@osiris> <20190426160956.GA3827@linux-8ccs> <2e047a7e-bf08-be8c-bdd0-429464fa133d@redhat.com> Message-ID: <52c293e9-ddfa-426a-a8f1-2106e250e78d@redhat.com> Date: Fri, 26 Apr 2019 15:45:55 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: <2e047a7e-bf08-be8c-bdd0-429464fa133d@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Fri, 26 Apr 2019 19:45:57 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/26/19 2:10 PM, Prarit Bhargava wrote: > > > On 4/26/19 12:09 PM, Jessica Yu wrote: >> +++ Heiko Carstens [26/04/19 17:07 +0200]: >>> On Fri, Apr 26, 2019 at 09:22:34AM -0400, Prarit Bhargava wrote: >>>> On 4/26/19 9:07 AM, Heiko Carstens wrote: >>>>> Hello Prarit, >>>>> >>>>> it looks like your commit f9a75c1d717f ("modules: Only return -EEXIST >>>>> for modules that have finished loading") _sometimes_ causes hangs on >>>>> s390. This is unfortunately not 100% reproducible, however the >>>>> mentioned commit seems to be the only relevant one in modules.c. >>>>> >>>>> What I see is a hanging system with messages like this on the console: >>>>> >>>>> [ 65.876040] rcu: INFO: rcu_sched self-detected stall on CPU >>>>> [ 65.876049] rcu: 7-....: (5999 ticks this GP) >>>> idle=eae/1/0x4000000000000002 softirq=1181/1181 fqs=2729 >>>>> [ 65.876078] (t=6000 jiffies g=-471 q=17196) >>>>> [ 65.876084] Task dump for CPU 7: >>>>> [ 65.876088] systemd-udevd R running task 0 731 721 >>>> 0x06000004 >>>>> [ 65.876097] Call Trace: >>>>> [ 65.876113] ([<0000000000abb264>] __schedule+0x2e4/0x6e0) >>>>> [ 65.876122] [<00000000001ee486>] finished_loading+0x4e/0xb0 >>>>> [ 65.876128] [<00000000001f1ed6>] load_module+0xcce/0x27a0 >>>>> [ 65.876134] [<00000000001f3af0>] __s390x_sys_init_module+0x148/0x178 >>>>> [ 65.876142] [<0000000000ac0766>] system_call+0x2aa/0x2c8 >>>>> I did not look any further into the dump, however since the commit >>>>> touches exactly the code path which seems to be looping... ;) >>>>> >>>> >>>> Ouch :( I wonder if I exposed a further race or another bug. Heiko, can you >>>> determine which module is stuck? Warning: I have not compiled this code. >>> >>> Here we go: >>> >>> [ 11.716866] PRARIT: waiting for module s390_trng to load. >>> [ 11.716867] PRARIT: waiting for module s390_trng to load. >>> [ 11.716868] PRARIT: waiting for module s390_trng to load. >>> [ 11.716870] PRARIT: waiting for module s390_trng to load. >>> [ 11.716871] PRARIT: waiting for module s390_trng to load. >>> [ 11.716872] PRARIT: waiting for module s390_trng to load. >>> [ 11.716874] PRARIT: waiting for module s390_trng to load. >>> [ 11.716875] PRARIT: waiting for module s390_trng to load. >>> [ 11.716876] PRARIT: waiting for module s390_trng to load. >>> [ 16.726850] add_unformed_module: 31403529 callbacks suppressed >>> [ 16.726853] PRARIT: waiting for module s390_trng to load. >>> [ 16.726862] PRARIT: waiting for module s390_trng to load. >>> [ 16.726865] PRARIT: waiting for module s390_trng to load. >>> [ 16.726867] PRARIT: waiting for module s390_trng to load. >>> [ 16.726869] PRARIT: waiting for module s390_trng to load. >>> >>> If I'm not mistaken then there was _no_ corresponding message on the >>> console stating that the module already exists. > > Heiko, > > Where is the s390_trng module? I can't seem to find it in the tree. > > [02:06 PM root@ibm-z-108 linux-next]# find ./ -name *s390_trng* > [02:06 PM root@ibm-z-108 linux-next]# git grep s390_trng > [02:07 PM root@ibm-z-108 linux-next]# > Never mind, I found it: s390-trng.c P. > P. > >> >> Hm, my current theory is that we have a module whose exit() function >> is taking a while to run to completion. While it is doing so, the >> module's state is already set to MODULE_STATE_GOING. >> >> With Prarit's patch, since this module is probably still in GOING, >> add_unformed_module() will wait until the module is finally gone. If >> this takes too long, we will keep trying to add ourselves to the >> module list and hence stay in the loop in add_unformed_module(). >> According to Documentation/RCU/stallwarn.txt, this looping in the >> kernel may trigger an rcu stall warning (see bullet point stating "a >> CPU looping anywhere in the kernel without invoking schedule()". >> >> Heiko, could you modify the patch to print the module's state to >> confirm? >> >> Thanks, >> >> Jessica