Date: Wed, 6 Jul 2022 22:21:36 +0800
From: Oliver Sang
To: Mel Gorman
Cc: Andrew Morton, 0day robot, LKML, linux-mm@kvack.org, lkp@lists.01.org,
    Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
In-Reply-To: <20220706115328.GE27531@techsingularity.net>

hi, Mel Gorman,

On Wed, Jul 06, 2022 at 12:53:29PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb
> > > > >   ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > > > >
> > > >
> > > > Did this test include the followup patch
> > > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > no, we just fetched the original patch set and tested on it.
> > >
> > > we have now applied the patch you pointed us to on top of 2bd8eec68f
> > > and found the issue still exists.
> > > (attached dmesg FYI)
> > >
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no
> > longer active and the context is a syscall. First, is this definitely
> > the first patch where the problem occurs?
> >
>
> I tried reproducing this on a 2-socket machine with Xeon
> Gold 5218R CPUs. It was necessary to set timeouts in both
> vm/settings and kselftest/runner.sh to avoid timeouts. Testing with
> a standard config on my original 5.19-rc3 baseline and the baseline
> b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> config with i915 disabled (would not build) and necessary storage drivers
> and network drivers enabled (for boot and access). The kernel log shows
> a bunch of warnings related to UBSAN during boot and during some of the
> tests but otherwise compaction_test completed successfully as well as
> the other VM tests.
>
> Is this always reproducible?

not always, but at a high rate.
we also observed other dmesg stats for both 2bd8eec68f74 and its parent,
but the dmesg.BUG:sleeping_function_called_from_invalid_context_at* entries
seem to happen only on 2bd8eec68f74 and on the '-fix' commit.

=========================================================================================
compiler/group/kconfig/rootfs/sc_nr_hugepages/tbox_group/testcase/ucode:
  gcc-11/vm/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/2/lkp-csl-2sp9/kernel-selftests/0x500320a

commit:
  eec0ff5df294 ("mm/page_alloc: Remotely drain per-cpu lists")
  2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock")
  292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")

eec0ff5df2945d19 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
            :20          75%          15:20          70%          14:21     dmesg.BUG:scheduling_while_atomic
            :20           5%           1:20           0%            :21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
            :20           5%           1:20          10%           2:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
            :20           5%           1:20           5%           1:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
            :20          10%           2:20          25%           5:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
            :20           5%           1:20           0%            :21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
            :20          40%           8:20          40%           8:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
            :20          10%           2:20           0%            :21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
            :20          10%           2:20          10%           2:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
            :20          55%          11:20          65%          13:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
            :20          15%           3:20           5%           1:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
            :20          60%          12:20          55%          11:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
            :20           5%           1:20           5%           1:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
            :20           0%            :20           5%           1:21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
            :20          15%           3:20           0%            :21     dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
            :20          45%           9:20          45%           9:21     dmesg.BUG:workqueue_leaked_lock_or_atomic
            :20          25%           5:20          15%           3:21     dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
            :20           5%           1:20           0%            :21     dmesg.RIP:__clear_user
          20:20           0%          20:20           5%          21:21     dmesg.RIP:rcu_eqs_exit
          20:20           0%          20:20           5%          21:21     dmesg.RIP:sched_clock_tick
            :20           5%           1:20           0%            :21     dmesg.RIP:smp_call_function_many_cond
          20:20           0%          20:20           5%          21:21     dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
          20:20           0%          20:20           5%          21:21     dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
            :20           5%           1:20           0%            :21     dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
          20:20           0%          20:20           5%          21:21     dmesg.WARNING:suspicious_RCU_usage
          20:20           0%          20:20           5%          21:21     dmesg.boot_failures
           9:20         -15%           6:20          -5%           8:21     dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
           9:20         -15%           6:20          -5%           8:21     dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
          20:20           0%          20:20           5%          21:21     dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
          20:20           0%          20:20           5%          21:21     dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage

>
> --
> Mel Gorman
> SUSE Labs