Date: Fri, 25 Oct 2019 14:05:05 +0200
From: Michal Hocko
To: snazy@snazy.de
Cc: Randy Dunlap, linux-kernel@vger.kernel.org, Linux MM, Andrew Morton, "Potyra, Stefan"
Subject: Re: mlockall(MCL_CURRENT) blocking infinitely
Message-ID: <20191025120505.GG17610@dhcp22.suse.cz>
References: <4576b336-66e6-e2bb-cd6a-51300ed74ab8@snazy.de> <20191025092143.GE658@dhcp22.suse.cz> <70393308155182714dcb7485fdd6025c1fa59421.camel@gmx.de> <20191025114633.GE17610@dhcp22.suse.cz>
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 25-10-19 13:55:13, Robert Stupp wrote:
> On Fri, 2019-10-25 at 13:46 +0200, Michal Hocko wrote:
> > On Fri 25-10-19 13:02:23, Robert Stupp wrote:
> > > On Fri, 2019-10-25 at 11:21 +0200, Michal Hocko wrote:
> > > > On Thu 24-10-19 16:34:46, Randy Dunlap wrote:
> > > > > [adding linux-mm + people]
> > > > >
> > > > > On 10/24/19 12:36 AM, Robert Stupp wrote:
> > > > > > Hi guys,
> > > > > >
> > > > > > I've got an issue with `mlockall(MCL_CURRENT)` after upgrading
> > > > > > Ubuntu 19.04 to 19.10 - i.e. a kernel version change from 5.0.x
> > > > > > to 5.3.x.
> > > > > >
> > > > > > The following simple program hangs forever with one CPU running
> > > > > > at 100% (kernel):
> > > >
> > > > Can you capture several snapshots of /proc/$(pidof $YOURTASK)/stack
> > > > while this is happening?
> > >
> > > Sure.
> > >
> > > Approach:
> > > - one shell running
> > >   while true; do cat /proc/$(pidof test)/stack; done
> > > - starting ./test in another shell + hitting ctrl-c quite a few times
> > >
> > > The vast majority of all ./test invocations return an empty 'stack'
> > > file. Some tries, maybe 1 out of 20, returned these snapshots.
> > > Was running 5.3.7 for this test.
> > >
> > > [<0>] __handle_mm_fault+0x4c5/0x7a0
> > > [<0>] handle_mm_fault+0xca/0x1f0
> > > [<0>] __get_user_pages+0x230/0x770
> > > [<0>] populate_vma_page_range+0x74/0x80
> > > [<0>] __mm_populate+0xb1/0x150
> > > [<0>] __x64_sys_mlockall+0x11c/0x190
> > > [<0>] do_syscall_64+0x5a/0x130
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > [<0>] __handle_mm_fault+0x4c5/0x7a0
> > > [<0>] handle_mm_fault+0xca/0x1f0
> > > [<0>] __get_user_pages+0x230/0x770
> > > [<0>] populate_vma_page_range+0x74/0x80
> > > [<0>] __mm_populate+0xb1/0x150
> > > [<0>] __x64_sys_mlockall+0x11c/0x190
> > > [<0>] do_syscall_64+0x5a/0x130
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > [<0>] __handle_mm_fault+0x4c5/0x7a0
> > > [<0>] handle_mm_fault+0xca/0x1f0
> > > [<0>] __get_user_pages+0x230/0x770
> > > [<0>] populate_vma_page_range+0x74/0x80
> > > [<0>] __mm_populate+0xb1/0x150
> > > [<0>] __x64_sys_mlockall+0x11c/0x190
> > > [<0>] do_syscall_64+0x5a/0x130
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > [<0>] __do_fault+0x3c/0x130
> > > [<0>] do_fault+0x248/0x640
> > > [<0>] __handle_mm_fault+0x4c5/0x7a0
> > > [<0>] handle_mm_fault+0xca/0x1f0
> > > [<0>] __get_user_pages+0x230/0x770
> > > [<0>] populate_vma_page_range+0x74/0x80
> > > [<0>] __mm_populate+0xb1/0x150
> > > [<0>] __x64_sys_mlockall+0x11c/0x190
> > > [<0>] do_syscall_64+0x5a/0x130
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > This is expected.
> > > // doubt this one is relevant
> > > [<0>] __wake_up_common_lock+0x7c/0xc0
> > > [<0>] __wake_up_sync_key+0x1e/0x30
> > > [<0>] __wake_up_parent+0x26/0x30
> > > [<0>] do_notify_parent+0x1cc/0x280
> > > [<0>] do_exit+0x703/0xaf0
> > > [<0>] do_group_exit+0x47/0xb0
> > > [<0>] get_signal+0x165/0x880
> > > [<0>] do_signal+0x34/0x280
> > > [<0>] exit_to_usermode_loop+0xbf/0x160
> > > [<0>] do_syscall_64+0x10f/0x130
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Hmm, this means that the task has exited, so how come there are
> > other syscalls happening? Are you sure you are collecting stacks
> > for the correct task?
>
> I guess the `cat /proc/$(pidof test)/stack` captured the stack after
> I hit ctrl-c. Does that make sense?
>
> Also tried `syscall(SYS_mlockall, MCL_CURRENT);` instead of
> `mlockall(MCL_CURRENT)` - same behavior.

This smells like something that could be runtime specific. Could you
post the strace output of your testcase?

-- 
Michal Hocko
SUSE Labs