Date: Wed, 21 Mar 2018 14:08:33 +0100
From: Michal Hocko
To: Yang Shi
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section
Message-ID: <20180321130833.GM23100@dhcp22.suse.cz>
References: <1521581486-99134-1-git-send-email-yang.shi@linux.alibaba.com>
 <1521581486-99134-2-git-send-email-yang.shi@linux.alibaba.com>
In-Reply-To: <1521581486-99134-2-git-send-email-yang.shi@linux.alibaba.com>

On Wed 21-03-18 05:31:19, Yang Shi wrote:
> When running some mmap/munmap scalability tests with large memory (i.e.
> > 300GB), the below hung task issue may happen occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> ps D 0 14018 1 0x00000004
> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
> Call Trace:
> [] ? __schedule+0x250/0x730
> [] schedule+0x36/0x80
> [] rwsem_down_read_failed+0xf0/0x150
> [] call_rwsem_down_read_failed+0x18/0x30
> [] down_read+0x20/0x40
> [] proc_pid_cmdline_read+0xd9/0x4e0
> [] ? do_filp_open+0xa5/0x100
> [] __vfs_read+0x37/0x150
> [] ? security_file_permission+0x9b/0xc0
> [] vfs_read+0x96/0x130
> [] SyS_read+0x55/0xc0
> [] entry_SYSCALL_64_fastpath+0x1a/0xc5
>
> It is because munmap holds mmap_sem from very beginning to all the way
> down to the end, and doesn't release it in the middle. When unmapping
> large mapping, it may take long time (take ~18 seconds to unmap 320GB
> mapping with every single page mapped on an idle machine).

Yes, this definitely sucks. One way to work around that is to split the
unmap into two phases. The first drops all the pages and only needs
mmap_sem for read; the second tears down the mapping with mmap_sem held
for write. This wouldn't help for parallel mmap_sem writers, but those
really need a different approach (e.g. range locking).

> Since unmapping doesn't require any atomicity, so here unmap large

How come? Could you be more specific about why? Once you drop the lock,
the address space might change under your feet and you might be
unmapping a completely different vma. That would require userspace doing
nasty things, of course (e.g. MAP_FIXED), but I am worried that
userspace really depends on mmap/munmap atomicity these days.
-- 
Michal Hocko
SUSE Labs
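
The two-phase idea above can be approximated from userspace, which may help to picture it. The following is only a sketch under stated assumptions, not the kernel change discussed in this thread: it assumes a private anonymous mapping, and it relies on madvise(MADV_DONTNEED) dropping the pages before munmap() removes the now mostly empty mapping. To my understanding, kernels of this era run MADV_DONTNEED with mmap_sem held for read, so only the short second step needs it for write. The helper names two_phase_unmap() and demo_two_phase_unmap() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tear down a private anonymous mapping in two steps.
 * Phase 1 drops the pages, phase 2 removes the mapping itself.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int two_phase_unmap(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    /* Phase 1: release the backing pages; the mapping stays in place. */
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;

    /*
     * Nothing protects the range between the two calls: another thread
     * could mmap(MAP_FIXED) over it here, which is the atomicity
     * concern raised in the reply.
     */

    /* Phase 2: remove the mapping, which now has few or no pages left. */
    return munmap(addr, len);
}

/*
 * Hypothetical demo: map len bytes, touch every page so the mapping is
 * fully populated, then tear it down in two phases.
 * Returns 0 on success, -1 on failure.
 */
static int demo_two_phase_unmap(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xab, len); /* fault in every page */

    return two_phase_unmap(p, len);
}
```

Calling demo_two_phase_unmap(64UL << 20) from main() exercises both phases on a 64 MiB mapping. The gap between madvise() and munmap() is deliberate: it is the window in which the address space can change, so this pattern is only safe when the caller knows nothing else touches the range.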