From: Uladzislau Rezki
Date: Tue, 29 Jan 2019 17:17:54 +0100
To: Andrew Morton
Cc: "Uladzislau Rezki (Sony)", Michal Hocko, Matthew Wilcox,
    linux-mm@kvack.org, LKML, Thomas Garnier, Oleksiy Avramchenko,
    Steven Rostedt, Joel Fernandes, Thomas Gleixner, Ingo Molnar,
    Tejun Heo
Subject: Re: [PATCH v1 2/2] mm: add priority threshold to __purge_vmap_area_lazy()
Message-ID: <20190129161754.phdr3puhp4pjrnao@pc636>
References: <20190124115648.9433-1-urezki@gmail.com>
 <20190124115648.9433-3-urezki@gmail.com>
 <20190128120429.17819bd348753c2d7ed3a7b9@linux-foundation.org>
In-Reply-To: <20190128120429.17819bd348753c2d7ed3a7b9@linux-foundation.org>
User-Agent: NeoMutt/20170113 (1.7.2)

On Mon, Jan 28, 2019 at 12:04:29PM -0800, Andrew Morton wrote:
> On Thu, 24 Jan 2019 12:56:48 +0100 "Uladzislau Rezki (Sony)" wrote:
>
> > commit 763b218ddfaf ("mm: add preempt points into
> > __purge_vmap_area_lazy()")
> >
> > introduced some preempt points, one of those is making an
> > allocation more prioritized over lazy free of vmap areas.
> >
> > Prioritizing an allocation over freeing does not work well
> > all the time, i.e. it should be rather a compromise.
> >
> > 1) Number of lazy pages directly influence on busy list length
> > thus on operations like: allocation, lookup, unmap, remove, etc.
> >
> > 2) Under heavy stress of vmalloc subsystem i run into a situation
> > when memory usage gets increased hitting out_of_memory -> panic
> > state due to completely blocking of logic that frees vmap areas
> > in the __purge_vmap_area_lazy() function.
> >
> > Establish a threshold passing which the freeing is prioritized
> > back over allocation creating a balance between each other.
>
> It would be useful to credit the vmalloc test driver for this
> discovery, and perhaps to identify specifically which test triggered
> the kernel misbehaviour. Please send along suitable words and I'll add
> them.
>
Please see more details of the testing below.

Using the vmalloc test driver in "stress mode", i.e. when all available
test cases are run simultaneously on all online CPUs to put pressure on
the vmalloc subsystem, my HiKey 960 board runs out of memory because the
__purge_vmap_area_lazy() logic is simply not able to free pages in time.

How I run it:

1) Build the kernel with CONFIG_TEST_VMALLOC=m
2) Run ./tools/testing/selftests/vm/test_vmalloc.sh stress

During this test the number of "vmap_lazy_nr" pages goes far beyond the
acceptable lazy_max_pages() threshold, which leads to an enormous busy
list size and to other problems, including longer allocation times and
so on.

> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -661,23 +661,27 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >         struct llist_node *valist;
> >         struct vmap_area *va;
> >         struct vmap_area *n_va;
> > -       bool do_free = false;
> > +       int resched_threshold;
> >
> >         lockdep_assert_held(&vmap_purge_lock);
> >
> >         valist = llist_del_all(&vmap_purge_list);
> > +       if (unlikely(valist == NULL))
> > +               return false;
>
> Why this change?
>
I decided to refactor a bit, to simplify things and get rid of the
unneeded do_free check logic. I think it is more straightforward to just
check whether the list is empty, instead of writing "do_free" n times in
a loop. I can drop it, or submit it as a separate patch. What is your
view?

> > +       /*
> > +        * TODO: to calculate a flush range without looping.
> > +        * The list can be up to lazy_max_pages() elements.
> > +        */
>
> How important is this?
>
It depends on the number of vmap_lazy_nr pages in the list we iterate
over. For example, on my 8-core ARM board with 4 GB of RAM I see that
__purge_vmap_area_lazy() can take up to 12 milliseconds because of the
long list; that is why the cond_resched_lock() is there. As for the
execution time of this first loop, it takes ~4-5 milliseconds to find
out the flush range. It is probably not that important, since the loop
is not run in atomic context and can therefore be interrupted or
preempted; it only increases the execution time of the current process
that does vfree()/etc. -> __purge_vmap_area_lazy().

On the other hand, if we could maintain that range at insertion time,
i.e. when we add a VA to the vmap_purge_list, by taking the min of
va->va_start and the max of va->va_end, we could get rid of that loop
entirely. But this is just an idea; a rough sketch follows below.
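Something along these lines, completely untested and only meant to
illustrate the idea. The helper name and the two range variables are
invented, the snippet assumes it sits in mm/vmalloc.c next to the
existing vmap_purge_list, and synchronization against a concurrent
purge is hand-waved:

/*
 * Illustration only, not a real patch: remember the flush range while
 * vmap areas are queued for lazy purge, so __purge_vmap_area_lazy()
 * would not need its first loop over the list. Resetting the range
 * would have to happen under vmap_purge_lock, and the updates below
 * would need to be made safe against concurrent callers.
 */
static unsigned long vmap_purge_range_start = ULONG_MAX;
static unsigned long vmap_purge_range_end;

static void queue_vmap_area_lazy(struct vmap_area *va)
{
        /* Widen the recorded range to cover this area. */
        vmap_purge_range_start = min(vmap_purge_range_start, va->va_start);
        vmap_purge_range_end = max(vmap_purge_range_end, va->va_end);

        llist_add(&va->purge_list, &vmap_purge_list);
}

__purge_vmap_area_lazy() could then just read and reset that range
instead of walking the whole list to compute it.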
> >         llist_for_each_entry(va, valist, purge_list) {
> >                 if (va->va_start < start)
> >                         start = va->va_start;
> >                 if (va->va_end > end)
> >                         end = va->va_end;
> > -               do_free = true;
> >         }
> >
> > -       if (!do_free)
> > -               return false;
> > -
> >         flush_tlb_kernel_range(start, end);
> > +       resched_threshold = (int) lazy_max_pages() << 1;
>
> Is the typecast really needed?
>
> Perhaps resched_threshold should have unsigned long type and perhaps
> vmap_lazy_nr should be atomic_long_t?
>
I think so, especially since atomic_t is a 32-bit integer on both 32- and
64-bit systems, whereas lazy_max_pages() deals with unsigned long, which
is 8 bytes on a 64-bit system. Thus vmap_lazy_nr should be 8 bytes on
64-bit as well. Should I send that as a separate patch? What is your view?

> >         spin_lock(&vmap_area_lock);
> >         llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> > @@ -685,7 +689,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >
> >                 __free_vmap_area(va);
> >                 atomic_sub(nr, &vmap_lazy_nr);
> > -               cond_resched_lock(&vmap_area_lock);
> > +
> > +               if (atomic_read(&vmap_lazy_nr) < resched_threshold)
> > +                       cond_resched_lock(&vmap_area_lock);
> >         }
> >         spin_unlock(&vmap_area_lock);
> >         return true;
>
Thank you for your comments and review.

--
Vlad Rezki