Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp154121ybg; Tue, 22 Oct 2019 17:54:48 -0700 (PDT) X-Google-Smtp-Source: APXvYqzhNThA5ndtq7tZxJlhlU+CY9P2deab4DIuscaoIoUs+/J6dtXCjxPO+YnxTnNg/4uWBo7J X-Received: by 2002:a50:8f03:: with SMTP id 3mr35255564edy.195.1571792088150; Tue, 22 Oct 2019 17:54:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571792088; cv=none; d=google.com; s=arc-20160816; b=rwxU0uiowdH0ohWbPao4Dk7qTzxb861EQ66+LoUZgMProuk8OTzkGjC3i2Zr15IKRj BI+AFV2VKZT1MqWKoPaAL/ZqMjpuDCi5TIAmPHDJraOiIhKOC0yNIwNzUwFd+FsSNuC8 hjlR74T/TcYxaPlg0xgIflShvF7rdarNb/LBFvZDqkFV9/eH3oLCh92MipBA++fflVWR Kc33J6uL65sQ9/D6dqSrFD1rQA5b3NQdA9zU2JaFDa7hWHvR5IyPwKJFLf7EOJzLmSen tWLOqOIA4kVqKRC0dSCXokcuS53X84nMpp40FpajkzgmOXPD6ZeDhOpqUEuK2lrARtx9 d8MQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=Srd6gml0YLGx1I577UoqvNSMcSiLgu2cTCSMasmakCE=; b=0udpg3hOzk2Kqtxqt0nvpock3zuRlcyg99IgcrtHR7OCKzCtS+As7t6sXKIlNGaHQk kQjNF3i4j/ktGT0n/gkX55pHHwYSDClo3l3LTd7F+Xj6Xtg243F4t1IuKPlUIVLtpFt0 H55IkgDdENiG/Nc9ALg+0ZzLihLy9u1eE3fXgYwSpM29FyNYlk3naJ9yuHv4TGU7PUT2 qLLAO1Vrw2kgBaai2ouaxwQZ4UdD0TLWgba2rYBVeKgRKnWOTdrzhtXmYudJPUzcF8v1 s2jT6HePktKBTe3BuUS8royc/Tzj5wLIum1H1EDaP5mw9rVJK1dOZsp+oQHHqIylaXG8 l2Sw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=E1fC8UTL; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m17si5635061eja.415.2019.10.22.17.54.22; Tue, 22 Oct 2019 17:54:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=E1fC8UTL; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731988AbfJWAwK (ORCPT + 99 others); Tue, 22 Oct 2019 20:52:10 -0400 Received: from mail-pl1-f195.google.com ([209.85.214.195]:38482 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727403AbfJWAwK (ORCPT ); Tue, 22 Oct 2019 20:52:10 -0400 Received: by mail-pl1-f195.google.com with SMTP id w8so9194770plq.5 for ; Tue, 22 Oct 2019 17:52:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:in-reply-to:message-id:references :user-agent:mime-version; bh=Srd6gml0YLGx1I577UoqvNSMcSiLgu2cTCSMasmakCE=; b=E1fC8UTLD3GGDIef9dWBus1qtkkmtrAcx1kuFUFA0HMa6PVDEe+k0alKpZHj5wHZIG HpskJIgaFND8TR6D2YN86I9cDYiRF3aEhmMH69RsXSsHFlqA5M1AVqt67btfZXQSUOJm 67/j4apzvaAoxbq//X21PI2f/aHbbrR1WNWTZl29HeaLcLbZTm+LMzWW/0PzVl6B6cgd /BI5xIYJ4HSxRRA13FvrHFh4NmgR0msBEljG2tsYOyhqQt23O2Bsg5Yd+BhNxZSYSb8F 2o/wDghmzo8c0xyZo8pFj90UXzPoMxwNFQrfQbb+kKQnlUClDEiDCgdA1+ljMOqcswOW 6IIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id :references:user-agent:mime-version; bh=Srd6gml0YLGx1I577UoqvNSMcSiLgu2cTCSMasmakCE=; b=MGbR8pPKqp0VWCzyhf29E+al/of1TglIoIvElht1ODoEcpoWeQVkneGvROBQp53Vcp pnaYl/lkZoV3acJuFvxJM5n3PUTOlixaBvhodEeITkJzX1VH21kX1RUe+JCzyzGR9UM9 6FKhlpLZCwkkBgf36rQLxGqaKHQzXg8N9P4ORfrzvdEvHm6s/6tU7VE9TFBzdyqVMrn9 Iavxb9JE0LLO0FMzY3JOec4hDwEFfRCVaqGLTF8JXLv0Qqu79aqkrRg9SrpXuagL5E1H khnyYrLj/bEZd4qWo7ERjHZb/NuhQdn20zoiUN98esyIp38ZWd2fgD1BEhTzBXSF5bgF VKZw== X-Gm-Message-State: APjAAAVAD/LkIZy+QV5VwBF5WTnUjOTlzGuGaudjZdjRDHeWEJ12CgMl GI4b8E8jwe4yuuTHNULVPi6uPg== X-Received: by 2002:a17:902:6ac8:: with SMTP id i8mr6349524plt.164.1571791929082; Tue, 22 Oct 2019 17:52:09 -0700 (PDT) Received: from [2620:15c:17:3:3a5:23a7:5e32:4598] ([2620:15c:17:3:3a5:23a7:5e32:4598]) by smtp.gmail.com with ESMTPSA id 64sm21139510pfx.31.2019.10.22.17.52.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Oct 2019 17:52:08 -0700 (PDT) Date: Tue, 22 Oct 2019 17:52:07 -0700 (PDT) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Waiman Long cc: Michal Hocko , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner , Roman Gushchin , Vlastimil Babka , Konstantin Khlebnikov , Jann Horn , Song Liu , Greg Kroah-Hartman , Rafael Aquini , Mel Gorman Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo In-Reply-To: Message-ID: References: <20191022162156.17316-1-longman@redhat.com> <20191022165745.GT9379@dhcp22.suse.cz> <0b206255-5c62-18f5-d751-a5576a6c0e8f@redhat.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 22 Oct 2019, Waiman Long wrote: > >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE > >>> is usually the largest one on large memory systems, this is the one > >>> to be skipped. Since the printing order is migration-type => order, we > >>> will have to store the counts in an internal 2D array before printing > >>> them out. > >>> > >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the > >>> zone lock for too long blocking out other zone lock waiters from being > >>> run. This can be problematic for systems with large amount of memory. > >>> So a check is added to temporarily release the lock and reschedule if > >>> more than 64k of list entries have been iterated for each order. With > >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list > >>> entries before releasing the lock. > >> But you are still iterating through the whole free_list at once so if it > >> gets really large then this is still possible. I think it would be > >> preferable to use per migratetype nr_free if it doesn't cause any > >> regressions. > >> > > Yes, it is still theoretically possible. I will take a further look at > > having per-migrate type nr_free. BTW, there is one more place where the > > free lists are being iterated with zone lock held - mark_free_pages(). > > Looking deeper into the code, the exact migration type is not stored in > the page itself. An initial movable page can be stolen to be put into > another migration type. So in a delete or move from free_area, we don't > know exactly what migration type the page is coming from. IOW, it is > hard to get accurate counts of the number of entries in each lists. > I think the suggestion is to maintain a nr_free count of the free_list for each order for each migratetype so anytime a page is added or deleted from the list, the nr_free is adjusted. Then the free_area's nr_free becomes the sum of its migratetype's nr_free at that order. That's possible to do if you track the migratetype per page, as you said, or like pcp pages track it as part of page->index. It's a trade-off on whether you want to impact the performance of maintaining these new nr_frees anytime you manipulate the freelists. I think Vlastimil and I discussed per order per migratetype nr_frees in the past and it could be a worthwhile improvement for other reasons, specifically it leads to heuristics that can be used to determine how fragmentated a certain migratetype is for a zone, i.e. a very quick way to determine what ratio of pages over all MIGRATE_UNMOVABLE pageblocks are free. Or maybe there are other reasons why these nr_frees can't be maintained anymore? (I had a patch to do it on 4.3.) You may also find systems where MIGRATE_MOVABLE is not actually the longest free_list compared to other migratetypes on a severely fragmented system, so special casing MIGRATE_MOVABLE might not be the best way forward.