Date: Wed, 23 Oct 2019 08:15:11 +0200
From: Michal Hocko
To: Andrew Morton, Vlastimil Babka, Mel Gorman
Cc: Waiman Long, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner, Roman Gushchin, Konstantin Khlebnikov, Jann Horn,
	Song Liu, Greg Kroah-Hartman, Rafael Aquini
Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo
Message-ID: <20191023061511.GA754@dhcp22.suse.cz>
References: <20191022162156.17316-1-longman@redhat.com> <20191022145902.d9c4a719c0b32175e06e4eee@linux-foundation.org>
In-Reply-To: <20191022145902.d9c4a719c0b32175e06e4eee@linux-foundation.org>

On Tue 22-10-19
14:59:02, Andrew Morton wrote:
> On Tue, 22 Oct 2019 12:21:56 -0400 Waiman Long wrote:
[...]
> > -	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
> > -		seq_printf(m, "Node %4d, zone %8s, type %12s ",
> > -					pgdat->node_id,
> > -					zone->name,
> > -					migratetype_names[mtype]);
> > -		for (order = 0; order < MAX_ORDER; ++order) {
> > +	lockdep_assert_held(&zone->lock);
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	/*
> > +	 * MIGRATE_MOVABLE is usually the largest one in large memory
> > +	 * systems. We skip iterating that list. Instead, we compute it by
> > +	 * subtracting the total of the rests from free_area->nr_free.
> > +	 */
> > +	for (order = 0; order < MAX_ORDER; ++order) {
> > +		unsigned long nr_total = 0;
> > +		struct free_area *area = &(zone->free_area[order]);
> > +
> > +		for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
> >  			unsigned long freecount = 0;
> > -			struct free_area *area;
> >  			struct list_head *curr;
> >  
> > -			area = &(zone->free_area[order]);
> > -
> > +			if (mtype == MIGRATE_MOVABLE)
> > +				continue;
> >  			list_for_each(curr, &area->free_list[mtype])
> >  				freecount++;
> > -			seq_printf(m, "%6lu ", freecount);
> > +			nfree[order][mtype] = freecount;
> > +			nr_total += freecount;
> >  		}
> > +		nfree[order][MIGRATE_MOVABLE] = area->nr_free - nr_total;
> > +
> > +		/*
> > +		 * If we have already iterated more than 64k of list
> > +		 * entries, we might have hold the zone lock for too long.
> > +		 * Temporarily release the lock and reschedule before
> > +		 * continuing so that other lock waiters have a chance
> > +		 * to run.
> > +		 */
> > +		if (nr_total > (1 << 16)) {
> > +			spin_unlock_irq(&zone->lock);
> > +			cond_resched();
> > +			spin_lock_irq(&zone->lock);
> > +		}
> > +	}
> > +
> > +	for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
> > +		seq_printf(m, "Node %4d, zone %8s, type %12s ",
> > +					pgdat->node_id,
> > +					zone->name,
> > +					migratetype_names[mtype]);
> > +		for (order = 0; order < MAX_ORDER; ++order)
> > +			seq_printf(m, "%6lu ", nfree[order][mtype]);
> >  		seq_putc(m, '\n');
> 
> This is not exactly a thing of beauty :(  Presumably there might still
> be situations where the irq-off times remain excessive.

Yes. It is the list_for_each over the free_list that needs the lock, and
that is the actual problem here. Those lists can be really long on a
machine with a _lot_ of memory. This is why I objected to the patch: it
doesn't really address that underlying problem. I would like to hear
from Mel and Vlastimil how they would feel about making the free_list
fully migratetype aware (including nr_free).

> Why are we actually holding zone->lock so much?  Can we get away with
> holding it across the list_for_each() loop and nothing else?  If so,
> this still isn't a bulletproof fix.  Maybe just terminate the list
> walk if freecount reaches 1024.  Would anyone really care?
> 
> Sigh.  I wonder if anyone really uses this thing for anything
> important.  Can we just remove it all?

Vlastimil would know much better, but I have seen this being used for
fragmentation-related debugging. That implies restricting the file to
0400 should be sufficient, and it is a quick and easily backportable fix
for the most pressing immediate problem.
-- 
Michal Hocko
SUSE Labs