Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp2385899imm; Sat, 16 Jun 2018 17:08:27 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJXvS+av20ahGwX4plZ2+G66K2v6X+2zXSgLd34kL+XcyIusHtaz/zgXzUXKjDrPAbphXsS X-Received: by 2002:a65:6157:: with SMTP id o23-v6mr6509847pgv.310.1529194107615; Sat, 16 Jun 2018 17:08:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529194107; cv=none; d=google.com; s=arc-20160816; b=MGhBcmbua1Dh0i39dz8UYmq3xl62ekMjG2a4OX8Iuv0+4HKzU50DNlpIhW+Z+MKxFS Fn04CUkBUWGb5l5vXo+4AAnraVTmJo5ZjyYNMc9QpK4w6K9/Ml4YFXrbyi3gwo0w8ifi kfj0b7fOP66KC7dV5YkEDmwYdhjmTBGyvP0Ovf8zbiZTy3L0TOTUJJlqY+pa9LKW0Cnt 6N2a2PcnzHZ3hjDwNfR2eSmxsGGujRM3N7oKgjaYSPW1ZdXllb8BjBkhFt6H+b5eF/yU jB3b1/xdHWP9GuGjzPOABS5Me1RJ13vboSQgGO8VMCuYPNF5Emr93C++jI6pShW7Fn0k zfww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :dlp-reaction:dlp-version:dlp-product:content-language :accept-language:in-reply-to:references:message-id:date:thread-index :thread-topic:subject:cc:to:from:arc-authentication-results; bh=isM4VEPzAP6SP79LnKryEURPlDpn6sC4Fs0eJOZsa9w=; b=PkgPqagX2G5S4H6VXcnsdOIgu+LIbmH/L70wAE5+pcg3qLUOkZE0LA4YChMPdc6g/0 h3Nno6LFNU0hBu48ssVXv3OCV9O0e1FOOi+CKSYhpx+75ktBrgEvNOCS4yjnMAZRHOwe eoSvFXn3FdnMqw7XxyVNAzfXVuF3gdO/GyMSKMNwEZnVGJnJuK+D7RzvyYLQKG7lAgmQ q02PP+OcH0Fstylizwt6XCvFv/kBlDOqHOGKktyrJRe3ebrTlF52o06g4TWazmPuL4/R 0OB+0JwPXbPq0mUe6zNZI9cWB4JU4bi6U1OUv0xQw7Kxwr627Fg03PvvHLfG8u3Uyi2m tk4w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m11-v6si10982304plt.284.2018.06.16.17.08.11; Sat, 16 Jun 2018 17:08:27 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756946AbeFQAHq convert rfc822-to-8bit (ORCPT + 99 others); Sat, 16 Jun 2018 20:07:46 -0400 Received: from mga05.intel.com ([192.55.52.43]:6353 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756924AbeFQAHp (ORCPT ); Sat, 16 Jun 2018 20:07:45 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Jun 2018 17:07:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,232,1526367600"; d="scan'208";a="65181993" Received: from fmsmsx106.amr.corp.intel.com ([10.18.124.204]) by orsmga001.jf.intel.com with ESMTP; 16 Jun 2018 17:07:43 -0700 Received: from fmsmsx152.amr.corp.intel.com (10.18.125.5) by FMSMSX106.amr.corp.intel.com (10.18.124.204) with Microsoft SMTP Server (TLS) id 14.3.319.2; Sat, 16 Jun 2018 17:07:43 -0700 Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by FMSMSX152.amr.corp.intel.com (10.18.125.5) with Microsoft SMTP Server (TLS) id 14.3.319.2; Sat, 16 Jun 2018 17:07:43 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.223]) by SHSMSX152.ccr.corp.intel.com ([169.254.6.70]) with mapi id 14.03.0319.002; Sun, 17 Jun 2018 08:07:41 +0800 From: "Wang, Wei W" To: 'Matthew Wilcox' CC: "virtio-dev@lists.oasis-open.org" , "linux-kernel@vger.kernel.org" , "virtualization@lists.linux-foundation.org" , "kvm@vger.kernel.org" , "linux-mm@kvack.org" , "mst@redhat.com" , "mhocko@kernel.org" , "akpm@linux-foundation.org" , "torvalds@linux-foundation.org" , "pbonzini@redhat.com" , "liliang.opensource@gmail.com" , "yang.zhang.wz@gmail.com" , "quan.xu0@gmail.com" , "nilal@redhat.com" , "riel@redhat.com" , "peterx@redhat.com" Subject: RE: [PATCH v33 1/4] mm: add a function to get free page blocks Thread-Topic: [PATCH v33 1/4] mm: add a function to get free page blocks Thread-Index: AQHUBGbzlrVfblBI3Uy7QJ/rUPSUs6RhzBKAgAEz9TA= Date: Sun, 17 Jun 2018 00:07:40 +0000 Message-ID: <286AC319A985734F985F78AFA26841F7396A6415@shsmsx102.ccr.corp.intel.com> References: <1529037793-35521-1-git-send-email-wei.w.wang@intel.com> <1529037793-35521-2-git-send-email-wei.w.wang@intel.com> <20180616045005.GA14936@bombadil.infradead.org> In-Reply-To: <20180616045005.GA14936@bombadil.infradead.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiZDc4ZWY0MWMtMzkwYy00ZmU5LTk3NWEtZDUxZTBhNTI2NjJlIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiNm8wbno4OHo1MnlyeGYycnBcL291VVpoRU53MGkwNmdkM2dSbEw0MnErbEhhWUU2dm1CVzVDbkx6YWVIQmJKdGIifQ== x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.0.200.100 dlp-reaction: no-action x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Saturday, June 16, 2018 12:50 PM, Matthew Wilcox wrote: > On Fri, Jun 15, 2018 at 12:43:10PM +0800, Wei Wang wrote: > > +/** > > + * get_from_free_page_list - get free page blocks from a free page > > +list > > + * @order: the order of the free page list to check > > + * @buf: the array to store the physical addresses of the free page > > +blocks > > + * @size: the array size > > + * > > + * This function offers hints about free pages. There is no guarantee > > +that > > + * the obtained free pages are still on the free page list after the > > +function > > + * returns. pfn_to_page on the obtained free pages is strongly > > +discouraged > > + * and if there is an absolute need for that, make sure to contact MM > > +people > > + * to discuss potential problems. > > + * > > + * The addresses are currently stored to the array in little endian. > > +This > > + * avoids the overhead of converting endianness by the caller who > > +needs data > > + * in the little endian format. Big endian support can be added on > > +demand in > > + * the future. > > + * > > + * Return the number of free page blocks obtained from the free page list. > > + * The maximum number of free page blocks that can be obtained is > > +limited to > > + * the caller's array size. > > + */ > > Please use: > > * Return: The number of free page blocks obtained from the free page list. > > Also, please include a > > * Context: Any context. > > or > > * Context: Process context. > > or whatever other conetext this function can be called from. Since you're > taking the lock irqsafe, I assume this can be called from any context, but I > wonder if it makes sense to have this function callable from interrupt context. > Maybe this should be callable from process context only. Thanks, sounds better to make it process context only. > > > +uint32_t get_from_free_page_list(int order, __le64 buf[], uint32_t > > +size) { > > + struct zone *zone; > > + enum migratetype mt; > > + struct page *page; > > + struct list_head *list; > > + unsigned long addr, flags; > > + uint32_t index = 0; > > + > > + for_each_populated_zone(zone) { > > + spin_lock_irqsave(&zone->lock, flags); > > + for (mt = 0; mt < MIGRATE_TYPES; mt++) { > > + list = &zone->free_area[order].free_list[mt]; > > + list_for_each_entry(page, list, lru) { > > + addr = page_to_pfn(page) << PAGE_SHIFT; > > + if (likely(index < size)) { > > + buf[index++] = cpu_to_le64(addr); > > + } else { > > + spin_unlock_irqrestore(&zone->lock, > > + flags); > > + return index; > > + } > > + } > > + } > > + spin_unlock_irqrestore(&zone->lock, flags); > > + } > > + > > + return index; > > +} > > I wonder if (to address Michael's concern), you shouldn't instead use the first > free chunk of pages to return the addresses of all the pages. > ie something like this: > > __le64 *ret = NULL; > unsigned int max = (PAGE_SIZE << order) / sizeof(__le64); > > for_each_populated_zone(zone) { > spin_lock_irq(&zone->lock); > for (mt = 0; mt < MIGRATE_TYPES; mt++) { > list = &zone->free_area[order].free_list[mt]; > list_for_each_entry_safe(page, list, lru, ...) { > if (index == size) > break; > addr = page_to_pfn(page) << PAGE_SHIFT; > if (!ret) { > list_del(...); Thanks for sharing. But probably we would not take this approach for the reasons below: 1) I'm not sure if getting a block of free pages to use could be that simple (just pluck it from the list as above). I think it is more prudent to let the callers allocate the array via the regular allocation functions. 2) Callers may need to use this with their own defined protocols, and they want the header and payload (i.e. the obtained hints) to locate in physically continuous memory (there are tricks they can use to make it work with non-physically continuous memory, but that would just complicate all the things) . In this case, it is better to have callers allocate the memory on their own, and pass the payload part memory to this API to get the payload filled. Best, Wei