From: Alexander Duyck
Date: Tue, 14 Apr 2020 08:07:32 -0700
Subject: Re: [RFC PATCH 0/4] mm: Add PG_zero support
To: David Hildenbrand
Cc: Alexander Duyck, Mel Gorman, linux-mm, LKML, Andrea Arcangeli,
    Dan Williams, Dave Hansen, Michal Hocko, Andrew Morton,
    Alex Williamson

On Tue, Apr 14, 2020 at 5:01 AM David Hildenbrand wrote:
>
> On 12.04.20 11:07, liliangleo wrote:
> > Zeroing out page content usually happens when pages are
> > allocated, and it is a time-consuming operation. It makes pin
> > and mlock operations very slow, especially for a large batch
> > of memory.
> >
> > This patch introduces a new feature that zeroes out pages
> > before they are allocated, which can help to speed up page
> > allocation.
> >
> > The idea is very simple: zero out free pages when the system
> > is not busy and mark them with PG_zero. When allocating a page
> > that needs to be filled with zeros, check the flag in the
> > struct page; if it is marked PG_zero, the zeroing can be
> > skipped, saving CPU time and speeding up page allocation.
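For illustration only, the allocation-path check described above might
look roughly like the sketch below. PageZero()/ClearPageZero() are
assumed names for the accessors a PG_zero page flag would typically
generate; they are not code quoted from this series.

	#include <linux/gfp.h>
	#include <linux/highmem.h>
	#include <linux/mm.h>

	/*
	 * Sketch of the allocation-path idea: consume the (assumed)
	 * PG_zero bit and skip clearing when the page was pre-zeroed.
	 */
	static inline void prep_zeroed_page(struct page *page, gfp_t gfp)
	{
		if (PageZero(page)) {	/* assumed PG_zero accessor */
			/*
			 * The flag is consumed here whether or not the
			 * caller asked for zeroed memory.
			 */
			ClearPageZero(page);
			if (gfp & __GFP_ZERO)
				return;	/* already zeroed while idle */
		}
		if (gfp & __GFP_ZERO)
			clear_highpage(page);	/* slow path */
	}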
> >
> > This series is based on the 'free page reporting' feature
> > introduced by Alexander Duyck.
> >
> > We can benefit from this feature in the following cases:
> > 1. User space mlocks a large chunk of memory
> > 2. VFIO pins pages for DMA
> > 3. Allocating transparent huge pages
> > 4. Speeding up the page fault path
> >
> > My original intention for adding this feature was to shorten
> > VM creation time when a VFIO device is attached. It works well,
> > and VM creation time is reduced noticeably.
> >
> > Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
> > =====================================================
> > QEMU uses 4K pages, THP is off
> >                     round1   round2   round3
> > w/o this patch:     23.5s    24.7s    24.6s
> > w/  this patch:     10.2s    10.3s    11.2s
> >
> > QEMU uses 4K pages, THP is on
> >                     round1   round2   round3
> > w/o this patch:     17.9s    14.8s    14.9s
> > w/  this patch:      1.9s     1.8s     1.9s
> > =====================================================
> >
> > I look forward to your feedback.
>
> I somehow have the feeling that this should not be glued to free
> page reporting. After all, you are already proposing your own
> status indicator for each buddy page (PG_zero), which means you
> could build something similar to free page reporting fairly
> easily and have the two co-exist.
>
> The free page reporting infrastructure is helpful when wanting
> to asynchronously batch-process higher-order pages. I don't see
> an immediate need for that batch processing here.
>
> E.g., why not simply zero out pages as they are freed/placed
> into the free lists? This is, in particular, one of the simple
> alternatives to free page reporting as we have it today (the
> guest zeroes free pages, and the hypervisor detects free pages
> using e.g., KSM).

The problem with doing it at free is that it would be just as
expensive as doing it at allocation, only you would likely see it
in more cases: most applications free all of their memory at once
on exit, while only a few pin all of their pages at the start.

> That could even allow you to avoid the PG_zero flag completely.
> E.g., once the feature is activated and running, all pages in
> the buddy free lists are zeroed out already. Zeroing happens
> synchronously from the page-freeing thread, not when starting a
> guest.
>
> Having said that, I agree with Dave here that there might be
> better alternatives for this somewhat special case.

I wonder if it wouldn't make more sense to look at splitting the
initialization work up over multiple CPUs instead of leaving it
all single-threaded. The data above was for creating a VM with
64GB of RAM and 32 CPUs: how fast could we zero the pages if we
were performing the zeroing over those 32 CPUs? I wonder if we
couldn't look at recruiting other CPUs on the same node to
perform the zeroing, like what Dan had originally proposed for
ZONE_DEVICE initialization a couple of years ago [1].
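As a rough sketch of what that fan-out could look like, the snippet
below splits a physically contiguous page range into chunks and zeroes
each chunk on an unbound workqueue worker. struct zero_work,
zero_chunk(), and zero_pages_parallel() are illustrative names, not an
existing kernel interface or the approach from [1].

	#include <linux/highmem.h>
	#include <linux/kernel.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct zero_work {
		struct work_struct work;
		struct page *start;
		unsigned long nr_pages;
	};

	static void zero_chunk(struct work_struct *work)
	{
		struct zero_work *zw = container_of(work, struct zero_work, work);
		unsigned long i;

		/* Zero this worker's slice of the page range. */
		for (i = 0; i < zw->nr_pages; i++)
			clear_highpage(zw->start + i);
	}

	static void zero_pages_parallel(struct page *start,
					unsigned long nr_pages,
					unsigned int nr_workers)
	{
		unsigned long chunk = DIV_ROUND_UP(nr_pages, nr_workers);
		struct zero_work *zws;
		unsigned int i, queued = 0;

		zws = kcalloc(nr_workers, sizeof(*zws), GFP_KERNEL);
		if (!zws)
			return;	/* real code would fall back to zeroing inline */

		/* Split the range into roughly equal chunks, one per worker. */
		while (nr_pages) {
			unsigned long n = min(chunk, nr_pages);

			zws[queued].start = start;
			zws[queued].nr_pages = n;
			INIT_WORK(&zws[queued].work, zero_chunk);
			queue_work(system_unbound_wq, &zws[queued].work);
			start += n;
			nr_pages -= n;
			queued++;
		}

		/* Wait for all chunks before handing the pages out. */
		for (i = 0; i < queued; i++)
			flush_work(&zws[i].work);
		kfree(zws);
	}

system_unbound_wq fits here because the work is CPU-bound and need not
run on the submitting CPU; a real implementation would also want the
workers to be NUMA-local, as suggested above.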
Thanks.

- Alex

[1]: https://lore.kernel.org/linux-mm/153077336359.40830.13007326947037437465.stgit@dwillia2-desk3.amr.corp.intel.com/