From: Cannon Matthews
Date: Wed, 25 Jul 2018 11:48:50 -0700
Subject: Re: [PATCH v2] RFC: clear 1G pages with streaming stores on x86
To: willy@infradead.org
Cc: elliott@hpe.com, mhocko@kernel.org, mike.kravetz@oracle.com,
    akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Andres Lagar-Cavilla, Salman Qazi, Paul Turner,
    David Matlack, Peter Feiner, Alain Trinh
In-Reply-To: <20180725182303.GA1366@bombadil.infradead.org>
References: <20180724210923.GA20168@bombadil.infradead.org>
 <20180725023728.44630-1-cannonmatthews@google.com>
 <20180725182303.GA1366@bombadil.infradead.org>

On Wed, Jul 25, 2018 at 11:23 AM Matthew Wilcox wrote:
>
> On Wed, Jul 25, 2018 at 10:30:40AM -0700, Cannon Matthews wrote:
> > On Tue, Jul 24, 2018 at 10:02 PM Elliott, Robert (Persistent Memory) wrote:
> > > > +       BUG_ON(pages_per_huge_page % PAGES_BETWEEN_RESCHED != 0);
> > > > +       BUG_ON(!dest);
> > >
> > > Are those really possible conditions?  Is there a safer fallback
> > > than crashing the whole kernel?
> >
> > Perhaps not, I hope not anyhow; this was something of a first pass
> > with paranoid invariant checking, and initially I wrote this outside
> > of the x86-specific directory.
> >
> > I suppose that would depend on:
> >
> > Is page_to_virt() always available and guaranteed to return something valid?
> > Will `pages_per_huge_page` ever be anything other than 262144, and if so,
> > anything besides 512 or 1?
>
> page_to_virt() can only return NULL for HIGHMEM, which we already know
> isn't going to be supported.  pages_per_huge_page might vary in the
> future, but is always going to be a power of two.  You can turn that into
> a build-time assert, or just leave it for the person who tries to change
> gigantic pages from being anything other than 1GB.
>
> > It seems like on x86 these conditions will always be true, but I don't know
> > enough to say for 100% certain.
>
> They're true based on the current manuals.  If Intel want to change them,
> it's fair that they should have to change this code too.

Thanks for the confirmations!
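To make the build-time option concrete, here is a rough sketch (not the patch
as submitted) of how the modulo BUG_ON could become a compile-time check,
assuming pages_per_huge_page stays a power of two and PAGES_BETWEEN_RESCHED is
a power-of-two constant no larger than it; the value 64 and the plain memset
body below are illustrative only:

    /*
     * Sketch only, not the submitted patch: if PAGES_BETWEEN_RESCHED is a
     * power-of-two constant and pages_per_huge_page is always a (larger)
     * power of two, the divisibility check can be asserted at build time.
     * The value 64 is illustrative, not taken from the patch.
     */
    #include <linux/bug.h>
    #include <linux/mm.h>
    #include <linux/sched.h>
    #include <linux/string.h>

    #define PAGES_BETWEEN_RESCHED 64

    static void clear_gigantic_page_sketch(struct page *page,
                                           unsigned long pages_per_huge_page)
    {
            void *dest = page_to_virt(page);
            unsigned long i;

            /* Replaces BUG_ON(pages_per_huge_page % PAGES_BETWEEN_RESCHED != 0). */
            BUILD_BUG_ON(PAGES_BETWEEN_RESCHED & (PAGES_BETWEEN_RESCHED - 1));

            for (i = 0; i < pages_per_huge_page; i += PAGES_BETWEEN_RESCHED) {
                    /* Plain memset stands in for the streaming-store clear here. */
                    memset(dest + i * PAGE_SIZE, 0,
                           PAGES_BETWEEN_RESCHED * PAGE_SIZE);
                    cond_resched();
            }
    }

With something like that, a non-power-of-two PAGES_BETWEEN_RESCHED fails at
compile time rather than crashing at runtime, and the BUG_ON(!dest) can simply
be dropped given page_to_virt() can't return NULL without HIGHMEM.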
> > Before I started this I experimented with all of those variants, and
> > interestingly found that I could equally saturate the memory bandwidth with
> > 64-, 128-, or 256-bit wide instructions on a Broadwell CPU (I did not have a
> > Skylake/AVX-512 machine available to run the tests on; it would be a curious
> > thing to see if it holds for that as well).
> >
> > From userspace I did a mmap(MAP_POPULATE), then measured the time
> > to zero a 100GiB region:
> >
> > mmap(MAP_POPULATE):  27.740127291
> > memset [libc, AVX]:  19.318307069
> > rep stosb:           19.301119348
> > movntq:               5.874515236
> > movnti:               5.786089655
> > movntdq:              5.837171599
> > vmovntdq:             5.798766718
> >
> > It was also interesting that the libc memset using AVX instructions
> > (confirmed with gdb, though maybe it's more dynamic/tricksy than I know) was
> > almost identical to the `rep stosb` implementation.
> >
> > I had some conversations with some platform engineers who thought this made
> > sense, but that it is likely to be highly CPU dependent, and some CPUs might
> > be able to do larger bursts of transfers in parallel and get better
> > performance from the wider instructions, but this got way over my head into
> > hardware SDRAM controller design.  More benchmarking would tell, however.
> >
> > Another thing to consider about AVX instructions is that they affect core
> > frequency and power/thermals; I can't really speak to specifics, but I
> > understand that using 512/256-bit instructions and zmm registers can use more
> > power and limit the frequency of other cores, or something along those lines.
> > Anyone with expertise, feel free to correct me on this.  I assume this is
> > also highly CPU dependent.
>
> There's a difference between using AVX{256,512} load/store and arithmetic
> instructions in terms of power draw; at least that's my recollection
> from reading threads on realworldtech.  But I think it's not worth
> going further than you have.  You've got a really nice speedup and it's
> guaranteed to be faster on basically every microarch.  If somebody wants
> to do something super-specialised for their microarch, they can submit
> a patch on top of yours.

Good point, that was a subtlety that escaped my recollection.
In particular, I've also been told that using the zmm registers has
power/thermal penalties as well, though xmm/ymm is OK as long as you don't
wake up the multipliers, at least on specific microarches.

Nonetheless I agree: we can start with this general one and leave room for
more specialized alternatives should anyone ever have the interest to build
on top of this.
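For anyone curious to reproduce the userspace comparison above, here is a
minimal sketch of the kind of streaming-store clear loop that was measured,
using the SSE2 _mm_stream_si128 intrinsic (which compiles to movntdq); the
1 GiB size and the timing scaffolding are assumptions for the example, not
the original 100 GiB harness:

    /*
     * Userspace illustration of non-temporal (streaming) store zeroing,
     * roughly the movntdq case from the table above.  Sketch only.
     */
    #define _GNU_SOURCE
    #include <emmintrin.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define BUF_SIZE (1UL << 30)    /* 1 GiB here; the test above used 100 GiB */

    static void clear_nt(void *dst, size_t len)
    {
            __m128i zero = _mm_setzero_si128();
            char *p = dst;
            size_t i;

            /* dst and len are assumed 16-byte aligned (true for mmap'ed memory). */
            for (i = 0; i < len; i += 16)
                    _mm_stream_si128((__m128i *)(p + i), zero);
            _mm_sfence();   /* order the non-temporal stores before any reuse */
    }

    int main(void)
    {
            void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
            struct timespec t0, t1;

            if (buf == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            clock_gettime(CLOCK_MONOTONIC, &t0);
            clear_nt(buf, BUF_SIZE);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("cleared %lu bytes in %.6f s\n", BUF_SIZE,
                   (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
            return 0;
    }

Built with something like gcc -O2, this only shows the shape of the
measurement; the patch under discussion does its clearing inside the kernel
with streaming stores rather than through these userspace intrinsics.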