From: Song Liu
Date: Sat, 19 May 2018 23:26:06 -0700
Subject: Re: [RFC] mm, THP: Map read-only text segments using large THP pages
To: William Kucharski
Cc: Matthew Wilcox, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-team@fb.com

On Thu, May 17, 2018 at 10:31 AM, William Kucharski wrote:
>
>
>> On May 17, 2018, at 9:23 AM, Matthew Wilcox wrote:
>>
>> I'm certain it is. The other thing I believe is true that we should be
>> able to share page tables (my motivation is thousands of processes each
>> mapping the same ridiculously-sized file). I was hoping this prototype
>> would have code that would be stealable for that purpose, but you've
>> gone in a different direction. Which is fine for a prototype; you've
>> produced useful numbers.
>
> Definitely, and that's why I mentioned integration with the page cache
> would be crucial. This prototype allocates pages for each invocation of
> the executable, which would never fly on a real system.
>
>> I think the first step is to get variable sized pages in the page cache
>> working. Then the map-around functionality can probably just notice if
>> they're big enough to map with a PMD and make that happen. I don't
>> immediately see anything from this PoC that can be used, but it at least
>> gives us a good point of comparison for any future work.
>
> Yes, that's the first step to getting actual usable code designed and
> working; this prototype was designed just to get something working and
> to get a first swag at some performance numbers.
>
> I do think that adding code to map larger pages as a fault_around variant
> is a good start as the code is already going to potentially map in
> fault_around_bytes from the file to satisfy the fault. It makes sense
> to extend that paradigm to be able to tune when large pages might be
> read in and/or mapped using large pages extant in the page cache.
>
> Filesystem support becomes more important once writing to large pages
> is allowed.
>
>> I think that really tells the story. We almost entirely eliminate
>> dTLB load misses (down to almost 0.1%) and iTLB load misses drop to 4%
>> of what they were. Does this test represent any kind of real world load,
>> or is it designed to show the best possible improvement?
>
> It's admittedly designed to thrash the caches pretty hard and doesn't
> represent any type of actual workload I'm aware of. It basically calls
> various routines within a huge text area while scribbling to automatic
> arrays declared at the top of each routine. It wasn't designed as a worst
> case scenario, but rather as one that would hopefully show some obvious
> degree of difference when large text pages were supported.
>
> Thanks for your comments.
>
> -- Bill

We (Facebook) have quite a few real workloads that take advantage of text
on huge pages. For some of them, we can see savings close to the number
above. Currently, we "hugify" the text region through some hack in user space.
We are very interested in supporting it natively in the kernel, because the
hack breaks other features.

We also tested enabling text on huge pages through shmem, and it does work.
The downside is that it requires putting the whole file in memory (or at
least in swap). This doesn't work very well for large binaries with GBs of
debugging data.

Song
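
For readers unfamiliar with how user space can request huge pages at all,
the following is a minimal sketch of the generic building block: reserve an
anonymous region aligned to 2 MiB and hint the kernel with
madvise(MADV_HUGEPAGE). It is not the "hugify" hack mentioned above, whose
details this message does not give. The helper name alloc_thp_hinted and the
2 MiB HUGE_PAGE_SIZE constant (the PMD size on x86-64) are assumptions made
for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed PMD-level huge page size (2 MiB, as on x86-64). */
#define HUGE_PAGE_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: map at least `len` bytes of anonymous memory,
 * aligned to HUGE_PAGE_SIZE, and advise the kernel that the range is a
 * good candidate for transparent huge pages.
 * Stores the rounded-up length in *out_len; returns NULL on failure.
 */
void *alloc_thp_hinted(size_t len, size_t *out_len)
{
    assert(out_len != NULL);
    if (len == 0)
        return NULL;

    /* Round the request up to a whole number of huge pages. */
    size_t rounded = (len + HUGE_PAGE_SIZE - 1) & ~(HUGE_PAGE_SIZE - 1);

    /* Over-allocate by one huge page so an aligned start must exist. */
    size_t span = rounded + HUGE_PAGE_SIZE;
    void *m = mmap(NULL, span, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    char *raw = m;
    uintptr_t start = ((uintptr_t)raw + HUGE_PAGE_SIZE - 1)
                      & ~((uintptr_t)HUGE_PAGE_SIZE - 1);
    char *aligned = (char *)start;

    /* Give back the unaligned head and the unused tail. */
    size_t head = (size_t)(aligned - raw);
    if (head > 0)
        munmap(raw, head);
    size_t tail = HUGE_PAGE_SIZE - head;
    if (tail > 0)
        munmap(aligned + rounded, tail);

    /*
     * Advisory only: the call can fail (for example when THP is not
     * configured), and even on success the kernel may keep using small
     * pages. The mapping remains usable either way.
     */
#ifdef MADV_HUGEPAGE
    (void)madvise(aligned, rounded, MADV_HUGEPAGE);
#endif

    *out_len = rounded;
    return aligned;
}
```

A caller would release the region with munmap(p, n), where n is the length
returned through out_len. Turning such a region into executable text would
still need the code copied in and the protection changed with mprotect(),
which this sketch leaves out.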