Received: by 10.192.165.148 with SMTP id m20csp2670064imm; Sun, 6 May 2018 21:52:37 -0700 (PDT) X-Google-Smtp-Source: AB8JxZrLUwCmhAptXths/yrn+7LIoeoy81pVPdxyHTUJA78ez0uMdp1b8NMPNseXHkVjb7rc+Hmd X-Received: by 2002:a17:902:5602:: with SMTP id h2-v6mr37321804pli.115.1525668757878; Sun, 06 May 2018 21:52:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525668757; cv=none; d=google.com; s=arc-20160816; b=K7+UuVyf+owT1YsN00kfAN+8oFLSBZcY2STAlSCIJ1xQQoDCXesIMolEfFfjqT5pd0 G1IXczQZcJ0jvcgL8Kfk9oRVkUAF9Educ2lU1muL/a9rwe/ipkYaMun+BpPEnNtSB+U4 wE7Y7RAKiafaq3vm9ISXQwpcRTa6BjMPldeI8zx+HWKMQ4K9l6txw7VpTt7+v9XbeOQ4 OV0VwM1Sn8FHNt2oJnF7yVBhkMWXmOMzFrpwlVXgJj4t3WELSsDifUeS4gBSVB3+IUHn WZcnhO4ChX1LjyK9PbNz/AZ77F2e0eywaTOjBMCNOgcylxOR7hAp9ZepwJxBnkzj2BrH uK1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature :arc-authentication-results; bh=0mbq72Hakz0c0r3VfBVtQmMkIAffHNJfhKms5ndRKNo=; b=sRjFLc6nla0lrgs3S0Hu10iNgE86umhsxLz5s/uKj9QUgRq0ava2Qf6PjAjiyHsBd1 9+Fi+z2XqwOjOvt6UlM80IJZEQ7R6MTqgNXZ7Ut/DZUL7FYqFWGQ62Lp5H7zJE/3rFdU JJbudBhxiKCGGDEf0v0ZCAJ2+7yI07QKCGpuAbskTFkT0U6y/bLFutrpCCXEGTCwV8Gt tozya2ZrIJRMizaq+3gRDKfrCmph8i7u4f/TWuzNEnmz0aphfKo0UQzHmlQYc6hQjrgt IAlJ6AmUhpzQlpasFTdodpFceX9sJCRmh+7KBe619DVuw1PHhJQgblf/5SaFKCLkQGSZ sPmw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b=rWnsLuYO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e39-v6si20724767plg.335.2018.05.06.21.52.23; Sun, 06 May 2018 21:52:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@amacapital-net.20150623.gappssmtp.com header.s=20150623 header.b=rWnsLuYO; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751881AbeEGEwL (ORCPT + 99 others); Mon, 7 May 2018 00:52:11 -0400 Received: from mail-wr0-f169.google.com ([209.85.128.169]:33420 "EHLO mail-wr0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750811AbeEGEwJ (ORCPT ); Mon, 7 May 2018 00:52:09 -0400 Received: by mail-wr0-f169.google.com with SMTP id o4-v6so27135818wrm.0 for ; Sun, 06 May 2018 21:52:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amacapital-net.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=0mbq72Hakz0c0r3VfBVtQmMkIAffHNJfhKms5ndRKNo=; b=rWnsLuYOzDMhxX2D58z9Xctoa0hQO1FRwFWqduHt0s7h5cZl6me/ktsBJbQyCOuEJv cMVpyElm8fflhj17FJ4LsVyaqyetWIl7M/D4lEvd925NGiSx3RoHXvduSE7KThi27DBH rNlSGeH9NrfjqtuUt1qyLbgRSYmcR1FoyeZ6y2GGWbtSx4nvPkS3G9eT5hrbpxcH9QwM qdS7/TDYbuh5hnfQ8mLvc9xwo00qYX75TXGuTiJ8wV3nXSvCnfVZybzPh675LgZTemRp AIJij6bF9lpP7GBJnMyTegBsbBxqXNgI1FdiLPJz6qJLAv1cRbk6nnkcLbx26awb8NCJ 4y8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0mbq72Hakz0c0r3VfBVtQmMkIAffHNJfhKms5ndRKNo=; b=gwyL8Cz45DCY0pDBz8wTJQ2bgSAgnTipa27fYkTMT6KDcBUwbzeJ0T9V4NtqYLdfIN v0IN2FTRIMh5cD6knfqdr3Rujyjtci5gq89NKihOzuy2KyJjTp5N9OJnMdsC+gx8LFJf a6wJeHrF1zgKEjXk8nGqdy3pIXSBaxP09IYJIgArOUfPBFXyZBV2KwsYt0YXERZPpVy+ hlzY0XAejmWUAxvzsXmtuaQwQ/fBVwTex0WdfcrniPet0PUqBSOI7q1SVAB5aUJnCogH XZ/gfO4Eo7RPUb9jTtPSjOKVmUhDTZI/PF+R7hCBITUqkyyV7p2VX8zNBG79ZgI0EQHI iiSg== X-Gm-Message-State: ALQs6tA9PgWNuwU5MgzFC+sQqp6E1ykG4FlfwiUn0bg+VsuCYq5wjslE H5ohY3Zy6PdhU/QYyMHsAfb4/WNh1MRmLt6UiQ/3XQ== X-Received: by 2002:adf:b456:: with SMTP id v22-v6mr18165108wrd.67.1525668728114; Sun, 06 May 2018 21:52:08 -0700 (PDT) MIME-Version: 1.0 References: <20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com> In-Reply-To: <20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com> From: Andy Lutomirski Date: Mon, 07 May 2018 04:51:57 +0000 Message-ID: Subject: Re: Proof-of-concept: better(?) page-table manipulation API To: "Kirill A. Shutemov" Cc: Andrew Morton , Michal Hocko , Linus Torvalds , X86 ML , Linux-MM , LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 24, 2018 at 8:44 AM Kirill A. Shutemov < kirill.shutemov@linux.intel.com> wrote: > Hi everybody, > I've proposed to talk about page able manipulation API on the LSF/MM'2018, > so I need something material to talk about. I gave it a quick read. I like the concept a lot, and I have a few comments. > +/* > + * How manu bottom level we account to mm->pgtables_bytes > + */ > +#define PT_ACCOUNT_LVLS 3 > + > +struct pt_ptr { > + unsigned long *ptr; > + int lvl; > +}; > + I think you've inherited something that I consider to be a defect in the old code: you're conflating page *tables* with page table *entries*. Your 'struct pt_ptr' sounds like a pointer to an entire page table, but AFAICT you're using it to point to a specific entry within a table. I think that both the new core code and the code that uses it would be clearer and less error prone if you made the distinction explicit. I can think of two clean ways to do it: 1. Add a struct pt_entry_ptr, and make it so that get_ptv(), etc take a pt_entry_ptr instead of a pt_ptr. Add a helper to find a pt_entry_ptr given a pt_ptr and either an index or an address. 2. Don't allow pointers to page table entries at all. Instead, get_ptv() would take an address or an index parameter. Also, what does lvl == 0 mean? Is it the top or the bottom? I think a comment would be helpful. > +/* > + * When walking page tables, get the address of the next boundary, > + * or the end address of the range if that comes earlier. Although no > + * vma end wraps to 0, rounded up __boundary may wrap to 0 throughout. > + */ I read this comment twice, and I still don't get it. Can you clarify what this function does and why you would use it? > +/* Operations on page table pointers */ > + > +/* Initialize ptp_t with pointer to top page table level. */ > +static inline ptp_t ptp_init(struct mm_struct *mm) > +{ > + struct pt_ptr ptp ={ > + .ptr = (unsigned long *)mm->pgd, > + .lvl = PT_TOP_LEVEL, > + }; > + > + return ptp; > +} > + On some architectures, there are multiple page table roots. For example, ARM64 has a root for the kernel half of the address space and a root for the user half (at least -- I don't fully understand it). x86 PAE sort-of has four roots. Would it make sense to expose this in the API for real? For example, ptp_init(mm) could be replaced with ptp_init(mm, addr). This would make it a bit cleaner to handle an separate user and kernel tables. (As it stands, what is supposed to happen on ARM if you do ptp_init(something that isn't init_mm) and then walk it to look for a kernel address?) Also, ptp_init() seems oddly named for me. ptp_get_root_for_mm(), perhaps? There could also be ptp_get_kernel_root() to get the root for the init_mm's tables. > +static inline void ptp_walk(ptp_t *ptp, unsigned long addr) > +{ > + ptp->ptr = (unsigned long *)ptp_page_vaddr(ptp); > + ptp->ptr += __pt_index(addr, --ptp->lvl); > +} Can you add a comment that says what this function does? Why does it not change the level? > + > +static void ptp_free(struct mm_struct *mm, ptv_t ptv) > +{ > + if (ptv.lvl < PT_SPLIT_LOCK_LVLS) > + ptlock_free(pfn_to_page(ptv_pfn(ptv))); > +} > + As it stands, this is a function that seems easy easy to misuse given the confusion between page tables and page table entries. Finally, a general comment. Actually fully implementing this the way you've done it seems like a giant mess given that you need to support all architectures. But couldn't you implement the new API as a wrapper around the old API so you automatically get all architectures?