From: Nicolas Boichat
Date: Thu, 22 Nov 2018 09:05:30 +0800
Subject: Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables
To: Robin Murphy
Cc: willy@infradead.org, Christoph Lameter, Will Deacon, Joerg Roedel,
 Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
 Michal Hocko, Mel Gorman, Levin Alexander, Huaisheng Ye, Mike Rapoport,
 linux-arm Mailing List, iommu@lists.linux-foundation.org, lkml,
 linux-mm@kvack.org, Yong Wu, Matthias Brugger, Tomasz Figa,
 yingjoe.chen@mediatek.com
References: <20181111090341.120786-1-drinkcat@chromium.org>
 <0100016737801f14-84f1265d-4577-4dcf-ad57-90dbc8e0a78f-000000@email.amazonses.com>
 <20181121213853.GL3065@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 22, 2018 at 6:27 AM Robin Murphy wrote:
>
> On 2018-11-21 9:38 pm, Matthew Wilcox wrote:
> > On Wed, Nov 21, 2018 at 06:20:02PM +0000, Christopher Lameter wrote:
> >> On Sun, 11 Nov 2018, Nicolas Boichat wrote:
> >>
> >>> This is a follow-up to the discussion in [1], to make sure that the page
> >>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
> >>> physical address space.
> >>
> >> Page tables? This means you need a page frame? Why go through the slab
> >> allocators?
> >
> > Because this particular architecture has sub-page-size PMD page tables.
> > We desperately need to hoist page table allocation out of the architectures;
> > there're a bunch of different implementations and they're mostly bad,
> > one way or another.
>
> These are IOMMU page tables, rather than CPU ones, so we're already well
> outside arch code - indeed the original motivation of io-pgtable was to
> be entirely independent of the p*d types and arch-specific MM code (this
> Armv7 short-descriptor format is already "non-native" when used by
> drivers in an arm64 kernel).
>
> There are various efficiency reasons for using regular kernel memory
> instead of coherent DMA allocations - for the most part it works well,
> we just have the odd corner case like this one where the 32-bit format
> gets used on 64-bit systems such that the tables themselves still need
> to be allocated below 4GB (although the final output address can point
> at higher memory by virtue of the IOMMU in question not implementing
> permissions and repurposing some of those PTE fields as extra address bits).
>
> TBH, if this DMA32 stuff is going to be contentious we could possibly
> just rip out the offending kmem_cache - it seemed like good practice for
> the use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied
> upon to give the same 1KB alignment and chance of succeeding as the
> equivalent kmem_cache_alloc(), then we could quite easily make do with
> that instead.

Yes, but if we want to use kzalloc, we'll need to create
kmalloc_caches for DMA32, which seems wasteful as there are no other
users (see my comment here:
https://patchwork.kernel.org/patch/10677525/#22332697).

Thanks,

> Thanks,
> Robin.
>
> > For each level of page table we generally have three cases:
> >
> > 1. single page
> > 2. sub-page, naturally aligned
> > 3. multiple pages, naturally aligned
> >
> > for 1 and 3, the page allocator will do just fine.
> > for 2, we should have a per-MM page_frag allocator.  s390 already has
> > something like this, although it's more complicated.
> > ppc also has something a little more complex for the cases when it's
> > configured with a 64k page size but wants to use a 4k page table entry.
> >
> > I'd like x86 to be able to simply do:
> >
> > #define pte_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define pud_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> > #define p4d_alloc_one(mm, addr) page_alloc_table(mm, addr, 0)
> >
> > An architecture with 4k page size and needing a 16k PMD would do:
> >
> > #define pmd_alloc_one(mm, addr) page_alloc_table(mm, addr, 2)
> >
> > while an architecture with a 64k page size needing a 4k PTE would do:
> >
> > #define ARCH_PAGE_TABLE_FRAG
> > #define pte_alloc_one(mm, addr) pagefrag_alloc_table(mm, addr, 4096)
> >
> > I haven't had time to work on this, but perhaps someone with a problem
> > that needs fixing would like to, instead of burying yet another awful
> > implementation away in arch/ somewhere.
> >
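
For illustration only, the three cases listed in the quoted mail and the
order argument passed to the proposed page_alloc_table() can be sketched in
plain userspace C. This is a rough sketch under stated assumptions, not
kernel code: classify_table(), table_order(), is_pow2() and enum table_kind
are hypothetical names invented here, and the sketch assumes page and table
sizes are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical classification of a page-table allocation, mirroring the
 * three cases in the quoted mail. None of this is an existing kernel API.
 */
enum table_kind {
	TABLE_SINGLE_PAGE,	/* case 1: exactly one page */
	TABLE_SUB_PAGE,		/* case 2: smaller than a page, naturally aligned */
	TABLE_MULTI_PAGE,	/* case 3: several pages, naturally aligned */
};

/* True if v is a non-zero power of two. */
static int is_pow2(size_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Decide which of the three cases a table of table_size bytes falls into. */
static enum table_kind classify_table(size_t page_size, size_t table_size)
{
	assert(is_pow2(page_size) && is_pow2(table_size));

	if (table_size == page_size)
		return TABLE_SINGLE_PAGE;
	return table_size < page_size ? TABLE_SUB_PAGE : TABLE_MULTI_PAGE;
}

/*
 * Smallest order such that (page_size << order) >= table_size.
 * Sub-page tables yield order 0 here; they would need a fragment
 * allocator rather than a whole-page allocation.
 */
static unsigned int table_order(size_t page_size, size_t table_size)
{
	unsigned int order = 0;

	assert(is_pow2(page_size) && table_size != 0);

	while ((page_size << order) < table_size)
		order++;
	return order;
}
```

With these helpers, a 16k PMD on a 4k page size gives table_order(4096,
16384) == 2, matching the page_alloc_table(mm, addr, 2) example above, while
a 4k PTE on a 64k page size is classified as sub-page. Assuming 4k pages, the
1KB tables discussed in this thread are also sub-page, which is why the page
allocator alone is not a good fit for them.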