From: Jisheng Zhang
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
    Catalin Marinas
Subject: [PATCH 0/6] riscv: Reduce ARCH_KMALLOC_MINALIGN to 8
Date: Sat, 27 May 2023 00:59:52 +0800
Message-Id: <20230526165958.908-1-jszhang@kernel.org>

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e.
64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified
kernel Image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT,
which has some bad effects on coherent platforms:

Firstly, it wastes memory: the kmalloc-96, kmalloc-32, kmalloc-16 and
kmalloc-8 slab caches no longer exist; they are replaced with either
kmalloc-128 or kmalloc-64.

Secondly, larger than necessary kmalloc aligned allocations result in
unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. Since last year, Catalin
has been working to solve it by decoupling ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN, limiting the kmalloc() minimum alignment to
dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in
various drivers with ARCH_DMA_MINALIGN etc. [1]

One fact we can make use of for riscv: if the CPU doesn't support
ZICBOM or T-HEAD CMO, we know the platform is coherent.

Based on Catalin's work and the above fact, we can easily solve the
kmalloc alignment issue for riscv: we override
dma_get_cache_alignment() so that it returns ARCH_DMA_MINALIGN at the
beginning and returns 1 once we know the underlying HW supports
neither ZICBOM nor T-HEAD CMO.

So what if the CPU supports ZICBOM or T-HEAD CMO, but all the devices
are dma coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc
minimum alignment; nothing changes in this case. This case can be
improved in the future.

After this patch, a simple test of booting to a small buildroot rootfs
on qemu shows:

kmalloc-96          5041   5041     96 ...
kmalloc-64          9606   9606     64 ...
kmalloc-32          5128   5128     32 ...
kmalloc-16          7682   7682     16 ...
kmalloc-8          10246  10246      8 ...

So we save about 1268KB memory. The saving will be much larger in a
normal OS environment on real HW platforms.

patch 1,2,3,4 are either clean up or preparation patches.
patch5 allows kmalloc() caches aligned to the smallest value.
patch6 enables DMA_BOUNCE_UNALIGNED_KMALLOC.

After this series:

On coherent platforms, i.e. !ZICBOM and !THEAD_CMO, the
kmalloc-{8,16,32,96} caches come back on both RV32 and RV64.

On noncoherent RV32 platforms, nothing changes.

On noncoherent RV64 platforms, i.e. either ZICBOM or THEAD_CMO, the
above kmalloc caches also come back if the system has more than 4GB of
memory, or, with 4GB or less, if users pass "swiotlb=mmnn,force" to
force swiotlb creation. The right value of mmnn depends on the
specific platform; it needs to be determined by trying and testing all
possible use cases on the specific hardware. For example, I can use
the minimal I/O TLB slabs on Sipeed M1S Dock.

[1] Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/

Jisheng Zhang (6):
  riscv: errata: thead: only set cbom size & noncoherent during boot
  riscv: mm: mark CBO relate initialization funcs as __init
  riscv: mm: mark noncoherent_supported as __ro_after_init
  riscv: mm: pass noncoherent or not to riscv_noncoherent_supported()
  riscv: allow kmalloc() caches aligned to the smallest value
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent

 arch/riscv/Kconfig                  |  1 +
 arch/riscv/errata/thead/errata.c    | 22 ++++++++++++++--------
 arch/riscv/include/asm/cache.h      | 14 ++++++++++++++
 arch/riscv/include/asm/cacheflush.h |  4 ++--
 arch/riscv/kernel/setup.c           |  6 +++++-
 arch/riscv/mm/cacheflush.c          |  8 ++++----
 arch/riscv/mm/dma-noncoherent.c     | 16 +++++++++++-----
 7 files changed, 51 insertions(+), 20 deletions(-)

-- 
2.40.1