Date: Thu, 21 Apr 2022 12:06:58 +0100
From: Catalin Marinas
To: Christoph Hellwig
Cc: Arnd Bergmann, Ard Biesheuvel, Herbert Xu, Will Deacon, Marc Zyngier,
	Greg Kroah-Hartman, Andrew Morton, Linus Torvalds,
	Linux Memory Management List, Linux ARM, Linux Kernel Mailing List,
	"David S. Miller"
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 21, 2022 at 12:20:22AM -0700, Christoph Hellwig wrote:
> Btw, there is another option: Most real systems already require having
> swiotlb to bounce buffer in some cases. We could simply force bounce
> buffering in the dma mapping code for too small or not properly aligned
> transfers and just decrease the dma alignment.

We can force bounce if size is small but checking the alignment is
trickier. Normally the beginning of the buffer is aligned but the end is
at some sizeof() distance. We need to know whether the end is in a
kmalloc-128 cache and that requires reaching out to the slab internals.
That's doable and not expensive but it needs to be done for every small
size getting to the DMA API, something like (for mm/slub.c):

	folio = virt_to_folio(x);
	slab = folio_slab(folio);
	if (slab->slab_cache->align < ARCH_DMA_MINALIGN)
		... bounce ...

(and a bit different for mm/slab.c)

If we scrap ARCH_DMA_MINALIGN altogether from arm64, we can check the
alignment against cache_line_size(), though I'd rather keep it for code
that wants to avoid bouncing and goes for this compile-time alignment.

I think we are down to four options (1 and 2 can be combined):

1. ARCH_DMA_MINALIGN == 128, dynamic arch_kmalloc_minalign() to reduce
   kmalloc() alignment to 64 on most arm64 SoC - this series.

2. ARCH_DMA_MINALIGN == 128, ARCH_KMALLOC_MINALIGN == 128, add explicit
   __GFP_PACKED for small allocations.
   It can be combined with (1) so that allocations without __GFP_PACKED
   can still get 64-byte alignment.

3. ARCH_DMA_MINALIGN == 128, ARCH_KMALLOC_MINALIGN == 8, swiotlb bounce.

4. undef ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN == 8, swiotlb bounce.

(3) and (4) don't require histogram analysis. Between them, I have a
preference for (3) as it gives drivers a chance to avoid the bounce.

If (2) is feasible, we don't need to bother with any bouncing or
structure alignments, it's an opt-in by the driver/subsystem. However,
it may be tedious to analyse the hot spots. While there are a few
obvious places (kstrdup), I don't have access to a multitude of devices
that may exercise the drivers and subsystems.

With (3) the risk is someone complaining about performance or even
running out of swiotlb space on some SoCs (I guess the fall-back can be
another kmalloc() with an appropriate size).

I guess we can limit the choice to either (2) or (3). I have (2) already
(needs some more testing). I can attempt (3) and try to run it on some
real hardware to see the perf impact.

-- 
Catalin
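
For illustration only, the bounce decision described above (force a
bounce for small buffers whose slab cache is aligned to less than
ARCH_DMA_MINALIGN) can be modelled in plain userspace C. This is a
sketch, not kernel code: struct model_cache and model_dma_needs_bounce()
are hypothetical names, and treating "small" as "shorter than
ARCH_DMA_MINALIGN" is an assumption, since the mail does not define the
threshold.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value used for arm64 in options (1) to (3) of the mail. */
#define ARCH_DMA_MINALIGN 128

/* Hypothetical stand-in for slab metadata; NOT the kernel's struct kmem_cache. */
struct model_cache {
	const char *name;	/* e.g. "kmalloc-64" */
	size_t align;		/* alignment the cache guarantees for its objects */
};

/*
 * Hypothetical helper: should a DMA mapping of 'size' bytes, allocated
 * from 'cache', go through a bounce buffer?
 *
 * Assumption: "small" means shorter than ARCH_DMA_MINALIGN. Larger
 * transfers are outside the scope of this sketch and are not bounced.
 */
static bool model_dma_needs_bounce(const struct model_cache *cache, size_t size)
{
	/* Not a small transfer: skip the slab lookup entirely. */
	if (size >= ARCH_DMA_MINALIGN)
		return false;

	/* Mirrors "slab->slab_cache->align < ARCH_DMA_MINALIGN" from the mail. */
	return cache->align < ARCH_DMA_MINALIGN;
}
```

With this model, a 48-byte buffer from a cache aligned to 64 bytes is
bounced, while the same buffer from a cache aligned to 128 bytes is
mapped directly. In the kernel the cache would be found from the buffer
address through the slab internals, as in the mm/slub.c snippet above.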