Date: Thu, 14 Apr 2022 20:49:40 +0100
From: Catalin Marinas
To: Linus Torvalds
Cc: Ard Biesheuvel, Herbert Xu, Will Deacon, Marc Zyngier,
    Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
    Linux Memory Management List, Linux ARM,
    Linux Kernel Mailing List, "David S. Miller"
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

On Wed, Apr 13, 2022 at 09:53:24AM -1000, Linus Torvalds wrote:
> On Tue, Apr 12, 2022 at 10:47 PM Catalin Marinas wrote:
> > I agree. There is also an implicit expectation that the DMA API works on
> > kmalloc'ed buffers and that's what ARCH_DMA_MINALIGN is for (and the
> > dynamic arch_kmalloc_minalign() in this series). But the key point is
> > that the driver doesn't need to know the CPU cache topology, coherency,
> > the DMA API and kmalloc() take care of these.
>
> Honestly, I think it would probably be worth discussing the "kmalloc
> DMA alignment" issues.
>
> 99.9% of kmalloc users don't want to do DMA.
>
> And there's actually a fair amount of small kmalloc for random stuff.
> Right now on my laptop, I have
>
>     kmalloc-8  16907  18432  8  512  1 : ...
>
> according to slabinfo, so almost 17 _thousand_ allocations of 8 bytes.
>
> It's all kinds of sad if those allocations need to be 64 bytes in size
> just because of some silly DMA alignment issue, when none of them want
> it.

It's a lot worse, ARCH_KMALLOC_MINALIGN is currently 128 bytes on arm64.
I want to at least get it down to 64 with this series while preserving
the current kmalloc() semantics. If we know the SoC is fully coherent (a
bit tricky with late probed devices), we could get the alignment down to
8. In the mobile space, unfortunately, most DMA is non-coherent.
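
To put rough numbers on the three alignments mentioned above (8, 64 and 128 bytes), here is a small userspace sketch in C. It is not kernel code: `align_up()` and `padded_footprint()` are hypothetical helper names, and the calculation simply pads every object to the minimum alignment, ignoring slab metadata and partially filled slabs.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/*
 * Bytes occupied by nr_objs objects of obj_size bytes once each object
 * is padded to minalign. Hypothetical helper, for illustration only.
 */
static size_t padded_footprint(size_t nr_objs, size_t obj_size, size_t minalign)
{
    return nr_objs * align_up(obj_size, minalign);
}
```

For the 16907 active 8-byte objects in the quoted slabinfo line, `padded_footprint(16907, 8, minalign)` gives 135256 bytes (about 132 KiB) at 8-byte alignment, 1082048 bytes (about 1.03 MiB) at 64 and 2164096 bytes (about 2.06 MiB) at 128.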
I think it's worth investigating the __dma annotations that Greg
suggested, though I have a suspicion it either is too difficult to track
or we just end up with this annotation everywhere. There are cases where
the memory is allocated outside the driver that knows the DMA needs,
though I guess these are either full page allocations or
kmem_cache_alloc() (e.g. page cache pages, skb).

There's also Ard's suggestion to bounce the (inbound DMA) buffer if not
aligned. That's doable but dma_map_single(), for example, only gets the
size of some random structure/buffer. If the size is below
ARCH_DMA_MINALIGN (or cache_line_size()), the DMA API implementation
would have to retrieve the slab cache, check the real allocation size
and then bounce if necessary.

Irrespective of which option we go for, I think at least part of this
series decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN is still
needed since currently the minalign is used in some compile time
attributes. Even getting the kmalloc() size down to 64 is a significant
improvement over 128.

Subsequently I'd attempt Ard's bouncing idea as a quick workaround and
assess the bouncing overhead on some real platforms. This may be needed
before we track down all places to use dma_kmalloc().

I need to think some more on Greg's __dma annotation, as I said the
allocation may be decoupled from the driver in some cases.

-- 
Catalin
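
The bounce check described in the message (mapping size below the DMA minimum alignment, then a look at the real allocation size) can be modelled in userspace C roughly as follows. This is a sketch under stated assumptions, not the kernel's implementation: `needs_bounce()` and its parameters are hypothetical, the real allocation size is passed in rather than looked up from a slab cache, and objects are assumed to start on a boundary matching their cache's object size.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the bounce decision; not kernel code.
 *   map_size:     length handed to the mapping call
 *   alloc_size:   real size of the slab object backing the buffer
 *   dma_minalign: stands in for ARCH_DMA_MINALIGN or the cache line size
 */
static bool needs_bounce(size_t map_size, size_t alloc_size, size_t dma_minalign)
{
    /* Following the message, only small mappings trigger the lookup. */
    if (map_size >= dma_minalign)
        return false;

    /*
     * Assumption: an object whose size is not a multiple of the DMA
     * minimum alignment may share a cache line with a neighbour,
     * so inbound DMA into it has to go through a bounce buffer.
     */
    return alloc_size % dma_minalign != 0;
}
```

With a 64-byte minimum alignment, an 8-byte mapping backed by a kmalloc-8 object would bounce, while the same 8-byte mapping backed by a 64-byte object would not. The cost of the lookup and of the copy is what would need measuring on real platforms.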