Date: Wed, 20 Apr 2022 11:36:36 +0100
From: Catalin Marinas
To: Ard Biesheuvel
Cc: Herbert Xu, Will Deacon, Marc Zyngier, Arnd Bergmann,
    Greg Kroah-Hartman, Andrew Morton, Linus Torvalds,
    Linux Memory Management List, Linux ARM,
    Linux Kernel Mailing List,
"David S. Miller" Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 19, 2022 at 11:50:11PM +0200, Ard Biesheuvel wrote: > On Mon, 18 Apr 2022 at 18:44, Catalin Marinas wrote: > > On Mon, Apr 18, 2022 at 04:37:17PM +0800, Herbert Xu wrote: > > > I've seen Ard's patches already and I think I understand what your > > > needs are. So let me whip up some code to show you guys what I > > > think needs to be done. > > > > BTW before you have a go at this, there's also Linus' idea that does not > > change the crypto code (at least not functionally). Of course, you and > > Ard can still try to figure out how to reduce the padding but if we go > > with Linus' idea of a new GFP_NODMA flag, there won't be any changes to > > the crypto code as long as it doesn't pass such flag. So, the options: > > > > 1. Change ARCH_KMALLOC_MINALIGN to 8 (or ARCH_SLAB_MINALIGN if higher) > > while keeping ARCH_DMA_MINALIGN to 128. By default kmalloc() will > > honour the 128-byte alignment, unless GDP_NODMA is passed. This still > > requires changing CRYPTO_MINALIGN to ARCH_DMA_MINALIGN but there is > > no functional change, kmalloc() without the new flag will return > > CRYPTO_MINALIGN-aligned pointers. > > > > 2. Leave ARCH_KMALLOC_MINALIGN as ARCH_DMA_MINALIGN (128) and introduce > > a new GFP_PACKED (I think it fits better than 'NODMA') flag that > > reduces the minimum kmalloc() below ARCH_KMALLOC_MINALIGN (and > > probably at least ARCH_SLAB_MINALIGN). It's equivalent to (1) but > > does not touch the crypto code at all. > > > > (1) and (2) are the same, just minor naming difference. Happy to go with > > any of them. They still have the downside that we need to add the new > > GFP_ flag to those hotspots that allocate small objects (Arnd provided > > an idea on how to find them with ftrace) but at least we know it won't > > inadvertently break anything. > > I'm not sure GFP_NODMA adds much here. What it buys is that the crypto code won't need to be changed immediately, so we can go ahead with the kmalloc() optimisation while you and Herbert figure out how to refactor the crypto code. Another option is to make the change but pass a new GFP_DMA_MINALIGN flag to all kmalloc() calls in the crypto code (or a new dma_kmalloc() that Linus suggested). This way the allocations in the crypto code are guaranteed to be ARCH_DMA_MINALIGN (we'd still change CRYPTO_MINALIGN to this). 
> The way I see it, the issue in the crypto code is that we are relying
> on an ARCH_KMALLOC_MINALIGN-aligned zero-length __ctx[] array for
> three different things:
> - ensuring/informing the compiler that top-level request/TFM
>   structures are aligned to ARCH_KMALLOC_MINALIGN,
> - adding padding to ensure that driver context structures that are
>   embedded in those top-level request/TFM structures are sufficiently
>   aligned so that any member C types appear at the expected alignment
>   (but those structures are not usually defined as being aligned to
>   ARCH_KMALLOC_MINALIGN)

In this case, a void * array type would cover most structures that
don't have special alignment needs. In both the above cases, we can
get ARCH_KMALLOC_MINALIGN down to 8.

> - adding padding to ensure that these driver context structures do
>   not share cache lines with the preceding top-level struct.
>
> One thing to note here is that the padding is only necessary when the
> driver context size is > 0, and has nothing to do with the alignment
> of the top-level struct.
>
> Using a single aligned ctx member was a nice way to accomplish all of
> these when it was introduced, but I think it might be better to get
> rid of it, and move the padding logic to the static inline helpers
> instead.
>
> So something like
>
> struct skcipher_request {
>         ...
> } CRYPTO_MINALIGN_ATTR;
>
> which states/ensures the alignment of the struct, and
>
> void *skcipher_request_ctx(struct skcipher_request *req)
> {
>         return (void *)PTR_ALIGN(req + 1, ARCH_DMA_MINALIGN);
> }
>
> to get at the context struct, instead of using a struct field.
>
> Then, we could update skcipher_request_alloc() to only round up
> sizeof(struct skcipher_request) to ARCH_DMA_MINALIGN if the reqsize
> is >0 to begin with, and if it is, to also round reqsize up to
> ARCH_DMA_MINALIGN when accessed. That way, we spell out the DMA
> padding requirements without relying on aligned struct members.

I think this should work, and CRYPTO_MINALIGN can then go down to
whatever ARCH_KMALLOC_MINALIGN ends up being without breaking
non-coherent DMA.

-- 
Catalin