From: Ard Biesheuvel
Date: Thu, 14 Apr 2022 17:01:26 +0200
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN
To: Greg Kroah-Hartman
Cc: Linus Torvalds, Catalin Marinas, Herbert Xu, Will Deacon, Marc Zyngier,
    Arnd Bergmann, Andrew Morton, Linux Memory Management List, Linux ARM,
    Linux Kernel Mailing List, "David S. Miller"
Miller" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 14 Apr 2022 at 16:53, Greg Kroah-Hartman wrote: > > On Thu, Apr 14, 2022 at 04:36:46PM +0200, Ard Biesheuvel wrote: > > On Thu, 14 Apr 2022 at 16:27, Greg Kroah-Hartman > > wrote: > > > > > > On Thu, Apr 14, 2022 at 03:52:53PM +0200, Ard Biesheuvel wrote: ... > > > > What we might do, given the fact that only inbound non-cache coherent > > > > DMA is problematic, is dropping the kmalloc alignment to 8 like on > > > > x86, and falling back to bounce buffering when a misaligned, non-cache > > > > coherent inbound DMA mapping is created, using the SWIOTLB bounce > > > > buffering code that we already have, and is already in use on most > > > > affected systems for other reasons (i.e., DMA addressing limits) > > > > > > Ick, that's a mess. > > > > > > > This will cause some performance regressions, but in a way that seems > > > > fixable to me: taking network drivers as an example, the RX buffers > > > > that are filled using inbound DMA are typically owned by the driver > > > > itself, which could be updated to round up its allocations and DMA > > > > mappings. Block devices typically operate on quantities that are > > > > aligned sufficiently already. In other cases, we will likely notice > > > > if/when this fallback is taken on a hot path, but if we don't, at > > > > least we know a bounce buffer is being used whenever we cannot perform > > > > the DMA safely in-place. > > > > > > We can move to having an "allocator-per-bus" for memory like this to > > > allow the bus to know if this is a DMA requirement or not. > > > > > > So for all USB drivers, we would have: > > > usb_kmalloc(size, flags); > > > and then it might even be easier to verify with static tools that the > > > USB drivers are sending only properly allocated data. Same for SPI and > > > other busses. > > > > > > > As I pointed out earlier in the thread, alignment/padding requirements > > for non-coherent DMA are a property of the CPU's cache hierarchy, not > > of the device. So I'm not sure I follow how a per-subsystem > > distinction would help here. In the case of USB especially, would that > > mean that block, media and networking subsystems would need to be > > aware of the USB-ness of the underlying transport? > > That's what we have required today, yes. That's only because we knew > that for some USB controllers, that was a requirement and we had no way > of passing that information back up the stack so we just made it a > requirement. > > But I do agree this is messy. It's even messier for things like USB > where it's not the USB device itself that matters, it's the USB > controller that the USB device is attached to. And that can be _way_ up > the device hierarchy. Attach something like a NFS mount over a PPP > network connection on a USB to serial device and ugh, where do you > begin? :) > Exactly. > And is this always just an issue of the CPU cache hierarchy? And not the > specific bridge that a device is connected to that CPU on? Or am I > saying the same thing here? 
> And is this always just an issue of the CPU cache hierarchy? And not the
> specific bridge that a device is connected to that CPU on? Or am I
> saying the same thing here?
>

Yes, this is a system property, not a device property, and the driver
typically doesn't have any knowledge of it. For example, if a PCI host
bridge happens to be integrated in a non-cache-coherent way, any PCI
device plugged into it becomes non-coherent, and the associated driver
needs to do the right thing. This is why we rely on the DMA layer to
take care of this.

> I mean, take a USB controller for example. We could have a system where
> one USB controller is on a PCI bus, while another is on a "platform"
> bus. Both of those are connected to the CPU in different ways and so
> could have different DMA rules. Do we downgrade everything in the
> system for the worst connection possible?
>

No, we currently support a mix of coherent and non-coherent devices just
fine, and this shouldn't change. It's just that the mere fact that
non-coherent devices might exist increases the memory footprint of every
kmalloc allocation.

> Again, consider a USB driver allocating memory to transfer stuff: should
> it somehow know the cache hierarchy that it is connected to? Right now
> we punt and do not do that, at the expense of a bit of potentially
> wasted memory for small allocations.
>

This whole discussion is based on the premise that this is an expense we
would prefer to avoid. Currently, every kmalloc allocation is rounded up
to 128 bytes on arm64, while x86 uses only 8.
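(Aside: a simplified, paraphrased view of where the 128 vs. 8 comes from; the
exact header text in mainline differs slightly, but the values are these.)

    /* arch/arm64/include/asm/cache.h: kmalloc() buffers must be safe for
     * non-coherent DMA, so the minimum alignment is tied to the largest
     * cache line that may need maintenance. */
    #define ARCH_DMA_MINALIGN       (128)

    /* include/linux/slab.h (simplified): kmalloc alignment inherits the
     * DMA alignment when an architecture defines one. */
    #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
    #define ARCH_KMALLOC_MINALIGN   ARCH_DMA_MINALIGN
    #else
    #define ARCH_KMALLOC_MINALIGN   __alignof__(unsigned long long)  /* 8 */
    #endif

    /*
     * Effect on a small allocation such as kmalloc(56, GFP_KERNEL):
     *   arm64 today: served from a 128-byte slab object, 72 bytes unused
     *   x86-64:      served from a 64-byte slab object,   8 bytes unused
     */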