Date: Sat, 1 Oct 2022 23:29:51 +0100
From: Catalin Marinas
To: Linus Torvalds
Cc: Isaac Manjarres, Herbert Xu, Ard Biesheuvel, Will Deacon, Marc Zyngier,
    Arnd Bergmann, Greg Kroah-Hartman, Andrew Morton,
    Linux Memory Management List, Linux ARM, Linux Kernel Mailing List,
    "David S. Miller", Saravana Kannan, kernel-team@android.com
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

On Fri, Sep 30, 2022 at 12:35:45PM -0700, Linus Torvalds wrote:
> On Fri, Sep 30, 2022 at 11:33 AM Catalin Marinas wrote:
> > I started refreshing the series but I got stuck on having to do
> > bouncing for small buffers even when they go through the iommu (and
> > I don't have the setup to test it yet).
>
> May I suggest doing that "force bouncing" and "change kmalloc to have
> an 8-byte minalign" to be the two first commits?
>
> IOW, if we force bouncing for unaligned DMA, then that *should* mean
> that allocation alignment is no longer a correctness issue, it's
> purely a performance one due to the bouncing.

I've been thinking about this and even ended up with a CBMC model
(included below; it found a bug in dma_kmalloc_needs_bounce()).

The "force bouncing" in my series currently only checks for small
(potentially kmalloc'ed) sizes, under the assumption that intra-object
DMA buffers were properly aligned to 128. So for something like below:

	struct devres {
		struct devres_node node;
		u8 __aligned(ARCH_DMA_MINALIGN) data[];
	};

we'd need an ARCH_DMA_MINALIGN of 128 even if ARCH_KMALLOC_MINALIGN is
8. The original code has __aligned(ARCH_KMALLOC_MINALIGN), so lowering
the latter to 8 without any other changes would be problematic
(sizeof(struct devres) may be sufficiently large to look
cacheline-aligned).

If data[] contains a single DMA buffer, dma_kmalloc_needs_bounce() can
take the start of the buffer as another parameter and check that it's a
multiple of cache_line_size(). However, things get more complicated if
data[] is used for several sub-allocations of multiples of
ARCH_KMALLOC_MINALIGN. Not much to do with the kmalloc() caches at this
point.

I haven't got my head around the crypto code yet, but it looked to me
like it needs ARCH_DMA_MINALIGN in some places if we are to lower
ARCH_KMALLOC_MINALIGN.

We could attempt to force bouncing in dma_kmalloc_needs_bounce() with:

	if (ptr % dma_align != 0 || size % dma_align != 0)
		return true;

but that's orthogonal to the kmalloc() caches. I tried this some years
ago and IIRC many buffers got bounced even with an ARCH_KMALLOC_MINALIGN
of 128, because drivers don't necessarily have cacheline-aligned-sized
structures shared with devices (even though they are allocated from a
cacheline-aligned slab). So this check results in unnecessary bouncing
(see the sketch further below).

So my series attempts to (1) fix the (static) alignment of intra-object
buffers by changing a few ARCH_KMALLOC_MINALIGN uses to
ARCH_DMA_MINALIGN, and (2) address kmalloc() DMA safety by bouncing
non-cacheline-aligned sizes. I don't think we can do (2) first, as the
logic for handling (1) in the absence of a large ARCH_DMA_MINALIGN is
different.
And that's the CBMC model:

------------------------------------8<----------------------------
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Check with:
 *	cbmc --trace dma-bounce.c
 */

#define PAGE_SIZE		4096

#define ARCH_KMALLOC_MINALIGN	8
#define ARCH_DMA_MINALIGN	128

#define KMALLOC_MIN_SIZE	ARCH_KMALLOC_MINALIGN
#define KMALLOC_SHIFT_LOW	3
#define KMALLOC_SHIFT_HIGH	25
#define KMALLOC_MAX_SIZE	(1UL << KMALLOC_SHIFT_HIGH)

#define INIT_KMALLOC_INFO(__size, __short_size)	\
{						\
	.size = __size,				\
}

static unsigned int nondet_uint(void);

struct kmalloc_info_struct {
	unsigned int size;
};

struct kmalloc_slab {
	unsigned int ptr;
	unsigned int size;
};

static const struct kmalloc_info_struct kmalloc_info[] = {
	INIT_KMALLOC_INFO(0, 0),
	INIT_KMALLOC_INFO(96, 96),
	INIT_KMALLOC_INFO(192, 192),
	INIT_KMALLOC_INFO(8, 8),
	INIT_KMALLOC_INFO(16, 16),
	INIT_KMALLOC_INFO(32, 32),
	INIT_KMALLOC_INFO(64, 64),
	INIT_KMALLOC_INFO(128, 128),
	INIT_KMALLOC_INFO(256, 256),
	INIT_KMALLOC_INFO(512, 512),
	INIT_KMALLOC_INFO(1024, 1k),
	INIT_KMALLOC_INFO(2048, 2k),
	INIT_KMALLOC_INFO(4096, 4k),
	INIT_KMALLOC_INFO(8192, 8k),
	INIT_KMALLOC_INFO(16384, 16k),
	INIT_KMALLOC_INFO(32768, 32k),
	INIT_KMALLOC_INFO(65536, 64k),
	INIT_KMALLOC_INFO(131072, 128k),
	INIT_KMALLOC_INFO(262144, 256k),
	INIT_KMALLOC_INFO(524288, 512k),
	INIT_KMALLOC_INFO(1048576, 1M),
	INIT_KMALLOC_INFO(2097152, 2M),
	INIT_KMALLOC_INFO(4194304, 4M),
	INIT_KMALLOC_INFO(8388608, 8M),
	INIT_KMALLOC_INFO(16777216, 16M),
	INIT_KMALLOC_INFO(33554432, 32M)
};

static unsigned int cache_line_size(void)
{
	static const unsigned int cls = nondet_uint();

	__CPROVER_assume(cls == 32 || cls == 64 || cls == 128);
	return cls;
}

static unsigned int kmalloc_index(unsigned int size)
{
	if (!size)
		return 0;
	if (size <= KMALLOC_MIN_SIZE)
		return KMALLOC_SHIFT_LOW;

	if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
		return 1;
	if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
		return 2;
	if (size <= 8)
		return 3;
	if (size <= 16)
		return 4;
	if (size <= 32)
		return 5;
	if (size <= 64)
		return 6;
	if (size <= 128)
		return 7;
	if (size <= 256)
		return 8;
	if (size <= 512)
		return 9;
	if (size <= 1024)
		return 10;
	if (size <= 2 * 1024)
		return 11;
	if (size <= 4 * 1024)
		return 12;
	if (size <= 8 * 1024)
		return 13;
	if (size <= 16 * 1024)
		return 14;
	if (size <= 32 * 1024)
		return 15;
	if (size <= 64 * 1024)
		return 16;
	if (size <= 128 * 1024)
		return 17;
	if (size <= 256 * 1024)
		return 18;
	if (size <= 512 * 1024)
		return 19;
	if (size <= 1024 * 1024)
		return 20;
	if (size <= 2 * 1024 * 1024)
		return 21;
	if (size <= 4 * 1024 * 1024)
		return 22;
	if (size <= 8 * 1024 * 1024)
		return 23;
	if (size <= 16 * 1024 * 1024)
		return 24;
	if (size <= 32 * 1024 * 1024)
		return 25;

	__CPROVER_assert(0, "Invalid kmalloc() size");
	return -1;
}

unsigned int kmalloc(unsigned int size, struct kmalloc_slab *slab)
{
	unsigned int nr = nondet_uint();

	slab->size = kmalloc_info[kmalloc_index(size)].size;
	slab->ptr = nr * slab->size;

	__CPROVER_assume(slab->ptr < PAGE_SIZE);
	__CPROVER_assume(slab->ptr % slab->size == 0);

	return slab->ptr;
}

/*
 * Implemented only for 32, 64 and 128 cache line sizes.
 */
int dma_kmalloc_needs_bounce(unsigned int size)
{
	unsigned int dma_align = cache_line_size();

	/*
	 * Less than half dma_align, there's definitely a smaller kmalloc()
	 * cache.
	 */
	if (size <= dma_align / 2)
		return 1;

	/*
	 * From this point, any kmalloc cache size is 32-byte aligned.
	 */
	if (dma_align == 32)
		return 0;

	/*
	 * dma_align == 64 => 96 needs bouncing.
	 * dma_align == 128 => 96 and 192 need bouncing.
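	 *
	 * (These are the only kmalloc() cache sizes above dma_align / 2 that
	 * are not a multiple of the respective dma_align, so an object from
	 * these caches can share a cache line with a neighbouring object.)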
	 */
	if (size > 64 && size <= 96)
		return 1;
	if (dma_align == 128 && size > 128 && size <= 192)
		return 1;

	return 0;
}

/*
 * Simulate DMA cache maintenance. The 'slab' object is only used for
 * verification.
 */
void dma_map_single(unsigned int ptr, unsigned int size,
		    struct kmalloc_slab *slab)
{
	unsigned int mask = cache_line_size() - 1;

	if (dma_kmalloc_needs_bounce(size)) {
		/* was the bounce really necessary? */
		__CPROVER_assert((ptr & mask) != 0 || (size & mask) != 0,
				 "Bouncing aligned DMA buffer");
		return;
	}

	/*
	 * Check for cache maintenance outside the kmalloc'ed object. We don't
	 * care about intra-object overlap, it's the caller's responsibility
	 * to ensure alignment.
	 */
	__CPROVER_assert((ptr & ~mask) >= slab->ptr,
			 "DMA cache maintenance underflow");
	__CPROVER_assert(((ptr + size + mask) & ~mask) <= slab->ptr + slab->size,
			 "DMA cache maintenance overflow");
}

int main(void)
{
	struct kmalloc_slab slab;
	unsigned int size = nondet_uint();
	unsigned int offset = nondet_uint();
	unsigned int ptr;

	__CPROVER_assume(size <= KMALLOC_MAX_SIZE);
	__CPROVER_assume(offset < size);
	__CPROVER_assume(offset % ARCH_DMA_MINALIGN == 0);

	ptr = kmalloc(size, &slab);
	dma_map_single(ptr + offset, size - offset, &slab);

	return 0;
}

-- 
Catalin