Date: Fri, 21 Sep 2018 11:56:08 +1000
From: Dave Chinner
To: Ming Lei
Cc: linux-block, linux-mm, Linux FS Devel, "open list:XFS FILESYSTEM",
    Dave Chinner, Vitaly Kuznetsov, Linux Kernel Mailing List,
    Christoph Hellwig, Jens Axboe, Ming Lei
Subject: Re: block: DMA alignment of IO buffer allocated from slab
Message-ID: <20180921015608.GA31060@dastard>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 19, 2018 at 05:15:43PM +0800, Ming Lei wrote:
> Hi Guys,
>
> Some storage controllers have DMA alignment limit, which is often set via
> blk_queue_dma_alignment(), such as
> 512-byte alignment for IO buffer.
>
> Block layer now only checks if this limit is respected for buffer of
> pass-through request,
> see blk_rq_map_user_iov(), bio_map_user_iov().
>
> The userspace buffer for direct IO is checked in dio path, see
> do_blockdev_direct_IO().
> IO buffer from page cache should be fine wrt. this limit too.
>
> However, some file systems, such as XFS, may allocate single sector IO buffer
> via slab. Usually I guess kmalloc-512 should be fine to return
> 512-aligned buffer.
> But once KASAN or other slab debug options are enabled, looks this
> isn't true any
> more, kmalloc-512 may not return 512-aligned buffer. Then data corruption
> can be observed because the IO buffer from fs layer doesn't respect the DMA
> alignment limit any more.
>
> Follows several related questions:
>
> 1) does kmalloc-N slab guarantee to return N-byte aligned buffer? If
> yes, is it a stable rule?

It has behaved like this for both slab and slub for many, many years.
A quick check indicates that at least XFS and hfsplus feed kmalloc()d
buffers straight to bios without any memory buffer alignment checks at
all.

> 2) If it is a rule for kmalloc-N slab to return N-byte aligned buffer,
> seems KASAN violates this
> rule?

XFS has been using kmalloc()d memory like this since 2012 and lots of
people use KASAN on XFS systems, including me. From this, it would
seem that the problem of mishandling unaligned memory buffers is not
widespread in the storage subsystem - it's taken years of developers
using slub debug and/or KASAN to find a driver that has choked on an
inappropriately aligned memory buffer....

> 3) If slab can't guarantee to return 512-aligned buffer, how to fix
> this data corruption issue?

I think that the block layer needs to check the alignment of memory
buffers passed to it and take appropriate action rather than
corrupting random memory and returning a success status to the bad
bio.
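
For illustration only, a userspace C sketch of the kind of check
described above. The helper name buf_dma_aligned() is hypothetical and
is not a kernel API, and the mask is assumed to be the alignment minus
one (511 for 512-byte alignment). It tests both the buffer address and
the length against the mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: returns true when both the
 * buffer address and its length satisfy a DMA alignment mask. The mask
 * is assumed to be "alignment - 1" (e.g. 511 for 512-byte alignment).
 */
static bool buf_dma_aligned(const void *buf, size_t len, uintptr_t mask)
{
	/* A valid mask has all low bits set, so mask + 1 is a power of two. */
	assert((mask & (mask + 1)) == 0);

	/* Any low bit set in either the address or the length is a violation. */
	return (((uintptr_t)buf | (uintptr_t)len) & mask) == 0;
}
```

A caller would run this before submission, e.g.
buf_dma_aligned(buf, 512, 511) for a single-sector buffer on a device
that needs 512-byte alignment.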

IMO, trusting higher layers of kernel code to get everything right
is somewhat naive. The block layer does not trust userspace to get
everything right for good reason and those same reasons extend to
kernel code. i.e. all software has bugs, we have an impossibly
complex kernel config test matrix, and even if correctly written,
proven bug-free software existed, that perfect code can still
misbehave when things like memory corruption from other bad code or
hardware occur.

From that perspective, I think that if the receiver of a bio has
specific alignment requirements and the bio does not meet them, then
it needs to either enforce the alignment requirements (i.e. error
out) or make it right by bouncing the bio to an acceptable
alignment. Erroring out will cause things to fail hard until
whatever problem causing the error is fixed, while bouncing them
provides the "everything just works normally" solution...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com