Date: Wed, 19 Sep 2018 18:02:57 +0800
From: Ming Lei
To: Vitaly Kuznetsov
Cc: Ming Lei, linux-block, linux-mm, Linux FS Devel, "open list:XFS FILESYSTEM", Dave Chinner, Linux Kernel Mailing List, Christoph Hellwig, Jens Axboe
Subject: Re: block: DMA alignment of IO buffer allocated from slab
Message-ID: <20180919100256.GD23172@ming.t460p>
References: <877ejh3jv0.fsf@vitty.brq.redhat.com>
In-Reply-To: <877ejh3jv0.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Vitaly,

On Wed, Sep 19, 2018 at 11:41:07AM +0200, Vitaly Kuznetsov wrote:
> Ming Lei writes:
>
> > Hi Guys,
> >
> > Some storage controllers have DMA alignment limit, which is often set via
> > blk_queue_dma_alignment(), such as 512-byte alignment for IO buffer.
>
> While mostly drivers use 512-byte alignment it is not a rule of thumb,
> 'git grep' tell me we have:
> ide-cd.c with 32-byte alignment
> ps3disk.c and rsxx/dev.c with variable alignment.
>
> What if our block configuration consists of several devices (in raid
> array, for example) with different requirements, e.g. one requiring
> 512-byte alignment and the other requiring 256?

A 512-byte aligned buffer is also 256-byte aligned, and the sector size
is 512 bytes.

> >
> > Block layer now only checks if this limit is respected for buffer of
> > pass-through request,
> > see blk_rq_map_user_iov(), bio_map_user_iov().
> >
> > The userspace buffer for direct IO is checked in dio path, see
> > do_blockdev_direct_IO().
> > IO buffer from page cache should be fine wrt. this limit too.
> >
> > However, some file systems, such as XFS, may allocate single sector IO buffer
> > via slab. Usually I guess kmalloc-512 should be fine to return
> > 512-aligned buffer.
> > But once KASAN or other slab debug options are enabled, looks this
> > isn't true any more, kmalloc-512 may not return 512-aligned buffer.
> > Then data corruption can be observed because the IO buffer from fs
> > layer doesn't respect the DMA alignment limit any more.
> >
> > Follows several related questions:
> >
> > 1) does kmalloc-N slab guarantee to return N-byte aligned buffer? If
> > yes, is it a stable rule?
> >
> > 2) If it is a rule for kmalloc-N slab to return N-byte aligned buffer,
> > seems KASAN violates this rule?
>
> (as I was kinda involved in debugging): the issue was observed with SLUB
> allocator; KASAN is not to blame, everything which requires additional
> metadata space will break this, see e.g. calculate_sizes() in slub.c

A buffer allocated via kmalloc() should be aligned to the L1 HW cache
line size at least.

I have raised the question: does the kmalloc-512 slab guarantee to
return a 512-byte aligned buffer? Let's see what the answer is from the
MM guys. :-)

From the Red Hat BZ, my understanding is that this issue is only
triggered when KASAN is enabled. Or have you figured out how to
reproduce it without KASAN involved?

> >
> > 3) If slab can't guarantee to return 512-aligned buffer, how to fix
> > this data corruption issue?
>
> I'm no expert in block layer but in case of complex block device
> configurations when bio submitter can't know all the requirements I see
> no other choice than bouncing.

I guess that might be the last resort, given that the current way
without bouncing has worked for decades and no one seems to have
complained before.

Thanks,
Ming
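
The alignment arithmetic discussed in this thread can be illustrated
with a small userspace sketch. It is not kernel code: buf_is_aligned(),
stacked_mask() and slot_addr() are hypothetical helpers invented for
illustration, and the slab layout in slot_addr() is a deliberately
simplified assumption (each object slot simply grows by `meta` bytes),
not what calculate_sizes() actually computes. The only thing borrowed
from the kernel is the convention that blk_queue_dma_alignment() takes
the alignment as a mask, e.g. 511 for 512-byte alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if addr satisfies the alignment mask.
 * The mask is (alignment - 1), e.g. 511 for 512-byte alignment.
 */
static bool buf_is_aligned(uintptr_t addr, unsigned int mask)
{
	/* The mask test is only valid for power-of-two alignments. */
	assert(((mask + 1) & mask) == 0);
	return (addr & mask) == 0;
}

/*
 * Hypothetical helper: strictest requirement of two stacked devices.
 * For power-of-two alignments, OR-ing the masks yields the larger
 * one, so a buffer that satisfies it satisfies both devices.
 */
static unsigned int stacked_mask(unsigned int a, unsigned int b)
{
	return a | b;
}

/*
 * Assumed, simplified slab layout: object idx starts at
 * base + idx * (obj_size + meta), where meta stands for extra
 * per-object debug metadata. With meta == 0 and a 512-byte object
 * size every object is 512-byte aligned; any non-multiple-of-512
 * meta shifts later objects off that boundary.
 */
static uintptr_t slot_addr(uintptr_t base, unsigned int obj_size,
			   unsigned int meta, unsigned int idx)
{
	return base + (uintptr_t)idx * (obj_size + meta);
}
```

Under these assumptions, buf_is_aligned(slot_addr(0x10000, 512, 0, 3), 511)
holds, while the same check with 16 bytes of metadata per slot fails for
the second object, which is the kind of misalignment described above.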