Subject: Commit 'iomap: add support for dma aligned direct-io' causes qemu/KVM boot failures
From: Maxim Levitsky
To: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Keith Busch, Christoph Hellwig, qemu-devel@nongnu.org, kvm@vger.kernel.org
Date: Thu, 29 Sep 2022 18:41:23 +0300

Hi!

Recently I noticed that this commit broke the boot of some of the VMs that I run on my dev machine.

It seems that I am not the first to notice this, but my case is a bit different:

https://lore.kernel.org/all/e0038866ac54176beeac944c9116f7a9bdec7019.camel@linux.ibm.com/

My VM is a normal x86 VM, and it uses virtio-blk in the guest to access the virtual disk, which is a qcow2 file stored on an ext4 filesystem, which in turn is stored on an NVMe drive with 4K sectors. (However, I was also able to reproduce this with a raw file.)

It seems that only two things are needed to reproduce the issue:

1. The qcow2/raw file has to be located on a drive which has a 4K hardware block size.

2. Qemu needs to use direct IO (both aio and 'threads' reproduce this).

I did some debugging and isolated the kernel change in behavior as seen from qemu's point of view.

Qemu, when using direct IO, 'probes' the underlying file. It probes two things:

1. It probes the minimum block size it can read. It does so by trying to read 1, 512, 1024, 2048 and 4096 bytes at offset 0, using a 4096-byte-aligned buffer, and notes the first read that works as the hardware block size. (The relevant function is 'raw_probe_alignment' in src/block/file-posix.c in the qemu source code.)

2. It probes the buffer alignment by reading 4096 bytes, also at file offset 0, this time using a buffer that is 1, 512, 1024, 2048 and 4096 aligned. (This is done by allocating a buffer which is 4K aligned and adding 1, 512 and so on to its address.) The first successful read is saved as the required buffer alignment.

Before the patch, both probes would yield 4096 and everything would work fine. (The file in question is stored on a 4K block device.)

After the patch, the buffer alignment probe succeeds at 512 bytes. This means that the kernel now allows reading 4K of data at file offset 0 into a buffer that is only 512-byte aligned.

It is worth noting that the probe was done using the 'pread' syscall.

Later on, qemu likely reads the first 512-byte sector of the drive. It uses preadv with 2 io vectors:

The first one is for 512 bytes, and it seems to have a 0xC00 offset into the page (this likely depends on the debug session, but it seems to be consistent).

The second one is for 3584 bytes and also has a buffer that is not 4K aligned (0x200 page offset this time).

This means that qemu does respect the 4K block size (512 + 3584 = 4096 bytes in total) but only respects 512-byte buffer alignment, which is consistent with the result of the probing.

And that preadv fails with -EINVAL.

Forcing qemu to use a 4K buffer size fixes the issue, as does reverting the offending commit.

Any patches or suggestions are welcome.
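
For anyone who wants to try this outside of qemu, the two probes and the failing read described above can be roughly sketched as follows. This is only an illustration of the sequence, not qemu's code: it is written in Python for brevity, all function names are made up for this example, and because Python's os.pread cannot read into a buffer at a chosen address, the probes use os.preadv with a single buffer where the report mentions pread. The buffer comes from an anonymous mmap, which is assumed to start on a 4096-byte boundary (page aligned).

```python
import errno
import mmap
import os
import sys

PAGE = 4096
STEPS = (1, 512, 1024, 2048, 4096)  # values listed in the report


def read_ok(fd, buffers):
    """True if preadv at file offset 0 succeeds, False if it fails with EINVAL."""
    try:
        os.preadv(fd, buffers, 0)
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False
        raise


def probe_request_size(fd):
    """Probe 1: smallest read length accepted with a page-aligned buffer."""
    arena = memoryview(mmap.mmap(-1, PAGE))  # anonymous mmap is page aligned
    for length in STEPS:
        if read_ok(fd, [arena[:length]]):
            return length
    return None


def probe_buffer_alignment(fd):
    """Probe 2: read 4096 bytes into a buffer shifted by 1, 512, ... bytes."""
    arena = memoryview(mmap.mmap(-1, 2 * PAGE))
    for shift in STEPS:
        if read_ok(fd, [arena[shift:shift + PAGE]]):
            return shift
    return None


def replay_split_read(fd):
    """The later read: 512 bytes at page offset 0xC00, 3584 at page offset 0x200."""
    arena = memoryview(mmap.mmap(-1, 2 * PAGE))
    first = arena[0xC00:0xC00 + 512]                   # inside the first page
    second = arena[PAGE + 0x200:PAGE + 0x200 + 3584]   # inside the second page
    return read_ok(fd, [first, second])


if __name__ == "__main__" and len(sys.argv) > 1:
    # O_DIRECT is what makes the kernel enforce alignment (Linux only).
    fd = os.open(sys.argv[1], os.O_RDONLY | os.O_DIRECT)
    print("request size probe:    ", probe_request_size(fd))
    print("buffer alignment probe:", probe_buffer_alignment(fd))
    print("split preadv succeeds: ", replay_split_read(fd))
    os.close(fd)
```

Run it with the path of an image file that lives on the 4K-sector drive. If the behavior matches what I describe above, the buffer alignment probe would report 512 on an affected kernel while the split preadv is rejected; I have only observed this through qemu, so treat the script as a starting point.
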

I am running 6.0-rc7, built from the mainline master branch as of yesterday.

Best regards,
	Maxim Levitsky