From: Chris Unkel
Date: Mon, 2 Nov 2020 10:59:39 -0800
Subject: Re: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives
To: Xiao Ni
Cc: linux-raid, open list, Song Liu
References: <20201023033130.11354-1-cunkel@drivescale.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Xiao,

That particular array is super1.2. The block trace was captured on the
disk underlying the partition device on which the md array member
resides, not on the partition device itself. The partition starts
2048 sectors into the disk (1MB). So the 2048-sector offset to the
beginning of the partition plus the 8-sector superblock offset for
super1.2 comes to 2056.

Sorry for the confusion there.

Regards,

  --Chris

On Sun, Nov 1, 2020 at 11:04 PM Xiao Ni wrote:
>
> On 10/23/2020 11:31 AM, Christopher Unkel wrote:
> > Hello all,
> >
> > While investigating some performance issues on mdraid 10 volumes
> > formed with "512e" disks (4k native/physical sector size but with 512
> > byte sector emulation), I've found two cases where mdraid will
> > needlessly issue writes that start on a 4k byte boundary, but are
> > shorter than 4k:
> >
> > 1. writes of the raid superblock; and
> > 2. writes of the last page of the write-intent bitmap.
> >
> > The following is an excerpt of a blocktrace of one of the component
> > members of a mdraid 10 volume during a 4k write near the end of the
> > array:
> >
> >   8,32 11  2  0.000001687  711 D WS 2064 + 8 [kworker/11:1H]
> > * 8,32 11  5  0.001454119  711 D WS 2056 + 1 [kworker/11:1H]
> > * 8,32 11  8  0.002847204  711 D WS 2080 + 7 [kworker/11:1H]
> >   8,32 11 11  0.003700545 3094 D WS 11721043920 + 8 [md127_raid1]
> >   8,32 11 14  0.308785692  711 D WS 2064 + 8 [kworker/11:1H]
> > * 8,32 11 17  0.310201697  711 D WS 2056 + 1 [kworker/11:1H]
> >   8,32 11 20  5.500799245  711 D WS 2064 + 8 [kworker/11:1H]
> > * 8,32 11 23 15.740923558  711 D WS 2080 + 7 [kworker/11:1H]
> >
> > Note the starred transactions, which each start on a 4k boundary, but
> > are less than 4k in length, and so will use the 512-byte emulation.
> > Sector 2056 holds the superblock, and is written as a single 512-byte
> > write. Sector 2086 holds the bitmap bit relevant to the written
> > sector. When it is written the active bits of the last page of the
> > bitmap are written, starting at sector 2080, padded out to the end of
> > the 512-byte logical sector as required. This results in a 3.5kb
> > write, again using the 512-byte emulation.
>
> Hi Christopher
>
> Which superblock version do you use? If it's super1.1, the superblock
> starts at sector 0. If it's super1.2, the superblock starts at sector 8.
> If it's super1.0, the superblock starts at the end of the device and the
> bitmap is before the superblock. As mentioned above, the bitmap is behind
> the superblock, so it should not be super1.0. So I have a question: why
> does 2056 hold the superblock?
>
> Regards
> Xiao
>
> >
> > Note that in some arrays the last page of the bitmap may be
> > sufficiently full that they are not affected by the issue with the
> > bitmap write.
> >
> > As there can be a substantial penalty to using the 512-byte sector
> > emulation (turning writes into read-modify-writes if the relevant
> > sector is not in the drive's cache) I believe it makes sense to pad
> > these writes out to a 4k boundary. The writes are already padded out
> > for "4k native" drives, where the short access is illegal.
> >
> > The following patch set changes the superblock and bitmap writes to
> > respect the physical block size (e.g. 4k for today's 512e drives) when
> > possible. In each case there is already logic for padding out to the
> > underlying logical sector size. I reuse or repeat the logic for
> > padding out to the physical sector size, but treat the padding out as
> > optional rather than mandatory.
> >
> > The corresponding block trace with these patches is:
> >
> >   8,32 1  2  0.000003410  694 D WS 2064 + 8 [kworker/1:1H]
> >   8,32 1  5  0.001368788  694 D WS 2056 + 8 [kworker/1:1H]
> >   8,32 1  8  0.002727981  694 D WS 2080 + 8 [kworker/1:1H]
> >   8,32 1 11  0.003533831 3063 D WS 11721043920 + 8 [md127_raid1]
> >   8,32 1 14  0.253952321  694 D WS 2064 + 8 [kworker/1:1H]
> >   8,32 1 17  0.255354215  694 D WS 2056 + 8 [kworker/1:1H]
> >   8,32 1 20  5.337938486  694 D WS 2064 + 8 [kworker/1:1H]
> >   8,32 1 23 15.577963062  694 D WS 2080 + 8 [kworker/1:1H]
> >
> > I do notice that the code for bitmap writes has a more sophisticated
> > and thorough check for overlap than the code for superblock writes.
> > (Compare write_sb_page in md-bitmap.c vs. super_1_load in md.c.) From
> > what I know, since the starts of the various structures have always
> > been 4k aligned anyway, it is always safe to pad the superblock write
> > out to 4k (as occurs on 4k native drives) but not necessarily further.
> >
> > Feedback appreciated.
> >
> > --Chris
> >
> >
> > Christopher Unkel (3):
> >   md: align superblock writes to physical blocks
> >   md: factor sb write alignment check into function
> >   md: pad writes to end of bitmap to physical blocks
> >
> >  drivers/md/md-bitmap.c | 80 +++++++++++++++++++++++++-----------------
> >  drivers/md/md.c        | 15 ++++++++
> >  2 files changed, 63 insertions(+), 32 deletions(-)
> >
>
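
For readers following the sector arithmetic in this thread, here is a minimal sketch in Python. It is not the kernel code from the patch set: the function and constant names are hypothetical, the 512-byte and 4096-byte sizes are the 512e values described above, and the overlap checks that make the real padding optional are left out. It shows how the superblock lands at sector 2056 on the whole disk, which writes in the trace cover only part of a 4k physical block, and what they look like once padded to the next 4k boundary.

```python
# Illustrative sketch only; names and constants are assumptions, not kernel code.
LOGICAL_BYTES = 512     # logical sector size exposed by a 512e drive
PHYSICAL_BYTES = 4096   # physical sector size of a 512e drive
SECTORS_PER_PHYS = PHYSICAL_BYTES // LOGICAL_BYTES  # 8 logical sectors per 4k block

PARTITION_START = 2048  # partition offset on the disk, in 512-byte sectors (1MB)
SUPER_1_2_OFFSET = 8    # super1.2 superblock offset within the member device

# Superblock location as seen in a trace of the whole disk.
sb_sector = PARTITION_START + SUPER_1_2_OFFSET


def is_partial_physical_write(start, count):
    """Return True if the write does not cover whole 4k physical blocks."""
    return start % SECTORS_PER_PHYS != 0 or count % SECTORS_PER_PHYS != 0


def pad_to_physical(start, count):
    """Round the end of a write up to the next 4k boundary.

    Assumes the start is already 4k aligned and that the padded range
    does not overlap any other on-disk structure (not checked here).
    """
    end = start + count
    padded_end = -(-end // SECTORS_PER_PHYS) * SECTORS_PER_PHYS  # ceiling division
    return start, padded_end - start


# (start sector, sector count) pairs taken from the first trace in the thread.
writes = [(2064, 8), (2056, 1), (2080, 7), (11721043920, 8)]

print("superblock sector on disk:", sb_sector)
for start, count in writes:
    marker = "*" if is_partial_physical_write(start, count) else " "
    print(marker, start, "+", count, "->", "%d + %d" % pad_to_physical(start, count))
```

The starred lines printed by the sketch are the same writes starred in the original trace (2056 + 1 and 2080 + 7), and the padded values match the "+ 8" lengths shown in the trace taken with the patches applied.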