From: Christopher Unkel
To: linux-raid@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Song Liu, cunkel@drivescale.com
Subject: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives
Date: Thu, 22 Oct 2020 20:31:27 -0700
Message-Id: <20201023033130.11354-1-cunkel@drivescale.com>

Hello all,

While investigating some performance issues on mdraid 10 volumes formed with "512e" disks (4k native/physical sector size, but with 512-byte sector emulation), I've found two cases where mdraid needlessly issues writes that start on a 4k boundary but are shorter than 4k:

1. writes of the raid superblock; and
2. writes of the last page of the write-intent bitmap.
The following is an excerpt of a blktrace of one of the component members of an mdraid 10 volume during a 4k write near the end of the array (offsets and lengths are in 512-byte sectors):

  8,32  11   2   0.000001687   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11   5   0.001454119   711  D  WS 2056 + 1 [kworker/11:1H]
* 8,32  11   8   0.002847204   711  D  WS 2080 + 7 [kworker/11:1H]
  8,32  11  11   0.003700545  3094  D  WS 11721043920 + 8 [md127_raid1]
  8,32  11  14   0.308785692   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11  17   0.310201697   711  D  WS 2056 + 1 [kworker/11:1H]
  8,32  11  20   5.500799245   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11  23  15.740923558   711  D  WS 2080 + 7 [kworker/11:1H]

Note the starred transactions: each starts on a 4k boundary but is less than 4k in length, and so will use the 512-byte emulation. Sector 2056 holds the superblock and is written as a single 512-byte write. Sector 2086 holds the bitmap bit relevant to the written sector. When it is written, the active bits of the last page of the bitmap are written, starting at sector 2080 and padded out to the end of the 512-byte logical sector as required. This results in a 3.5 KiB (7-sector) write, again using the 512-byte emulation.

Note that in some arrays the last page of the bitmap may be full enough that the array is not affected by the bitmap-write issue.

As there can be a substantial penalty to using the 512-byte sector emulation (writes become read-modify-writes if the relevant sector is not in the drive's cache), I believe it makes sense to pad these writes out to a 4k boundary. The writes are already padded out for "4k native" drives, where the short access is illegal.

The following patch set changes the superblock and bitmap writes to respect the physical block size (e.g. 4k for today's 512e drives) when possible. In each case there is already logic for padding out to the underlying logical sector size. I reuse or repeat that logic for padding out to the physical sector size, but treat the physical padding as optional rather than mandatory.
The corresponding block trace with these patches is:

  8,32   1   2   0.000003410   694  D  WS 2064 + 8 [kworker/1:1H]
  8,32   1   5   0.001368788   694  D  WS 2056 + 8 [kworker/1:1H]
  8,32   1   8   0.002727981   694  D  WS 2080 + 8 [kworker/1:1H]
  8,32   1  11   0.003533831  3063  D  WS 11721043920 + 8 [md127_raid1]
  8,32   1  14   0.253952321   694  D  WS 2064 + 8 [kworker/1:1H]
  8,32   1  17   0.255354215   694  D  WS 2056 + 8 [kworker/1:1H]
  8,32   1  20   5.337938486   694  D  WS 2064 + 8 [kworker/1:1H]
  8,32   1  23  15.577963062   694  D  WS 2080 + 8 [kworker/1:1H]

I do notice that the code for bitmap writes has a more sophisticated and thorough check for overlap than the code for superblock writes. (Compare write_sb_page in md-bitmap.c with super_1_load in md.c.) From what I know, since the start offsets of the various structures have always been 4k aligned anyway, it is always safe to pad the superblock write out to 4k (as already happens on 4k native drives), but not necessarily further.

Feedback appreciated.

--Chris

Christopher Unkel (3):
  md: align superblock writes to physical blocks
  md: factor sb write alignment check into function
  md: pad writes to end of bitmap to physical blocks

 drivers/md/md-bitmap.c | 80 +++++++++++++++++++++++++-----------------
 drivers/md/md.c        | 15 ++++++++
 2 files changed, 63 insertions(+), 32 deletions(-)

-- 
2.17.1