Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753065AbcDORRx (ORCPT ); Fri, 15 Apr 2016 13:17:53 -0400 Received: from mail-bn1on0135.outbound.protection.outlook.com ([157.56.110.135]:55456 "EHLO na01-bn1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752529AbcDORRu (ORCPT ); Fri, 15 Apr 2016 13:17:50 -0400 Authentication-Results: fromorbit.com; dkim=none (message not signed) header.d=none;fromorbit.com; dmarc=none action=none header.from=hpe.com; Message-ID: <57112235.1090201@hpe.com> Date: Fri, 15 Apr 2016 13:17:41 -0400 From: Waiman Long User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12 MIME-Version: 1.0 To: Dave Chinner CC: "Theodore Ts'o" , Andreas Dilger , , , Tejun Heo , Christoph Lameter , Scott J Norton , Douglas Hatch , Toshimitsu Kani Subject: Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called References: <1460484775-33359-1-git-send-email-Waiman.Long@hpe.com> <1460484775-33359-2-git-send-email-Waiman.Long@hpe.com> <20160414031634.GJ10643@dastard> <570FC379.7000107@hpe.com> <20160415081757.GK10643@dastard> In-Reply-To: <20160415081757.GK10643@dastard> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [72.71.243.152] X-ClientProxiedBy: SN1PR10CA0084.namprd10.prod.outlook.com (10.164.10.180) To TU4PR84MB0317.NAMPRD84.PROD.OUTLOOK.COM (10.162.186.27) X-MS-Office365-Filtering-Correlation-Id: 1ef12ba3-4650-46b8-c3a2-08d36551dd6e X-Microsoft-Exchange-Diagnostics: 1;TU4PR84MB0317;2:oDxCJV2yDRgjO9Eix+Fn/8Er7EnUpz3WkVnL3ctcFyIfLz00d7jihUQPYUKQSq91oJEvZivOIiUMXysQZm2KsmVu7o348HKDEjJ/Gn26vqCNIWj5xY/mk+ACI0IDO2pPnuGlVppIBinpbpwQ7tIM0YP4ldbLrqKoFzHZ77er8+ciP5jm0WDqbqmpD+UwB5dM;3:twn7kpGXNpGiCQnuJVCLOTU95tAAWwx17BlMsvv6+A+hp5gxmu+yloZuxYTT3NxDHsMcMrV5oc5aTd9ewpDPV+DsnN3LebeT8S9ZD5kpyrhWa/k2GwIykaW0fmjli76v;25:uQSvJ6LtUQT2K0xSKu8SGuh63ldHCJJrd25Jj4O62uRC5NhQRc9gxetRTDniWoFWNjLWdXoZj5zRXdg4e7vK5GFfcCTNszhUUE1B4gMsHIz43pcOHxkeHOfAGAXKCN6GjlN9G7C1EsJS//tj/xG3uYBa25G9e/hDhNiffIpxnWC46LbOwzAj9LdMcAIjE7W+CcoEalaDfl2RKnAim5qfZhqFEMshKxfY79NhKwZ4rUeNwoa1CHGDMXDkWbqR1dJ12UufqCmauZoT2/icg0FDfQB17EJ6dapfKOKjypw60wg6bC+Fs/FMk5BPelJX51kKCW6NYOGCXZwI8V8MNLSxEg== X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:TU4PR84MB0317; X-LD-Processed: 105b2061-b669-4b31-92ac-24d304d195dc,ExtAddr X-Microsoft-Exchange-Diagnostics: 1;TU4PR84MB0317;20:fyHsAQFTPf0GAWU0ZTZi4eufLBkevqrz28E4mZ1qeyPHk9WfdzPKj6enoR5/bJHI/cl8n9u3JZAwYlORvi070LkUbQmQHtMRMQokME86nXyUAEzW5fzjKW5dO4ot91qbzxmdLtj4IIMcNgJdq6k0V1cEFcLM7jzxAIJlNeRscraWB1QxZKF34lFumnZojLyswLu0pyjwiro7SxqgRY6vej+WwJ+wx0BzHYievT/OofniIf5k7iBOSTVmzNfAf21vOMrNQHGM0dlbfJHych5ESvSwcHssg+SRnqqVjNLhdBYdnP+SSFvEvkwj5JpOA4LNmNH9h6MqZd3w/gJqR83h5g==;4:mQZmOVXuWmYPlLMz4g9mZ5zDN5XdyDtQZRkraEOxZxtbBrhRlaA2spVnz68xlCen6/z9AL+hmYMcWW0fN3U4CEoSfgV3E3oGSmmwWPH6mSqxPZz9Sllw2NG8kYO/ztbeuY9d2WMwyy7w4vGyIB3p1x1k8J03Mz88w8enYaH199WBVrx/LxQtGLNSphc5Ex+lw555piVeOAtJvNu2uSdFN1nly8cKLA2xxLRpZCFZ3CGdfjGmnwUGjZUJFs6ZXTpFiLzYfZsvHix5jKFEGjE/Ie7sz8DzaD2m/znv0+9t/95PGPkiFYi9l/C9pzceno77s1t8B4CFvA86ONkiYqRmp2G+mDc/KqQuacEl1aEzFsrLxRK/vcqjRcQu6sbNuhlMU47pCKZfIhSBpLzBa8BcNg== X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(9101521026)(601004)(2401047)(8121501046)(5005006)(10201501046)(3002001);SRVR:TU4PR84MB0317;BCL:0;PCL:0;RULEID:;SRVR:TU4PR84MB0317; X-Forefront-PRVS: 0913EA1D60 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6009001)(377454003)(24454002)(92566002)(99136001)(86362001)(5008740100001)(230700001)(87266999)(54356999)(76176999)(2950100001)(50986999)(23756003)(4001350100001)(110136002)(36756003)(77096005)(5004730100002)(42186005)(93886004)(189998001)(117636001)(59896002)(64126003)(117156001)(50466002)(83506001)(4326007)(47776003)(65956001)(66066001)(81166005)(65806001)(586003)(6116002)(2906002)(1096002)(122286003)(3846002)(62816006);DIR:OUT;SFP:1102;SCL:1;SRVR:TU4PR84MB0317;H:[192.168.142.158];FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?iso-8859-1?Q?1;TU4PR84MB0317;23:VyM2Fi0a2CQKcq+vpqQJiNqFS4JxD2nULMoth/P?= =?iso-8859-1?Q?7py0Ark7ofE+sXwpZ70Rw4lT6YRRf68Pie9eQuLg0xw78Ue4eXhoKu0r6c?= =?iso-8859-1?Q?VhMZUhLlxlqj/2FB0981LeJqQNnpomauttzIASF47a2pGecI9YPvqR0MbR?= =?iso-8859-1?Q?IOFF2OC7DFJrYlrdYD1pmRy5XLMoRemFKXZ8THl6uyZcUX2wUqqnn8odg7?= =?iso-8859-1?Q?R1Zv8s+NpOJLpuYsZ4qlIPxDZ2Hqzr0aAzPAdT7jAUYFG+S8lwXA3502+K?= =?iso-8859-1?Q?26SX2r5e1NuhEzq0BRrykVHgF9zdHqPqM6yTEG8GFyqsGdrFuhmrHltvTO?= =?iso-8859-1?Q?ZAYzLjKTk5BZXJVX/YNvYuqL9yzp47S4F78BCpOzEY+mY+7cZpdzGGsWSA?= =?iso-8859-1?Q?4kdjXdGDIjHmc5P5J/Brs/NYtnqeHpOB2jhCY2b/GNQA0NuWXjEiy3E6Ud?= =?iso-8859-1?Q?B/ldhe0dCBX54zDt1TKsYZh3mtIfk1WXF9QuhVNEWVlGV3YJhunq8kD58/?= =?iso-8859-1?Q?Zb7sG6RwMFNL7YwyDGgZmf0yNvTw6wW3keriL8Dmq9C27GuW0gHCPf+nG0?= =?iso-8859-1?Q?uMgrb6UBbEnQI53EiQxN7f0pPa2gHQlg3bBjXsoOiQca8rIud22IG6WI4f?= =?iso-8859-1?Q?+w7De7WwImt3ejgUo1rMZVnf5BIlh+mppErSpzQqqU2Lio8ZM3jOul4zpV?= =?iso-8859-1?Q?5VnX6w65IHbQpM6RUF+/RPROEun3tXTUym9GDpywT8jCg3sZF+68cKTGVc?= =?iso-8859-1?Q?FlXjA90oM+mWYUvL5HJkQTGjc25VYQplGTkwPl1pSOAbSjf3oy9mCqiT6u?= =?iso-8859-1?Q?XhRnlLASrLwbqQNjI8X8dPOtkVgP8wpLYj3SfX9OEF9OA8VqZFiIAp4Dej?= =?iso-8859-1?Q?khSIh+t2uyQGyOD2MR3mr/33mPZTuyv4HJLgEg91wkrkCBqvTU72HCfgV2?= =?iso-8859-1?Q?/jsheoSwd4468SWrDrcOex5cbEOgb5TSEc8yJXKSt6mTz8wx60tSotmuwJ?= =?iso-8859-1?Q?URN3Ief6PTvetzkpZzAJsZd0xdyk4XLsiwa9Hza8dw1nluv4pVZuNBeE6U?= =?iso-8859-1?Q?yl0qUU2aChv8BNIvlRROXu32APk0uEGsgt3QG0CF8CgHlWZ67xzzOhoUYW?= =?iso-8859-1?Q?tmyP7ddVJErDDi+U2dsTytbJoRw=3D=3D?= X-Microsoft-Exchange-Diagnostics: 1;TU4PR84MB0317;5:ncI+ax7Kl396yP667HonhpnEPF9rDuMnXVza3QpUx2CQeeGFx9IBOQSJC4LFHC1LTDl9Xjtw1PqhR8hJ1+TyOI3li//HnZx26jAEmPzTFIUa85RhUGxd3hs0iU2g5MiWqWXfIIrwl27Jh5x2yJ3v2w==;24:i/dLyi5i/a3r/1QhPV7kFxr6JDV8yYJMzuI+xvJ7L27KV700feJE1GQrBdvRRC5o5AnccsqWM1BPwKQqS3DgNmASQohPF5LNkUCipX+RyjY= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: hpe.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Apr 2016 17:17:46.0995 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: TU4PR84MB0317 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2804 Lines: 72 On 04/15/2016 04:17 AM, Dave Chinner wrote: > On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote: >> On 04/13/2016 11:16 PM, Dave Chinner wrote: >>> On Tue, Apr 12, 2016 at 02:12:54PM -0400, Waiman Long wrote: >>>> When performing direct I/O, the current ext4 code does >>>> not pass in the DIO_SKIP_DIO_COUNT flag to dax_do_io() or >>>> __blockdev_direct_IO() when inode_dio_begin() has, in fact, been >>>> called. This causes dax_do_io()/__blockdev_direct_IO() to invoke >>>> inode_dio_begin()/inode_dio_end() internally. This doubling of >>>> inode_dio_begin()/inode_dio_end() calls are wasteful. >>>> >>>> This patch removes the extra internal inode_dio_begin()/inode_dio_end() >>>> calls when those calls are being issued by the caller directly. For >>>> really fast storage systems like NVDIMM, the removal of the extra >>>> inode_dio_begin()/inode_dio_end() can give a meaningful boost to >>>> I/O performance. >>> Doesn't this break truncate IO serialisation? >>> >>> i.e. it appears to me that the ext4 use of inode_dio_begin()/ >>> inode_dio_end() does not cover AIO, where the IO is still in flight >>> when submission returns. i.e. the inode_dio_end() call >>> needs to be in IO completion, not in the submitter context. The only >>> reason it doesn't break right now is that the duplicate accounting >>> in the DIO code is correct w.r.t. AIO. Hence bypassing the DIO >>> accounting will cause AIO writes to race with truncate. >>> >>> Same AIO vs truncate problem occurs with the indirect read case you >>> modified to skip the direct IO layer accounting. >> I don't quite understand how the duplicate accounting is correct wrt >> AIO. Both the direct and indirect paths are something like: >> >> inode_dio_begin() >> ... >> inode_dio_begin() >> ... >> inode_dio_end() >> ... >> inode_dio_end() > With AIO: > > inode_dio_begin() > ... > inode_dio_begin() > > ... > inode_dio_end() > > > > dio_complete > inode_dio_end() > > IOWs, the ext4 accounting is broken w.r.t. AIO, where IO submission > does not wait for IO completion before returning. > >> What the patch does is to eliminate the innermost >> inode_dio_begin/end pair. > Yes, and with that change inode_dio_wait() no longer waits for > AIO+DIO writes on ext4, hence breaking truncate IO barrier > requirements of inode_dio_wait(). > > Cheers, > > Dave. You are right and thank for pointing this out to me. I think I focus too much on the dax_do_io() internal and didn't realize that inode_dio_end() can be deferred in __blockdev_direct_IO(). I will update my patch to eliminate the extra inode_dio_begin/end pair only for dax_do_io(). Cheers, Longman