Date: Thu, 11 Jun 2015 03:29:35 -0700
From: Christoph Hellwig
To: Matias Bjorling
Cc: Christoph Hellwig, axboe@fb.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	Stephen.Bates@pmcs.com, keith.busch@intel.com, javier@lightnvm.io
Subject: Re: [PATCH v4 0/8] Support for Open-Channel SSDs
Message-ID: <20150611102935.GA4419@infradead.org>
References: <1433508870-28251-1-git-send-email-m@bjorling.me>
	<20150609074643.GA5707@infradead.org>
	<55787DDE.7020801@bjorling.me>
In-Reply-To: <55787DDE.7020801@bjorling.me>

On Wed, Jun 10, 2015 at 08:11:42PM +0200, Matias Bjorling wrote:
> 1. A get/put flash block API, that user-space applications can use.
> That will enable application-driven FTLs. E.g. RocksDB can be integrated
> tightly with the SSD. Allowing data placement and garbage collection to
> be strictly controlled. Data placement will reduce the need for
> over-provisioning, as data that age at the same time are placed in the
> same flash block, and garbage collection can be scheduled to not
> interfere with user requests. Together, it will remove I/O outliers
> significantly.
>
> 2. Large drive arrays with global FTL. The stacking block device model
> enables this.
> It allows an FTL to span multiple devices, and thus
> perform data placement and garbage collection over tens to hundreds of
> devices. That'll greatly improve wear-leveling, as there is a much
> higher probability of a fully inactive block with more flash.
> Additionally, as the parallelism grows within the storage array, we can
> slice and dice the devices using the get/put flash block API and enable
> applications to get predictable performance, while using large arrays
> that have a single address space.
>
> If it is too much for now to get upstream, I can live with (2) removed
> and then I'll make the changes you proposed.

In this case your driver API really isn't the Linux block API
anymore. I think the right API is a simple asynchronous submit with
callback into the driver, with the block device only provided by
the lightnvm layer.

Note that for NVMe it might still make sense to implement this using
blk-mq and a struct request, but those should be internal, similar to
how NVMe implements admin commands.
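
For illustration only, here is a minimal user-space sketch of the shape
an "asynchronous submit with callback" driver interface could take. It
is not taken from the lightnvm patches or from any kernel header: every
name in it (ocssd_rq, ocssd_dev_ops, ocssd_submit, the toy_* helpers)
is hypothetical, and the in-memory queue merely stands in for whatever
the real driver would use internally (for NVMe, possibly blk-mq and a
struct request).

```c
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from the patch set. */
struct ocssd_dev;
struct ocssd_rq;

/* Completion callback, invoked by the driver when the I/O finishes. */
typedef void (*ocssd_end_io_fn)(struct ocssd_rq *rq, int error);

struct ocssd_rq {
	uint64_t ppa;            /* physical address picked by the FTL */
	void *buf;               /* data buffer */
	size_t len;              /* buffer length in bytes */
	ocssd_end_io_fn end_io;  /* called exactly once on completion */
	void *private_data;      /* owned by the submitter */
	struct ocssd_rq *next;   /* driver-internal queue link */
};

/* The whole driver-facing API: one asynchronous submit hook. */
struct ocssd_dev_ops {
	int (*submit_io)(struct ocssd_dev *dev, struct ocssd_rq *rq);
};

struct ocssd_dev {
	const struct ocssd_dev_ops *ops;
	struct ocssd_rq *head;   /* toy stand-in for a hardware queue */
	struct ocssd_rq *tail;
};

/* Upper-layer entry point: hand the request to the driver and return. */
static int ocssd_submit(struct ocssd_dev *dev, struct ocssd_rq *rq)
{
	if (!dev || !dev->ops || !dev->ops->submit_io || !rq || !rq->end_io)
		return -1;
	return dev->ops->submit_io(dev, rq);
}

/* Toy driver: queue the request; completion happens later. */
static int toy_submit_io(struct ocssd_dev *dev, struct ocssd_rq *rq)
{
	rq->next = NULL;
	if (dev->tail)
		dev->tail->next = rq;
	else
		dev->head = rq;
	dev->tail = rq;
	return 0;
}

static const struct ocssd_dev_ops toy_ops = { .submit_io = toy_submit_io };

static void toy_dev_init(struct ocssd_dev *dev)
{
	dev->ops = &toy_ops;
	dev->head = NULL;
	dev->tail = NULL;
}

/* Simulates the completion path; returns the number of requests done. */
static int toy_complete_all(struct ocssd_dev *dev)
{
	int n = 0;

	while (dev->head) {
		struct ocssd_rq *rq = dev->head;

		dev->head = rq->next;
		if (!dev->head)
			dev->tail = NULL;
		rq->next = NULL;
		rq->end_io(rq, 0);
		n++;
	}
	return n;
}

/* Example callback: bumps the int counter that private_data points to. */
static void count_completion(struct ocssd_rq *rq, int error)
{
	if (!error)
		(*(int *)rq->private_data)++;
}
```

In this sketch the submitter fills in an ocssd_rq, calls
ocssd_submit(), and gets control back immediately; end_io runs only
when the driver's completion path (here toy_complete_all()) gets to the
request. The block device itself would then live entirely in the layer
above, which is the split suggested above.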