Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752288AbdDHU4Z (ORCPT ); Sat, 8 Apr 2017 16:56:25 -0400 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:20084 "EHLO esa6.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750982AbdDHU4P (ORCPT ); Sat, 8 Apr 2017 16:56:15 -0400 X-IronPort-AV: E=Sophos;i="5.37,174,1488816000"; d="scan'208";a="9609006" From: Bart Van Assche To: =?iso-8859-1?Q?Javier_Gonz=E1lez?= , "mb@lightnvm.io" CC: "linux-block@vger.kernel.org" , "linux-kernel@vger.kernel.org" , =?iso-8859-1?Q?Javier_Gonz=E1lez?= Subject: Re: [PATCH v3] lightnvm: physical block device (pblk) target Thread-Topic: [PATCH v3] lightnvm: physical block device (pblk) target Thread-Index: AQHSsKqN1Za0Hn/Syk2atMNQRkTzgA== Date: Sat, 8 Apr 2017 20:56:10 +0000 Message-ID: References: <1491591015-7554-1-git-send-email-javier@cnexlabs.com> <1491591015-7554-2-git-send-email-javier@cnexlabs.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: lightnvm.io; dkim=none (message not signed) header.d=none;lightnvm.io; dmarc=none action=none header.from=sandisk.com; x-originating-ip: [76.102.111.178] x-microsoft-exchange-diagnostics: 1;CY1PR0401MB1535;7:fEwfgUYsefI4nh2RrLnmqL/qbrvWgaDEKEoXkCRDsCw64f566N5OsRVR6GyLT1FBoAMsNRitxFRklyKZdUEwkG9YVD1IrQxx2nu1uzf90IjsIPdNVsokCnd2DciJFDFC9ld0PtoA2PfifACJtJhHXYtYnvBk3Pl81uqkWLKFnIMyKLGGulxMvq3l0UB0itZUm1VMRdPKzqm+Ep4r8Cyt2jkdIqGzgmIQkYvE24kUwd47rw+G+ckPuctfXkf+CD7SmK58I+ixvtNJBwEhYDax/OO94ty8EH3oioPRHBonf5QZhrgmKH+J/chEoQFmbIz6nteHPASbXoaI0JvP+82mYw==;20:I5ov6jhSQm4S12O7PVJV/4F+OODbnJ0rfDsOeUjYk4qKYO73NmvpFW0GC/OkOEthqMeKm49njgAX+MjkqcrtoFR6VyBcwjW0/f9WZoRDUjQEzMVwZppj8gGs5Q3Wof3yoj1P7G2Ew9SFdQoHE1OPs2/ZKfKAND3jPC2PTi3lXLc= x-ms-office365-filtering-correlation-id: e834dd92-9167-49c5-5f7c-08d47ec1aff1 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001)(2017030254075)(48565401081)(201703131423075)(201703031133081);SRVR:CY1PR0401MB1535; wdcipoutbound: EOP-TRUE x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(17755550239193); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(6040450)(601004)(2401047)(8121501046)(5005006)(10201501046)(93006095)(93001095)(3002001)(6055026)(6041248)(201703131423075)(201702281528075)(201703061421075)(20161123564025)(20161123560025)(20161123555025)(20161123562025)(6072148);SRVR:CY1PR0401MB1535;BCL:0;PCL:0;RULEID:;SRVR:CY1PR0401MB1535; x-forefront-prvs: 0271483E06 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(6009001)(39410400002)(39850400002)(39400400002)(39860400002)(39450400003)(39840400002)(24454002)(122556002)(3280700002)(6506006)(53936002)(77096006)(4326008)(6246003)(7696004)(2906002)(3660700001)(99286003)(2900100001)(6436002)(55016002)(9686003)(54906002)(53546009)(2501003)(25786009)(86362001)(8936002)(38730400002)(229853002)(54356999)(50986999)(76176999)(3846002)(102836003)(66066001)(6116002)(189998001)(33656002)(7736002)(74316002)(5660300001)(81166006)(8676002)(305945005);DIR:OUT;SFP:1102;SCL:1;SRVR:CY1PR0401MB1535;H:CY1PR0401MB1536.namprd04.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" MIME-Version: 1.0 X-OriginatorOrg: sandisk.com X-MS-Exchange-CrossTenant-originalarrivaltime: 08 Apr 2017 20:56:11.0139 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: b61c8803-16f3-4c35-9b17-6f65f441df86 X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY1PR0401MB1535 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by mail.home.local id v38KuUgL002866 Content-Length: 5179 Lines: 144 On 04/07/17 11:50, Javier Gonz?lez wrote: > Documentation/lightnvm/pblk.txt | 21 + > drivers/lightnvm/Kconfig | 19 + > drivers/lightnvm/Makefile | 5 + > drivers/lightnvm/pblk-cache.c | 112 +++ > drivers/lightnvm/pblk-core.c | 1641 ++++++++++++++++++++++++++++++++++++++ > drivers/lightnvm/pblk-gc.c | 542 +++++++++++++ > drivers/lightnvm/pblk-init.c | 942 ++++++++++++++++++++++ > drivers/lightnvm/pblk-map.c | 135 ++++ > drivers/lightnvm/pblk-rb.c | 859 ++++++++++++++++++++ > drivers/lightnvm/pblk-read.c | 513 ++++++++++++ > drivers/lightnvm/pblk-recovery.c | 1007 +++++++++++++++++++++++ > drivers/lightnvm/pblk-rl.c | 182 +++++ > drivers/lightnvm/pblk-sysfs.c | 500 ++++++++++++ > drivers/lightnvm/pblk-write.c | 408 ++++++++++ > drivers/lightnvm/pblk.h | 1127 ++++++++++++++++++++++++++ > include/linux/lightnvm.h | 57 +- > pblk-sysfs.c | 581 ++++++++++++++ This patch introduces two slightly different versions of pblk-sysfs.c - one at the top level and one in drivers/lightnvm. Please remove the file at the top level. > +config NVM_PBLK_L2P_OPT > + bool "PBLK with optimized L2P table for devices up to 8TB" > + depends on NVM_PBLK > + ---help--- > + Physical device addresses are typical 64-bit integers. Since we store > + the logical to physical (L2P) table on the host, this takes 1/500 of > + host memory (e.g., 2GB per 1TB of storage media). On drives under 8TB, > + it is possible to reduce this to 1/1000 (e.g., 1GB per 1TB). This > + option allows to do this optimization on the host L2P table. Why is NVM_PBLK_L2P_OPT a compile-time option instead of a run-time option? Since this define does not affect the definition of the ppa_addr I don't see why this has to be a compile-time option. For e.g. Linux distributors the only choice would be to disable NVM_PBLK_L2P_OPT. I think it would be unfortunate if no Linux distribution ever would be able to benefit from this optimization. > +#ifdef CONFIG_NVM_DEBUG > + atomic_add(nr_entries, &pblk->inflight_writes); > + atomic_add(nr_entries, &pblk->req_writes); > +#endif Has it been considered to use the "static key" feature such that consistency checks can be enabled at run-time instead of having to rebuild the kernel to enable CONFIG_NVM_DEBUG? > +#ifdef CONFIG_NVM_DEBUG > + BUG_ON(nr_rec_entries != valid_entries); > + atomic_add(valid_entries, &pblk->inflight_writes); > + atomic_add(valid_entries, &pblk->recov_gc_writes); > +#endif Are you aware that Linus is strongly opposed against using BUG_ON()? > +#ifdef CONFIG_NVM_DEBUG > + lockdep_assert_held(&l_mg->free_lock); > +#endif Why is lockdep_assert_held() surrounded with #ifdef CONFIG_NVM_DEBUG / #endif? Are you aware that lockdep_assert_held() do not generate any code with CONFIG_PROVE_LOCKING=n? > +static const struct block_device_operations pblk_fops = { > + .owner = THIS_MODULE, > +}; Is this data structure useful? If so, where is pblk_fops used? > +static void pblk_l2p_free(struct pblk *pblk) > +{ > + vfree(pblk->trans_map); > +} > + > +static int pblk_l2p_init(struct pblk *pblk) > +{ > + sector_t i; > + > + pblk->trans_map = vmalloc(sizeof(pblk_addr) * pblk->rl.nr_secs); > + if (!pblk->trans_map) > + return -ENOMEM; > + > + for (i = 0; i < pblk->rl.nr_secs; i++) > + pblk_ppa_set_empty(&pblk->trans_map[i]); > + > + return 0; > +} Has it been considered to add support for keeping a subset of the L2P translation table in memory instead of keeping it in memory in its entirety? > + sprintf(cache_name, "pblk_line_m_%s", pblk->disk->disk_name); Please use snprintf() or kasprintf() instead of printf(). That makes it easier for humans to verify that no buffer overflow is triggered. > +/* physical block device target */ > +static struct nvm_tgt_type tt_pblk = { > + .name = "pblk", > + .version = {1, 0, 0}, Are version numbers useful inside the kernel tree? > +void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, > + unsigned long *lun_bitmap, unsigned int valid_secs, > + unsigned int off) > +{ > + struct pblk_sec_meta *meta_list = rqd->meta_list; > + unsigned int map_secs; > + int min = pblk->min_write_pgs; > + int i; > + > + for (i = off; i < rqd->nr_ppas; i += min) { > + map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; > + pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i], > + lun_bitmap, &meta_list[i], map_secs); > + } > +} Has it been considered to implement the above code such that no modulo (%) computation is needed, which is a relatively expensive operation? I think for this loop that can be done easily. > +static DECLARE_RWSEM(pblk_rb_lock); > + > +void pblk_rb_data_free(struct pblk_rb *rb) > +{ > + struct pblk_rb_pages *p, *t; > + > + down_write(&pblk_rb_lock); > + list_for_each_entry_safe(p, t, &rb->pages, list) { > + free_pages((unsigned long)page_address(p->pages), p->order); > + list_del(&p->list); > + kfree(p); > + } > + up_write(&pblk_rb_lock); > +} Global locks like pblk_rb_lock are a performance bottleneck on multi-socket systems. Why is that lock global? Bart.