Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
From: Ric Wheeler
To: Linus Torvalds, Gregory Farnum
Cc: Eric Sandeen, "Theodore Ts'o", Andreas Dilger, "Darrick J. Wong", Dave Chinner, Andy Lutomirski, One Thousand Gnomes, Martin Petersen, Christoph Hellwig, Jens Axboe, Andrew Morton, Linux API, Linux Kernel Mailing List, shane.seymour@hpe.com, Bruce Fields, linux-fsdevel, Jeff Layton
Date: Thu, 17 Mar 2016 13:50:49 -0400
Message-ID: <56EAEE79.4030809@redhat.com>

On 03/17/2016 01:47 PM, Linus Torvalds wrote:
> On Wed, Mar 16, 2016 at 10:18 PM, Gregory Farnum wrote:
>> So we've not asked for NO_HIDE_STALE on the mailing lists, but I think
>> it was one of the problems Sage had using xfs in his BlueStore
>> implementation and was a big part of why it moved to pure userspace.
>> FileStore might use NO_HIDE_STALE in some places but it would be
>> pretty limited. When it came up at Linux FAST we were discussing how
>> it and similar things had been problems for us in the past and it
>> would've been nice if they were upstream.
> Hmm.
>
> So to me it really sounds like somebody should cook up a patch, but we
> shouldn't put it in the upstream kernel until we get numbers and
> actual "yes, we'd use this" from outside of google.
>
> I say "outside of google", because inside of google not only do we not
> get numbers, but google can maintain their own patch.
>
> But maybe Ted could at least post the patch google uses, and somebody
> in the Ceph community might want to at least try it out...
>
>> What *is* a big deal for
>> FileStore (and would be easy to take advantage of) is the thematically
>> similar O_NOMTIME flag, which is also about reducing metadata updates
>> and got blocked on similar stupid-user grounds (although not security
>> ones): http://thread.gmane.org/gmane.linux.kernel.api/10727.
> Hmm. I don't hate that patch, because the NOATIME thing really does
> wonders on many loads. NOMTIME makes sense.
>
> It's not like you can't do this with utimes() anyway.
>
> That said, I do wonder if people wouldn't just prefer to expand on and
> improve on the lazytime.
>
> Is there some reason you guys didn't use that?
>
>> As noted though, we've basically given up and are moving to a
>> pure-userspace solution as quickly as we can.
> That argues against worrying about this all in the kernel unless there
> are other users.
>
> Linus

Just a note: when Greg says "user space solution", Ceph is looking at
writing directly to raw block devices, which is kind of a throwback to
early enterprise database trends.

Ric