2006-03-03 07:32:15

by Jim

Subject: SEEK_HOLE and SEEK_DATA support?


All,

Has there been any thought about adding SEEK_HOLE and SEEK_DATA (*)
support to Linux?

I ask primarily because of the interplay between 64-bit systems and
things like /var/log/lastlog (which appears as a 1.2TiB file due to
the nfsnobody UID of 4294967294).

(I realize that adding support for these additional seek() flags
wouldn't solve the problem ... archiving tools would still have to
implement it. And I can also hear the argument that Red Hat and other
distributions should re-implement lastlog handling to use a more modern
and efficient hashing/index format and perhaps that they should set
nfsnobody to "-1" ... I'd be curious if those details are driven by
some published standard or if they are artifacts of porting. I'd also
be curious what's happened with other 64-bit UNIX ports and whether
this issue ever came up in Linux ports to the Alpha or other 64-bit
processors).

As a stray data point I just did a quick experiment and just doing
a time cat /var/log/lastlog > /dev/null took about:

36.33user 2453.99system 41:35.90elapsed 99%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (133major+15minor)pagefaults 0swaps


On an otherwise idle 2GHz dual Opteron (yes, of course the extra
CPU is wasted for this job), reading SCSI disk hanging off a Fusion MPT
controller.

From what I hear our Networker processes pore over these NULs for about
two hours any time someone fails to exclude /var/log/lastlog from their
backup list.

* (see http://blogs.sun.com/roller/page/bonwick?entry=seek_hole_and_seek_data
for details)


(Please feel free to cc me on any responses, or I'll pick them up via
the archives and KT ... my account dropped off LKML years ago and I
don't want to punish my poor old IDSL line with the traffic now)

--
Jim Dennis


2006-03-03 08:33:24

by Lee Revell

Subject: Re: SEEK_HOLE and SEEK_DATA support?

On Thu, 2006-03-02 at 13:49 -0800, Jim Dennis wrote:
>
> I ask primarily because of the interplay between 64-bit systems and
> things like /var/log/lastlog (which appears as a 1.2TiB file due to
> the nfsnobody UID of 4294967294).
>
> (I realize that adding support for these additional seek() flags
> wouldn't solve the problem ... archiving tools would still have to
> implement it. And I can also hear the argument that Red Hat and other
> distributions should re-implement lastlog handling to use a more modern
> and efficient hashing/index format and perhaps that they should set
> nfsnobody to "-1" ...

So the presence of very high UIDs causes lastlog to be huge? That just
sounds like a RedHat bug.

Lee

2006-03-03 09:15:12

by Arjan van de Ven

Subject: Re: SEEK_HOLE and SEEK_DATA support?

On Fri, 2006-03-03 at 03:33 -0500, Lee Revell wrote:
> On Thu, 2006-03-02 at 13:49 -0800, Jim Dennis wrote:
> >
> > I ask primarily because of the interplay between 64-bit systems and
> > things like /var/log/lastlog (which appears as a 1.2TiB file due to
> > the nfsnobody UID of 4294967294).
> >
> > (I realize that adding support for these additional seek() flags
> > wouldn't solve the problem ... archiving tools would still have to
> > implement it. And I can also hear the argument that Red Hat and other
> > distributions should re-implement lastlog handling to use a more modern
> > and efficient hashing/index format and perhaps that they should set
> > nfsnobody to "-1" ...
>
> So the presence of very high UIDs causes lastlog to be huge? That just
> sounds like a RedHat bug.
it causes it to be a sparse file

lastlog is an array based file format ;)
but sparse
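
[Editorial sketch, not part of the original message.] The "array based" layout can be illustrated with a little arithmetic: each UID indexes a fixed-size record, so the apparent file size is set by the highest UID ever written, not by the number of users. The record layout below (32-bit time, 32-byte line, 256-byte host, 292 bytes in total) is an assumption based on glibc's struct lastlog on x86-64 and may differ on other systems; the function names are hypothetical.

```python
import struct

# Assumption: glibc's struct lastlog on x86-64 Linux, i.e. a 32-bit
# ll_time followed by ll_line[32] and ll_host[256], with no padding.
LASTLOG_RECORD = struct.Struct("=i32s256s")


def record_offset(uid):
    """Byte offset of the record for `uid` in an array-indexed lastlog."""
    return uid * LASTLOG_RECORD.size


def apparent_size(highest_uid):
    """Apparent file size once a record for `highest_uid` has been written."""
    return (highest_uid + 1) * LASTLOG_RECORD.size


if __name__ == "__main__":
    nfsnobody = 4294967294  # the UID quoted in the thread
    size = apparent_size(nfsnobody)
    print(f"record size : {LASTLOG_RECORD.size} bytes")
    print(f"apparent    : {size} bytes ({size / 2**40:.2f} TiB)")
```

Under that assumed record size, a single entry for UID 4294967294 puts the end of the file at roughly 1.25 x 10^12 bytes, the same order as the figure reported above, even though only a handful of blocks hold data.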

2006-03-03 17:04:33

by Jim

Subject: Re: SEEK_HOLE and SEEK_DATA support?


On Fri, 03 Mar 2006 10:15:05 +0100 Arjan van de Ven wrote:

> On Fri, 2006-03-03 at 03:33 -0500, Lee Revell wrote:
>> On Thu, 2006-03-02 at 13:49 -0800, Jim Dennis wrote:

>>> I ask primarily because of the interplay between 64-bit systems and
>>> things like /var/log/lastlog (which appears as a 1.2TiB file due to
>>> the nfsnobody UID of 4294967294).

>>> (I realize that adding support for these additional seek() flags
>>> wouldn't solve the problem ... archiving tools would still have to
>>> implement it. And I can also hear the argument that Red Hat and other
>>> distributions should re-implement lastlog handling to use a more modern
>>> and efficient hashing/index format and perhaps that they should set
                                                                    ^^- NOT
>>> nfsnobody to "-1" ...

[correction: they should NOT set ...]

>> So the presence of very high UIDs causes lastlog to be huge? That just
>> sounds like a RedHat bug.

> it causes it to be a sparse file

> lastlog is an array based file format ;) but sparse

Perhaps I should have been a bit more clear. /var/log/lastlog has
been a sparse file in most implementations for ... well ... forever.

The example issue is that the support for large UIDs and the convention
of setting nfsnobody to -2 (4294967294) combine to create a file whose
apparent size is very large. The du of the file is (in my case) only
about 100KiB. So there's a small cluster of used blocks for the valid
corporate UIDs that have ever accessed this machine ... then a huge
allocation hole, and then one block storing the lastlog timestamp for
nfsnobody.
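
[Editorial sketch, not part of the original message.] The gap between apparent size and du output can be reproduced with any sparse file. The helper names below are hypothetical, and the 512-byte unit for st_blocks is the Linux convention rather than something POSIX guarantees.

```python
import os
import tempfile


def make_sparse(path, last_offset, payload=b"x"):
    """Create a file whose only written bytes sit at `last_offset`."""
    with open(path, "wb") as f:
        f.seek(last_offset)  # seeking past EOF leaves a hole behind
        f.write(payload)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse(path, 64 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print(f"apparent : {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

On a filesystem that supports holes, the allocated figure should stay close to one block while the apparent size tracks the offset of the last write; a filesystem without sparse-file support would report the two as roughly equal.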

However, this message was not intended to dwell on the cause of that
huge sparse file ... but rather to inquire as to the core issue:
how do we efficiently handle skipping over (potentially huge)
allocation holes in a portable fashion that might be adopted by
archiving and other tools? I provided this example simply to point
out that it does happen in the real world, and that it has a
significant cost (40 minutes to scan through the NULs with which the
filesystem fills the hole for read()s).

OpenSolaris has implemented a mechanism for doing this and it sounds
reasonable from my admittedly superficial perspective.
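
[Editorial sketch, not part of the original message.] For readers unfamiliar with the interface, the loop an archiving tool might build on SEEK_DATA and SEEK_HOLE looks roughly like the following. It assumes a platform and Python build that expose os.SEEK_DATA and os.SEEK_HOLE; data_extents and copy_sparse are hypothetical names, not any existing tool's API.

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges that the filesystem reports as data."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # nothing but hole up to EOF
                return
            raise
        # EOF counts as a hole, so this always finds an end for the extent.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:  # defensive: never loop without progress
            return
        yield start, end
        offset = end


def copy_sparse(src_path, dst_path, chunk=1 << 20):
    """Copy only the data extents, leaving holes unwritten in the copy."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        sfd, dfd = src.fileno(), dst.fileno()
        for start, end in data_extents(sfd):
            pos = start
            while pos < end:
                buf = os.pread(sfd, min(chunk, end - pos), pos)
                if not buf:
                    break
                view = memoryview(buf)
                while len(view):
                    written = os.pwrite(dfd, view, pos)
                    pos += written
                    view = view[written:]
        # Extend the copy to the source's apparent size (trailing hole).
        os.ftruncate(dfd, os.fstat(sfd).st_size)
```

A filesystem that cannot report holes may simply describe the whole file as one data extent, in which case the loop degrades to an ordinary full copy rather than failing.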

--
Jim Dennis

2006-03-03 17:58:21

by Phillip Susi

Subject: Re: SEEK_HOLE and SEEK_DATA support?

Aren't there already APIs to query for the holes in the file, and
doesn't tar already use them to efficiently back up sparse files? I
seem to remember seeing that somewhere.

Jim Dennis wrote:
>
> Perhaps I should have been a bit more clear. /var/log/lastlog has
> been a sparse file in most implementations for ... well ... forever.
>
> The example issue is that the support for large UIDs and the convention
> of setting nfsnobody to -2 (4294967294) combine to create a file whose
> apparent size is very large. The du of the file is (in my case) only
> about 100KiB. So there's a small cluster of used blocks for the valid
> corporate UIDs that have ever accessed this machine ... then a huge
> allocation hole, and then one block storing the lastlog timestamp for
> nfsnobody.
>
> However, this message was not intended to dwell on the cause of that
> huge sparse file ... but rather to inquire as to the core issue:
> how do we efficiently handle skipping over (potentially huge)
> allocation holes in a portable fashion that might be adopted by
> archiving and other tools? I provided this example simply to point
> out that it does happen in the real world, and that it has a
> significant cost (40 minutes to scan through the NULs with which the
> filesystem fills the hole for read()s).
>
> OpenSolaris has implemented a mechanism for doing this and it sounds
> reasonable from my admittedly superficial perspective.
>

2006-03-03 22:03:31

by Nicholas Miell

Subject: Re: SEEK_HOLE and SEEK_DATA support?

On Fri, 2006-03-03 at 12:56 -0500, Phillip Susi wrote:
> Aren't there already apis to query for the holes in the file,

Yes, but they are filesystem-specific. (I think only XFS has them at
this point.)

> and doesn't tar already use them to efficiently back up sparse files?

No (at least, not in GNU tar 1.15.1).

--
Nicholas Miell

2006-03-05 03:07:21

by Andrew Morton

Subject: Re: SEEK_HOLE and SEEK_DATA support?

Jim Dennis wrote:
>
>
> All,
>
> Has there been any thought about adding SEEK_HOLE and SEEK_DATA (*)
> support to Linux?
>
> I ask primarily because of the interplay between 64-bit systems and
> things like /var/log/lastlog (which appears as a 1.2TiB file due to
> the nfsnobody UID of 4294967294).
>
> (I realize that adding support for these additional seek() flags
> wouldn't solve the problem ... archiving tools would still have to
> implement it. And I can also hear the argument that Red Hat and other
> distributions should re-implement lastlog handling to use a more modern
> and efficient hashing/index format and perhaps that they should set
> nfsnobody to "-1" ... I'd be curious if those details are driven by
> some published standard or if they are artifacts of porting. I'd also
> be curious what's happened with other 64-bit UNIX ports and whether
> this issue ever came up in Linux ports to the Alpha or other 64-bit
> processors).
>
> As a stray data point I just did a quick experiment and just doing
> a time cat /var/log/lastlog > /dev/null took about:
>
> 36.33user 2453.99system 41:35.90elapsed 99%CPU
> (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (133major+15minor)pagefaults 0swaps
>
>
> On an otherwise idle 2GHz dual Opteron (yes, of course the extra
> CPU is wasted for this job), reading SCSI disk hanging off a Fusion MPT
> controller.
>
> From what I hear our Networker processes pore over these NULs for about
> two hours any time someone fails to exclude /var/log/lastlog from their
> backup list.

This can already be solved in userspace (and I'm sure it already has been
in some backup programs). Use the FIBMAP ioctl() against the fd to find
out whether a particular block in the file is actually instantiated.

Not all filesystems necessarily implement FIBMAP, so one would need to fall
back to sucky mode if FIBMAP failed.
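
[Editorial sketch, not part of the original message.] The FIBMAP approach with a fallback could look roughly like this. The request numbers are the Linux values from <linux/fs.h>; allocated_blocks is a hypothetical helper name, and returning None is just one way to signal "fall back to reading everything".

```python
import fcntl
import os
import struct

FIBMAP = 1    # _IO(0x00, 1) in <linux/fs.h>
FIGETBSZ = 2  # _IO(0x00, 2) in <linux/fs.h>


def allocated_blocks(fd):
    """Return one bool per logical block (True = mapped on disk).

    Returns None when FIBMAP cannot be used (unsupported filesystem,
    insufficient privilege, non-Linux platform), so the caller can fall
    back to reading the file in full.
    """
    try:
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        nblocks = (os.fstat(fd).st_size + block_size - 1) // block_size
        flags = []
        for logical in range(nblocks):
            raw = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical))
            # FIBMAP writes back the physical block number; 0 means a hole.
            flags.append(struct.unpack("i", raw)[0] != 0)
        return flags
    except OSError:
        return None
```

Note that this costs one ioctl() per block, so a terabyte-scale hole still means a very large number of calls, and an unprivileged caller should expect the None path.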

2006-03-05 04:18:47

by Nicholas Miell

Subject: Re: SEEK_HOLE and SEEK_DATA support?

On Sat, 2006-03-04 at 19:05 -0800, Andrew Morton wrote:
> Jim Dennis wrote:
> >
> >
> > All,
> >
> > Has there been any thought about adding SEEK_HOLE and SEEK_DATA (*)
> > support to Linux?
> >
> > I ask primarily because of the interplay between 64-bit systems and
> > things like /var/log/lastlog (which appears as a 1.2TiB file due to
> > the nfsnobody UID of 4294967294).
> >
> > (I realize that adding support for these additional seek() flags
> > wouldn't solve the problem ... archiving tools would still have to
> > implement it. And I can also hear the argument that Red Hat and other
> > distributions should re-implement lastlog handling to use a more modern
> > and efficient hashing/index format and perhaps that they should set
> > nfsnobody to "-1" ... I'd be curious if those details are driven by
> > some published standard or if they are artifacts of porting. I'd also
> > be curious what's happened with other 64-bit UNIX ports and whether
> > this issue ever came up in Linux ports to the Alpha or other 64-bit
> > processors).
> >
> > As a stray data point I just did a quick experiment and just doing
> > a time cat /var/log/lastlog > /dev/null took about:
> >
> > 36.33user 2453.99system 41:35.90elapsed 99%CPU
> > (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (133major+15minor)pagefaults 0swaps
> >
> >
> > On an otherwise idle 2GHz dual Opteron (yes, of course the extra
> > CPU is wasted for this job), reading SCSI disk hanging off a Fusion MPT
> > controller.
> >
> > From what I hear our Networker processes pore over these NULs for about
> > two hours any time someone fails to exclude /var/log/lastlog from their
> > backup list.
>
> This can already be solved in userspace (and I'm sure it already has been
> in some backup programs). Use the FIBMAP ioctl() against the fd to find
> out whether a particular block in the file is actually instantiated.
>
> Not all filesystems necessarily implement FIBMAP, so one would need to fall
> back to sucky mode if FIBMAP failed.
>

FIBMAP is a privileged operation.

--
Nicholas Miell