2004-01-30 03:36:09

by Kevin P. Fleming

Subject: tmpfs sparse file failure in glibc "make check"

I've been tracking down a problem in CVS glibc "make check" and it
appears that either it's a bug in tmpfs or an undocumented limitation of
tmpfs.

My system is running 2.6.2-rc2, with 1G of physical RAM (4G highmem mode
is enabled in the kernel). The glibc test does the following (snipped
from the source because it's a simple test):

int fd;
#define TWO_GB 2147483648LL

...

fd = mkstemp64 (name);
ret = lseek64 (fd, TWO_GB+100, SEEK_SET);
ret = write (fd, "Hello", 5);


On my system the temp file is created in /tmp, and tmpfs is mounted on
/tmp (with no mount options limiting maximum size or anything of that
type). With no swap space turned on, this write() returns ENOMEM.

With 512MB or 1GB of swap space, it still returns ENOMEM. With 1.5GB of
swap space, the write() succeeds. However, this is a sparse file with a
total of 5 bytes of content :-)

I could understand if tmpfs was limiting the file size to half of
physical RAM+swap, but the test succeeds at 2.5GB total even though the
sparse file is created at 2GB size.

For now I work around the test failure by pointing glibc to a different
filesystem for this test, but I'm wondering why the tmpfs filesystem
can't pass this test like a "normal" filesystem does...


2004-01-30 16:53:30

by Hugh Dickins

Subject: Re: tmpfs sparse file failure in glibc "make check"

On Thu, 29 Jan 2004, Kevin P. Fleming wrote:
>
> I've been tracking down a problem in CVS glibc "make check" and it
> appears that either it's a bug in tmpfs or an undocumented limitation of
> tmpfs.
>
> My system is running 2.6.2-rc2, with 1G of physical RAM (4G highmem mode
> is enabled in the kernel). The glibc test does the following (snipped
> from the source because it's a simple test):
>
> int fd;
> #define TWO_GB 2147483648LL
>
> ...
>
> fd = mkstemp64 (name);
> ret = lseek64 (fd, TWO_GB+100, SEEK_SET);
> ret = write (fd, "Hello", 5);
>
>
> On my system the temp file is created in /tmp, and tmpfs is mounted on
> /tmp (with no mount options limiting maximum size or anything of that
> type). With no swap space turned on, this write() returns ENOMEM.
>
> With 512MB or 1GB of swap space, it still returns ENOMEM. With 1.5GB of
> swap space, the write() succeeds. However, this is a sparse file with a
> total of 5 bytes of content :-)
>
> I could understand if tmpfs was limiting the file size to half of
> physical RAM+swap, but the test succeeds at 2.5GB total even though the
> sparse file is created at 2GB size.
>
> For now I work around the test failure by pointing glibc to a different
> filesystem for this test, but I'm wondering why the tmpfs filesystem
> can't pass this test like a "normal" filesystem does...

Drat. Thank you for your efforts to track this down and describe it.
I'd call it a bug, a regression from 2.4, rather than an undocumented
limitation (generous of you to allow that interpretation). Though not
a very urgent one to fix, given you're the first to notice in 18 months.

It's a side-effect of the non-overcommit memory mode (from 2.4-ac)
added in 2.5.30. That was supposed not to change behaviour so long as
/proc/sys/vm/overcommit_memory remained at its traditional default 0.
But the extra vm_enough_memory checks needed for mode 2 (here the test
in shmem_file_write) have inadvertently imposed this limitation on mode 0.

A workaround is to "echo 1 > /proc/sys/vm/overcommit_memory" (or use
the VM_OVERCOMMIT_MEMORY sysctl), to skip all such tests. But I don't
pretend that's a decent answer - especially not since vm_enough_memory
became a security_operations function, which may take no interest in
the sysctl_overcommit_memory setting.
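
For reference, the same workaround can be done from C by reading and writing the /proc file directly. The following is a minimal sketch: read_overcommit_mode and write_overcommit_mode are hypothetical helper names, and the path is passed in as a parameter so the helpers can be tried on an ordinary file first. Writing the real /proc file needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Return the integer stored in the file at path, or -1 on failure. */
int read_overcommit_mode(const char *path)
{
    FILE *f;
    int mode;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Equivalent of "echo MODE > path"; returns 0 on success, -1 on failure. */
int write_overcommit_mode(const char *path, int mode)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With path set to "/proc/sys/vm/overcommit_memory", write_overcommit_mode(path, 1) corresponds to the echo above, and read_overcommit_mode(path) reports which mode is currently in effect.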

The difficulty is that conflicting conventions collide here in tmpfs.
In the case of shm and mmap, it's normal to check the full extent of
the mapping when it's set up (because the only way out later is OOM
killing); whereas in the case of a filesystem, it's normal to allow
sparseness and allocate only when written (though mmap of any sparse
file is an old contentious problem: what to do when no space?).

At present, the non-overcommit-memory arithmetic in mm/shmem.c works
simply by filesize. You can imagine an alternative accounting method
for the tmpfs mounts, which follows the actual page allocation (as
it already does to enforce its half(-or-whatever%)-of-memory limit).
But that gets more complicated once you mmap the tmpfs file: the two
conventions have to be reconciled in a consistent way (and it would
make a nonsense of strict non-overcommit memory mode to fall back
on the excuse that other filesystems have a sparse mmap problem).
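
To put numbers on the difference between the two accounting methods, here is a toy calculation. It is not the code in mm/shmem.c: the 4096-byte page size and both function names are assumptions made purely for illustration.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096LL   /* assumed page size, not taken from the kernel */

/* Pages charged if accounting simply follows the file size. */
long long pages_by_filesize(long long file_size)
{
    assert(file_size >= 0);
    return (file_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Pages charged if accounting follows the pages actually written,
 * for a single write covering bytes [offset, offset + len). */
long long pages_by_allocation(long long offset, long long len)
{
    assert(offset >= 0 && len >= 0);
    if (len == 0)
        return 0;
    return (offset + len - 1) / TOY_PAGE_SIZE - offset / TOY_PAGE_SIZE + 1;
}
```

For the write in the glibc test (5 bytes at offset 2GB+100, so a file size of 2GB+105), pages_by_filesize gives 524289 pages, a little over 2GB, while pages_by_allocation gives 1 page.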

I ought to fix this, but I'm averse to complexity. I'll mull over
the options before fixing it: please don't hold your breath.

Hugh

2004-01-30 17:08:13

by Kevin P. Fleming

Subject: Re: tmpfs sparse file failure in glibc "make check"

Hugh Dickins wrote:

> I ought to fix this, but I'm averse to complexity. I'll mull over
> the options before fixing it: please don't hold your breath.

No problem; as I said, I have a workaround that causes me no pain. It
seems that the use of tmpfs for both a traditional filesystem _and_
shmem is the root of this problem. What is the real advantage of
both functions being performed by the same code?

2004-01-30 17:47:36

by Hugh Dickins

Subject: Re: tmpfs sparse file failure in glibc "make check"

On Fri, 30 Jan 2004, Kevin P. Fleming wrote:
>
> No problem; as I said, I have a workaround that causes me no pain. It
> seems that the use of tmpfs for both a traditional filesystem _and_
> shmem is the root of this problem. What is the real advantage of
> both functions being performed by the same code?

Fair suggestion, but I don't actually agree that is the root of it.

The (peculiar but predating Linux) semantics of mmap shared writable
on /dev/zero demand that we have something very like a filesystem
handling something very like shared memory: from there on it makes
a lot of sense to have the same code supporting all this.
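
The /dev/zero semantics in question can be seen with a small program along these lines. It is a sketch only: shared_zero_demo is a made-up name and the value 42 is arbitrary. The point is that a MAP_SHARED mapping of /dev/zero starts out zero-filled and a write made by a forked child is visible to the parent, so something has to back those pages as shared memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map /dev/zero shared and writable, let a child store 42 in it, and
 * return what the parent then reads back (-1 on any failure). */
int shared_zero_demo(void)
{
    volatile int *cell;
    void *p;
    pid_t pid;
    int fd, result = -1;

    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;
    p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    cell = p;
    assert(*cell == 0);   /* a fresh /dev/zero mapping reads as zero */

    pid = fork();
    if (pid == 0) {
        *cell = 42;       /* child writes through the shared mapping */
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *cell;   /* parent sees the child's store */

    munmap(p, sizeof(int));
    return result;
}
```

With MAP_PRIVATE in place of MAP_SHARED the parent would still read 0, because the child's store would go to its own copy of the page.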

My accusing finger points in different directions at different moments,
one reason I want to mull it over. I might say the problem is that
tmpfs struggles to save memory by combining two otherwise distinct
layers (mapping pages and backingstore pages), and many difficulties
spring from that (all the swap/file swizzling). I might say the
problem is that the non-overcommit memory stuff is just too simplistic.
I might say the problem is that mmap of a sparse file is ill-defined
when the backingstore fills up.

Hugh