LinuxLists.cc - /proc/self/maps still not right in 2.6.10-mm3

2005-01-14 00:16:26

Subject: /proc/self/maps still not right in 2.6.10-mm3

Looks like there's another problem. If you read /proc/self/maps with
>4k chunks, the reads are always truncated to < 4k. However, at those
points, it misses an entry. If you read in smaller chunks, the results
look good.

For example (tm.c is attached):
$ ./tm &
$ dd bs=80 < /proc/$!/maps > 1
$ dd bs=100k < /proc/$!/maps > 2
$ diff -u 1 2
@@ -96,7 +96,6 @@
b7b43000-b7b44000 ---p b7b43000 00:00 0
b7b44000-b7b45000 r-xp b7b44000 00:00 0
b7b45000-b7b46000 ---p b7b45000 00:00 0
-b7b46000-b7b47000 r-xp b7b46000 00:00 0
b7b47000-b7b48000 ---p b7b47000 00:00 0
b7b48000-b7b49000 r-xp b7b48000 00:00 0
b7b49000-b7b4a000 ---p b7b49000 00:00 0
@@ -196,7 +195,6 @@
b7ba7000-b7ba8000 ---p b7ba7000 00:00 0
b7ba8000-b7ba9000 r-xp b7ba8000 00:00 0
b7ba9000-b7baa000 ---p b7ba9000 00:00 0
-b7baa000-b7bab000 r-xp b7baa000 00:00 0
b7bab000-b7bac000 ---p b7bab000 00:00 0
b7bac000-b7bad000 r-xp b7bac000 00:00 0
b7bad000-b7bae000 ---p b7bad000 00:00 0
[...]

Strace shows that the mapping at 0xb7b46000 is skipped because it
straddles the read boundary:
[...]
b7b3f000-b7b40000 ---p b7b3f000 00:00 0 \n
b7b40000-b7b41000 r-xp b7b40000 00:00 0 \n
b7b41000-b7b42000 ---p b7b41000 00:00 0 \n
b7b42000-b7b43000 r-xp b7b42000 00:00 0 \n
b7b43000-b7b44000 ---p b7b43000 00:00 0 \n
b7b44000-b7b45000 r-xp b7b44000 00:00 0 \n
b7b45000-b7b46000 ---p b7b45000 00:00 0 \n", 102400) = 4092
^^^^^^^^
read(0, "b7b47000-b7b48000 ---p b7b47000 00:00 0 \n
^^^^^^^^
b7b48000-b7b49000 r-xp b7b48000 00:00 0 \n
b7b49000-b7b4a000 ---p b7b49000 00:00 0 \n
b7b4a000-b7b4b000 r-xp b7b4a000 00:00 0 \n
[...]

J

Attachments:

tm.c (257.00 B)

2005-01-14 01:00:31

by Prasanna Meda

[permalink] [raw]

Subject: Re: /proc/self/maps still not right in 2.6.10-mm3

Jeremy Fitzhardinge wrote:

> Looks like there's another problem. If you read /proc/self/maps with
> >4k chunks, the reads are always truncated to < 4k. However, at those

Since m->buf starts at 4K, read truncation is expected. But the next
truncation should be at 8K, 16K and so on. We doube the buffer size.

>
> points, it misses an entry. If you read in smaller chunks, the results
> look good.

Yes, loosing record on truncation is bug. We now have no boundary
bugs in mmaps walking logic in our start, next or stop methods.

This is due to miscommunication between seq_file.c and our version
update logic.

m_show:show_maps:seq_printf()
int seq_printf(struct seq_file *m, const char *f, ...)
{
va_list args;
int len;

if (m->count < m->size) {
va_start(args, f);
len = vsnprintf(m->buf + m->count, m->size - m->count, f, args);
va_end(args);
if (m->count + len < m->size) {
m->count += len;
return 0;
}
}
m->count = m->size;
return -1;
}
It reads a partial data of the last record, and m->count becomes m->size.

Now in seq_read():
/* we need at least one record in buffer */
while (1) {
pos = m->index;
p = m->op->start(m, &pos);
err = PTR_ERR(p);
if (!p || IS_ERR(p))
break;
err = m->op->show(m, p);
if (err)
break;
if (m->count < m->size)
goto Fill;
m->op->stop(m, p);
kfree(m->buf);
m->buf = kmalloc(m->size <<= 1, GFP_KERNEL);
if (!m->buf)
goto Enomem;
m->count = 0;
}
m->op->stop(m, p);
m->count = 0;
goto Done;
Fill:
/* they want more? let's try to get some more */
while (m->count < size) {
size_t offs = m->count;
loff_t next = pos;
p = m->op->next(m, p, &next);
if (!p || IS_ERR(p)) {
err = PTR_ERR(p);
break;
}
err = m->op->show(m, p);
if (err || m->count == m->size) {
m->count = offs;
break;
}
pos = next;
}
m->op->stop(m, p);

It allocates the bigger buffer(8K) on 4k exhaustion.
It does not copy the partial record to user or to bigger
buffer and throws it away. But that is fine, since pos
is not updated. For us, it is a problem, since version
has already been updated.

One way of fixing this problem is, have 100 or 200
bytes of tail room in m->buf for overflows and handle
that like we had before moving to seq files.
This would be slightly better since, we do not throw
the data.

Other solution is, update f_version on successful
m_show, not on m_start or m_next. Slight variation
is reset the version on failed m_show. Since this does
not involve changes in seq_file, we go for this and
I will send you a patch.

Thanks,
Prasanna.