2002-10-22 14:14:05

by Jan Kasprzak

[permalink] [raw]
Subject: 2.4.20-pre11 /proc/partitions read

Hello, world!\n

while trying to figure out why my "vgchange -a y" sometimes works
and sometimes does not, I've come to the following problem:

# dd if=/proc/partitions bs=512|wc -l
1+1 records in
1+1 records out
12

# dd if=/proc/partitions bs=128k|wc -l
0+1 records in
0+1 records out
32


I.e. if you read the /proc/partitions in single read() call,
it gets read OK. However, if you read() with smaller-sized blocks,
you get the truncated contents.

Are applications expected to read the whole /proc file
in one read()?

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|


2002-10-22 15:02:56

by Richard B. Johnson

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, 22 Oct 2002, Jan Kasprzak wrote:

> Hello, world!\n
>
> while trying to figure out why my "vgchange -a y" sometimes works
> and sometimes does not, I've come to the following problem:
>
> # dd if=/proc/partitions bs=512|wc -l
> 1+1 records in
> 1+1 records out
> 12
>
> # dd if=/proc/partitions bs=128k|wc -l
> 0+1 records in
> 0+1 records out
> 32
>
>
> I.e. if you read the /proc/partitions in single read() call,
> it gets read OK. However, if you read() with smaller-sized blocks,
> you get the truncated contents.
>
> Are applications expected to read the whole /proc file
> in one read()?
>
> -Yenya
>

Well yes, sorta. The proc file-system is a compromise. You can
`cat` it and `more` it, but anything that uses `lseek` will
fail in strange ways.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-10-22 16:08:43

by Jan Kasprzak

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

Richard B. Johnson wrote:
:
: > # dd if=/proc/partitions bs=512|wc -l
: > 1+1 records in
: > 1+1 records out
: > 12
: >
: > # dd if=/proc/partitions bs=128k|wc -l
: > 0+1 records in
: > 0+1 records out
: > 32
:
: Well yes, sorta. The proc file-system is a compromise. You can
: `cat` it and `more` it, but anything that uses `lseek` will
: fail in strange ways.

I hope dd(1) does not use lseek() :-) The question is
whether the application should supply a big enough buffer to read(2)
or whether it is possible to read(2) in more smaller chunks.

-Y.

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|

2002-10-22 18:34:31

by Andries Brouwer

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 04:19:57PM +0200, Jan Kasprzak wrote:

> I.e. if you read the /proc/partitions in single read() call,
> it gets read OK. However, if you read() with smaller-sized blocks,
> you get the truncated contents.

Having statistics in /proc/partitions leads to such problems.
Make sure you do not ask for them.

--- Documentation/Configure.help~ Mon Oct 14 01:12:13 2002
+++ Documentation/Configure.help Tue Oct 22 20:30:39 2002
@@ -561,6 +561,8 @@

This is required for the full functionality of sar(8) and interesting
if you want to do performance tuning, by tweaking the elevator, e.g.
+ On the other hand, it will cause random and mysterious failures for
+ fdisk, mount and other programs reading /proc/partitions.

If unsure, say N.


(this is about CONFIG_BLK_STATS).

Andries


[I still do not understand how hch can want to add this cruft to
/proc/partitions, and how marcelo can accept it.
If some vendor made this mistake, why force it on the rest of
the world? It is bad for RedHat users, and worse for all others.]

2002-10-22 18:39:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 08:40:34PM +0200, Andries Brouwer wrote:
> On Tue, Oct 22, 2002 at 04:19:57PM +0200, Jan Kasprzak wrote:
>
> > I.e. if you read the /proc/partitions in single read() call,
> > it gets read OK. However, if you read() with smaller-sized blocks,
> > you get the truncated contents.
>
> Having statistics in /proc/partitions leads to such problems.
> Make sure you do not ask for them.

Andries,

have you actually CHECKED whether he has it enabled?

I rather suspect it's the following bug (introduce by me, but not
depend on CONFIG_BLK_STATS):

--- 1.23/drivers/block/genhd.c Wed Aug 21 10:03:48 2002
+++ edited/drivers/block/genhd.c Tue Oct 22 20:43:16 2002
@@ -155,13 +155,14 @@

#ifdef CONFIG_PROC_FS
/* iterator */
-static void *part_start(struct seq_file *s, loff_t *pos)
+static void *part_start(struct seq_file *s, loff_t *ppos)
{
struct gendisk *gp;
+ loff_t pos = *ppos;

read_lock(&gendisk_lock);
for (gp = gendisk_head; gp; gp = gp->next)
- if (!*pos--)
+ if (!pos--)
return gp;
return NULL;
}

2002-10-22 18:42:58

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read



On Tue, 22 Oct 2002, Andries Brouwer wrote:

> On Tue, Oct 22, 2002 at 04:19:57PM +0200, Jan Kasprzak wrote:
>
> > I.e. if you read the /proc/partitions in single read() call,
> > it gets read OK. However, if you read() with smaller-sized blocks,
> > you get the truncated contents.
>
> Having statistics in /proc/partitions leads to such problems.
> Make sure you do not ask for them.
>
> --- Documentation/Configure.help~ Mon Oct 14 01:12:13 2002
> +++ Documentation/Configure.help Tue Oct 22 20:30:39 2002
> @@ -561,6 +561,8 @@
>
> This is required for the full functionality of sar(8) and interesting
> if you want to do performance tuning, by tweaking the elevator, e.g.
> + On the other hand, it will cause random and mysterious failures for
> + fdisk, mount and other programs reading /proc/partitions.
>
> If unsure, say N.
>
>
> (this is about CONFIG_BLK_STATS).
>
> Andries
>
>
> [I still do not understand how hch can want to add this cruft to
> /proc/partitions, and how marcelo can accept it.
> If some vendor made this mistake, why force it on the rest of
> the world? It is bad for RedHat users, and worse for all others.]

Its not forced behaviour. Its a config option and its defaulted to off.

Some people want it.

2002-10-22 18:53:53

by Andries Brouwer

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 07:45:14PM +0100, Christoph Hellwig wrote:

> Andries, have you actually CHECKED whether he has it enabled?

No. I do not claim that his problem was caused by the stats.
It is just that I get reports from people with mysterious mount
and fdisk problems that go away when CONFIG_BLK_STATS is disabled.
And requests from RedHat to put ugly patches into mount to
tell stdio to use a larger buffer, increasing the probability that
all is read in one go.


> I rather suspect it's the following bug (introduce by me, but not
> depend on CONFIG_BLK_STATS):

Good!

So the very reproducible problem is solved, and only the
sporadic random problem is left.

I still hope that you will remove it again.

Andries

2002-10-22 18:57:38

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read



On Tue, 22 Oct 2002, Andries Brouwer wrote:

> On Tue, Oct 22, 2002 at 07:45:14PM +0100, Christoph Hellwig wrote:
>
> > Andries, have you actually CHECKED whether he has it enabled?
>
> No. I do not claim that his problem was caused by the stats.
> It is just that I get reports from people with mysterious mount
> and fdisk problems that go away when CONFIG_BLK_STATS is disabled.

Could you forward me these reports, please?

Thats really bad.

> And requests from RedHat to put ugly patches into mount to
> tell stdio to use a larger buffer, increasing the probability that
> all is read in one go.
>
>
> > I rather suspect it's the following bug (introduce by me, but not
> > depend on CONFIG_BLK_STATS):
>
> Good!
>
> So the very reproducible problem is solved, and only the
> sporadic random problem is left.
>
> I still hope that you will remove it again.

2002-10-22 19:27:32

by Andries Brouwer

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 04:26:35PM -0200, Marcelo Tosatti wrote:

> > No. I do not claim that his problem was caused by the stats.
> > It is just that I get reports from people with mysterious mount
> > and fdisk problems that go away when CONFIG_BLK_STATS is disabled.
>
> Could you forward?
>
> Thats really bad.

The best reference is

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=35980

with fsck affected.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=62414

shows that mount is affected.

Andries

2002-10-22 19:29:04

by Christoph Hellwig

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 09:32:26PM +0200, Andries Brouwer wrote:
> On Tue, Oct 22, 2002 at 04:26:35PM -0200, Marcelo Tosatti wrote:
>
> > > No. I do not claim that his problem was caused by the stats.
> > > It is just that I get reports from people with mysterious mount
> > > and fdisk problems that go away when CONFIG_BLK_STATS is disabled.
> >
> > Could you forward?
> >
> > Thats really bad.
>
> The best reference is
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=35980
>
> with fsck affected.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=62414
>
> shows that mount is affected.

Both of those should be fixed by my patch, i.e. were caused by a bug
in fpos handling in the seq_file /proc/partions. There is nothing
about the statistics in them.

2002-10-22 19:52:06

by Andries Brouwer

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, Oct 22, 2002 at 08:35:04PM +0100, Christoph Hellwig wrote:

> Both of those should be fixed by my patch, i.e. were caused by a bug
> in fpos handling in the seq_file /proc/partitions. There is nothing
> about the statistics in them.

True. I quoted these references (i) because they are readily available,
and (ii) because they show problems with the file changing contents while
it is being read.
These two references predate your 2.4.19 patch - they are about a
Redhat-private kernel that already had these disk statistics.

The default 1K buffer was not large enough. Some utilities have now
been patched to tell stdio to use a 16K buffer. We won't have to wait
long before also that will turn out to be insufficient.

It is bad that one has to patch mount and the ext2 utilities and fdisk
and I don't know what other programs because some irrelevant (to mount etc.)
and changing stuff was added to /proc/partitions.

Andries

2002-10-22 20:11:12

by Alan

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

On Tue, 2002-10-22 at 20:58, Andries Brouwer wrote:
> The default 1K buffer was not large enough. Some utilities have now
> been patched to tell stdio to use a 16K buffer. We won't have to wait
> long before also that will turn out to be insufficient.
>
> It is bad that one has to patch mount and the ext2 utilities and fdisk
> and I don't know what other programs because some irrelevant (to mount etc.)
> and changing stuff was added to /proc/partitions.

There are two items to dela with

#1 A bug - which Christoph fixed
#2 A question about field lengths varying - which is true if the fstab
is updated for other reasons but more likely now. Fixable by using fixed
width fields

2002-10-23 08:34:05

by Jan Kasprzak

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

Christoph Hellwig wrote:
: Both of those should be fixed by my patch, i.e. were caused by a bug
: in fpos handling in the seq_file /proc/partions. There is nothing
: about the statistics in them.

The problem is that without statistics, the /proc/partitions
contents is a *lot* smaller, so it probably fits in a single 1024-byte read()
syscall. So having stats in /proc/partitions only increases the probability
of this problem. I'll try your patch and let you know then.

-Y.

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|

2002-10-23 08:30:40

by Jan Kasprzak

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

Marcelo Tosatti wrote:
:
:
: On Tue, 22 Oct 2002, Andries Brouwer wrote:
:
: > On Tue, Oct 22, 2002 at 04:19:57PM +0200, Jan Kasprzak wrote:
: >
: > > I.e. if you read the /proc/partitions in single read() call,
: > > it gets read OK. However, if you read() with smaller-sized blocks,
: > > you get the truncated contents.
: >
: > Having statistics in /proc/partitions leads to such problems.
: > Make sure you do not ask for them.

Yes I have CONFIG_BLK_STATS (and I need this to know
when my server gets overloaded).
:
: Its not forced behaviour. Its a config option and its defaulted to off.
:
: Some people want it.

Yes.

Maybe it should be documented that you have to read it
in a single read() syscall with big enough buffer.

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|

2002-10-23 09:08:24

by Jan Kasprzak

[permalink] [raw]
Subject: Re: 2.4.20-pre11 /proc/partitions read

Christoph Hellwig wrote:
: I rather suspect it's the following bug (introduce by me, but not
: depend on CONFIG_BLK_STATS):
:
: --- 1.23/drivers/block/genhd.c Wed Aug 21 10:03:48 2002
: +++ edited/drivers/block/genhd.c Tue Oct 22 20:43:16 2002
: @@ -155,13 +155,14 @@
:
: #ifdef CONFIG_PROC_FS
: /* iterator */
: -static void *part_start(struct seq_file *s, loff_t *pos)
: +static void *part_start(struct seq_file *s, loff_t *ppos)
: {
: struct gendisk *gp;
: + loff_t pos = *ppos;
:
: read_lock(&gendisk_lock);
: for (gp = gendisk_head; gp; gp = gp->next)
: - if (!*pos--)
: + if (!pos--)
: return gp;
: return NULL;
: }

Yes, this patch fixes my problem. Thanks!

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
|-- If you start doing things because you hate others and want to screw --|
|-- them over the end result is bad. --Linus Torvalds to the BBC News --|