2005-09-21 07:37:38

by David Sanchez

[permalink] [raw]
Subject: How to Force PIO mode on sata promise (Linux 2.6.10)

Hi,

I'm using the Linux kernel 2.6.10 on a DBAU1550 and I would like to
force PIO mode (and thus disable DMA) on my sata promise TX2.
How can I do that ?

Thanks

David


2005-09-21 11:47:10

by Jeff Garzik

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

David Sanchez wrote:
> I'm using the Linux kernel 2.6.10 on a DBAU1550 and I would like to
> force PIO mode (and thus disable DMA) on my sata promise TX2.
> How can I do that ?

Why do you wish to do this?

Jeff



2005-09-21 12:30:44

by David Sanchez

[permalink] [raw]
Subject: RE: How to Force PIO mode on sata promise (Linux 2.6.10)

Hi,

I'm using the linux kernel 2.6.10 and busybox on an AMD db AU1550 with a hdd connected to the pata port of a PCI card (Promise PDC20579).

I've got file copy corruption on big files (around 500MB). I made some investigations and I found that sometimes the read(fd,4096) function returns unexpected data and sometimes after a write(fd,4096) function, the data on disk is not the data send in the write() function...

Maybe a hw bug ? A hw or sw DMA transfer bug ? or anything else... I need to known what happens so I try everything...

Any idea on my problem Jeff ?

David

Note: I've connected a hdd through the ide interface in PIO mode and my problem seems to not appear. But when I enable DMA, I've got a lot of error messages. In other word, the DMA doesn't work (as it is wrote in the HELP of the option in the kernel) with the ide interface.

-----Message d'origine-----
De?: Jeff Garzik [mailto:[email protected]]
Envoy??: mercredi 21 septembre 2005 14:01
??: David Sanchez
Cc?: [email protected]
Objet?: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

David Sanchez wrote:
> I'm using the Linux kernel 2.6.10 on a DBAU1550 and I would like to
> force PIO mode (and thus disable DMA) on my sata promise TX2.
> How can I do that ?

Why do you wish to do this?

Jeff








2005-09-21 14:02:52

by Clemens Koller

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

Hello, David, Jeff!

> I'm using the linux kernel 2.6.10 and busybox on an AMD db AU1550 with a
> hdd connected to the pata port of a PCI card (Promise PDC20579).

I'm using a PDC20269 (on Promise Ultra133TX2 card) w/ DMA disabled on
an embedded ppc (MPC8540). I cannot use DMA, too.
I guess it's a PCI IRQ/REQ/GNT problem on my board.
With DMA disabled I haven't had any problems on my system.
I am going to use a PDC20275 in the future.

My .config of a linux-2.6.13-rc7 is attached.

And... I know other guys with similar problems on embedded environments.
Unfortunately I wasn't able to have a closer look at this problem yet.

Best greets,

--
Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm.de
Phone: +49-89-741518-50
Fax: +49-89-741518-19


Attachments:
.config.gz (4.97 kB)

2005-09-21 14:51:14

by David Sanchez

[permalink] [raw]
Subject: RE: How to Force PIO mode on sata promise (Linux 2.6.10)

Hi Clemens,

How did you disabled the DMA ?

David

-----Message d'origine-----
De?: Clemens Koller [mailto:[email protected]]
Envoy??: mercredi 21 septembre 2005 16:15
??: David Sanchez
Cc?: Jeff Garzik; [email protected]
Objet?: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

Hello, David, Jeff!

> I'm using the linux kernel 2.6.10 and busybox on an AMD db AU1550 with a
> hdd connected to the pata port of a PCI card (Promise PDC20579).

I'm using a PDC20269 (on Promise Ultra133TX2 card) w/ DMA disabled on
an embedded ppc (MPC8540). I cannot use DMA, too.
I guess it's a PCI IRQ/REQ/GNT problem on my board.
With DMA disabled I haven't had any problems on my system.
I am going to use a PDC20275 in the future.

My .config of a linux-2.6.13-rc7 is attached.

And... I know other guys with similar problems on embedded environments.
Unfortunately I wasn't able to have a closer look at this problem yet.

Best greets,

--
Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm.de
Phone: +49-89-741518-50
Fax: +49-89-741518-19


2005-09-22 02:57:30

by Chris Wedgwood

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

On Wed, Sep 21, 2005 at 02:28:02PM +0200, David Sanchez wrote:

> I'm using the linux kernel 2.6.10 and busybox on an AMD db AU1550
> with a hdd connected to the pata port of a PCI card (Promise
> PDC20579).

Disable prefetch in lib/memcpy.S and see if that helps.

2005-09-22 08:06:12

by David Sanchez

[permalink] [raw]
Subject: RE: How to Force PIO mode on sata promise (Linux 2.6.10)

Hi Chris,
It was a good idea but it doesn't resolve the problem...

I add my card into the dma_black_list of the libata to force DMA disabled and the problem seems to no more appear...maybe PIO is so slow that the data has no time to be corrupted...
But I can NOT affirm that the problem is the DMA.

I try the linux kernel 2.4,2.6.11, 2.6.12 and 2.6.13. More I try 2 different toolchains and the problem persists...

Another idea??

Thanks,

David

-----Message d'origine-----
De?: [email protected] [mailto:[email protected]] De la part de Chris Wedgwood
Envoy??: jeudi 22 septembre 2005 05:00
??: David Sanchez
Cc?: Jeff Garzik; [email protected]
Objet?: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

On Wed, Sep 21, 2005 at 02:28:02PM +0200, David Sanchez wrote:

> I'm using the linux kernel 2.6.10 and busybox on an AMD db AU1550
> with a hdd connected to the pata port of a PCI card (Promise
> PDC20579).

Disable prefetch in lib/memcpy.S and see if that helps.

2005-09-22 09:03:57

by Clemens Koller

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

Hello, David!

David Sanchez wrote:
> How did you disabled the DMA ?

Have you seen my .config file?
(Remember that it's all PATA I'm talking about, YMMV!)

I think it is
CONFIG_IDEDMA_PCI_AUTO=n
while
CONFIG_BLK_DEV_IDEDMA_PCI=y
And I am using the
BLK_DEV_PDC202XX_NEW=y

One of the options there made it working...
It's pretty lame (<10MByte/sec) but thats
fine for now (just a test system).

The HDD is a a Maxtor DiamondMax 10, 120GB
(6B120P0), usually pretty fast.

Best greets,
--
Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm.de
Phone: +49-89-741518-50
Fax: +49-89-741518-19

2005-09-22 16:37:10

by Bill Davidsen

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

David Sanchez wrote:
> Hi Chris,
> It was a good idea but it doesn't resolve the problem...
>
> I add my card into the dma_black_list of the libata to force DMA disabled and the problem seems to no more appear...maybe PIO is so slow that the data has no time to be corrupted...
> But I can NOT affirm that the problem is the DMA.
>
> I try the linux kernel 2.4,2.6.11, 2.6.12 and 2.6.13. More I try 2 different toolchains and the problem persists...
>
> Another idea??

You disable DMA and the problem goes away, that seems to point to DMA as
the problem, and since others aren't having the same problem with the
DMA code in the kernel it certainly suggests the DMA hardware really is
the problem, or is causing it. It's remotely possible that the kernel
has a bug in the code just for your controller, although unlikely.

Have you run memtest86 on your memory? I have seen cases where memory
was marginal and using CPU and DMA was enough to cause it to fail, but
only when people had played with the memory timing in the BIOS. I
suspect the SATA disk has a higher datarate than anything else in your
system, so it could trigger some marginal cases in the memory system.
Unlikely, but possible.

ideas:
test your memory if you haven't
Check that your BIOS is up-to-date
check your BIOS memory timing settings
See if the controller came with tools to adjust DMA timing
try writing to a PATA drive while reading from the SATA
to generate high DMA rate, look for corruption there
try another controller of the same type
try another controler of another supported type
to see that it's not your drive
accept that your controller *belongs* on the blacklist

Since you asked for ideas, these are probably in order of easy to difficult.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-09-22 16:46:03

by Chris Wedgwood

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

On Thu, Sep 22, 2005 at 10:03:10AM +0200, David Sanchez wrote:

> I add my card into the dma_black_list of the libata to force DMA
> disabled and the problem seems to no more appear...maybe PIO is so
> slow that the data has no time to be corrupted... But I can NOT
> affirm that the problem is the DMA.

> I try the linux kernel 2.4,2.6.11, 2.6.12 and 2.6.13. More I try 2
> different toolchains and the problem persists...

I doubt any of those will help. It sounds like a driver bug wrt to
non-coherent PCI. Presumably this will also hit sparc people if they
use this hardware and some PPC too.

Does the driver use the *dma_sync* API(s)? You might want to read
over Documentation/DMA-API.txt and see if the driver is doing the
right things.

2005-09-23 08:37:22

by David Sanchez

[permalink] [raw]
Subject: RE: How to Force PIO mode on sata promise (Linux 2.6.10)

Hi Bill,

I've reduced the UDMA level step by step until the problem seems disappeared.
Finally with UDMA/25 I don't detect error after 1000 copies. I consider that this solution corrects the symptom but not the cause...

I try memtest for mips arch and all is ok.
I try 3 promise controllers: the problem was NOT resolved.
I try more than 4 kind of disk: the problem was NOT resolved.
I try to connect the disk on the pata port and on the sata port: the problem was NOT resolved.
More when I connect the disk to the IDE interface, it doesn't work at all (maybe bad driver implementation)...

David

My embedded system:
* AMD Development Board AU1550 (mips32) + hdd connected to the pata port of the Promise PDC20579 controller.
* Kernel2.6.10 + libata patch + busybox 1.0


-----Message d'origine-----
De?: [email protected] [mailto:[email protected]] De la part de Bill Davidsen
Envoy??: jeudi 22 septembre 2005 18:45
??: David Sanchez
Cc?: Jeff Garzik; [email protected]
Objet?: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

David Sanchez wrote:
> Hi Chris,
> It was a good idea but it doesn't resolve the problem...
>
> I add my card into the dma_black_list of the libata to force DMA disabled and the problem seems to no more appear...maybe PIO is so slow that the data has no time to be corrupted...
> But I can NOT affirm that the problem is the DMA.
>
> I try the linux kernel 2.4,2.6.11, 2.6.12 and 2.6.13. More I try 2 different toolchains and the problem persists...
>
> Another idea??

You disable DMA and the problem goes away, that seems to point to DMA as
the problem, and since others aren't having the same problem with the
DMA code in the kernel it certainly suggests the DMA hardware really is
the problem, or is causing it. It's remotely possible that the kernel
has a bug in the code just for your controller, although unlikely.

Have you run memtest86 on your memory? I have seen cases where memory
was marginal and using CPU and DMA was enough to cause it to fail, but
only when people had played with the memory timing in the BIOS. I
suspect the SATA disk has a higher datarate than anything else in your
system, so it could trigger some marginal cases in the memory system.
Unlikely, but possible.

ideas:
test your memory if you haven't
Check that your BIOS is up-to-date
check your BIOS memory timing settings
See if the controller came with tools to adjust DMA timing
try writing to a PATA drive while reading from the SATA
to generate high DMA rate, look for corruption there
try another controller of the same type
try another controler of another supported type
to see that it's not your drive
accept that your controller *belongs* on the blacklist

Since you asked for ideas, these are probably in order of easy to difficult.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-09-23 09:47:50

by Clemens Koller

[permalink] [raw]
Subject: Re: How to Force PIO mode on sata promise (Linux 2.6.10)

Hello, David!

David Sanchez wrote:
> I've reduced the UDMA level step by step until the problem seems disappeared.
> Finally with UDMA/25 I don't detect error after 1000 copies. I consider that this solution corrects the symptom but not the cause...

Hmm, no luck for you, today?
Have you possibilities to check the signal integrity in your design?
Did you play some of the electrical engineering tricks to see if things
are changing? (change some terminations, temperature, voltages)

> My embedded system:
> * AMD Development Board AU1550 (mips32) + hdd connected to the pata port of the Promise PDC20579 controller.
> * Kernel2.6.10 + libata patch + busybox 1.0

Well, what about getting a datasheet of the PDC, one of the latest kernels and
start to debug that thing down to the silicon?

There is a person called Ed Huang (Sales and Marketing Div.) at Promise
in Taiwan where we got our datasheets and reference designs for our embedded
project. Unfortunately, you need to sign a NDA. But beside that, the support
is pretty okay. (I just send you the contact in private email).

Best greets,

--
Clemens Koller
_______________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Str. 45/1
81379 Muenchen
Germany

http://www.anagramm.de
Phone: +49-89-741518-50
Fax: +49-89-741518-19

2005-09-23 10:02:48

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] reduce sizeof(struct file)

--- linux-2.6.14-rc2/include/linux/fs.h 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2-ed2/include/linux/fs.h 2005-09-23 11:22:19.000000000 +0200
@@ -574,7 +574,16 @@
#define RA_FLAG_INCACHE 0x02 /* file is already in cache */

struct file {
- struct list_head f_list;
+/*
+ * f_list and f_rcuhead can share the same memory location
+ */
+ union {
+ struct list_head uf_f_list;
+ struct rcu_head uf_f_rcuhead;
+ } uf;
+#define f_list uf.uf_f_list
+#define f_rcuhead uf.uf_f_rcuhead
+
struct dentry *f_dentry;
struct vfsmount *f_vfsmnt;
struct file_operations *f_op;
@@ -598,7 +607,6 @@
spinlock_t f_ep_lock;
#endif /* #ifdef CONFIG_EPOLL */
struct address_space *f_mapping;
- struct rcu_head f_rcuhead;
};
extern spinlock_t files_lock;
#define file_list_lock() spin_lock(&files_lock);
--- linux-2.6.14-rc2/include/net/ip_vs.h 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2-ed2/include/net/ip_vs.h 2005-09-23 11:46:55.000000000 +0200
@@ -548,7 +548,7 @@
*/
struct ip_vs_service {
struct list_head s_list; /* for normal service table */
- struct list_head f_list; /* for fwmark-based service table */
+ struct list_head ipvs_f_list; /* for fwmark-based service table */
atomic_t refcnt; /* reference counter */
atomic_t usecnt; /* use counter */

--- linux-2.6.14-rc2/net/ipv4/ipvs/ip_vs_ctl.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2-ed2/net/ipv4/ipvs/ip_vs_ctl.c 2005-09-23 11:47:30.000000000 +0200
@@ -322,7 +322,7 @@
* Hash it by fwmark in ip_vs_svc_fwm_table
*/
hash = ip_vs_svc_fwm_hashkey(svc->fwmark);
- list_add(&svc->f_list, &ip_vs_svc_fwm_table[hash]);
+ list_add(&svc->ipvs_f_list, &ip_vs_svc_fwm_table[hash]);
}

svc->flags |= IP_VS_SVC_F_HASHED;
@@ -349,7 +349,7 @@
list_del(&svc->s_list);
} else {
/* Remove it from the ip_vs_svc_fwm_table table */
- list_del(&svc->f_list);
+ list_del(&svc->ipvs_f_list);
}

svc->flags &= ~IP_VS_SVC_F_HASHED;
@@ -395,7 +395,7 @@
/* Check for fwmark addressed entries */
hash = ip_vs_svc_fwm_hashkey(fwmark);

- list_for_each_entry(svc, &ip_vs_svc_fwm_table[hash], f_list) {
+ list_for_each_entry(svc, &ip_vs_svc_fwm_table[hash], ipvs_f_list) {
if (svc->fwmark == fwmark) {
/* HIT */
atomic_inc(&svc->usecnt);
@@ -1297,7 +1297,7 @@
*/
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry_safe(svc, nxt,
- &ip_vs_svc_fwm_table[idx], f_list) {
+ &ip_vs_svc_fwm_table[idx], ipvs_f_list) {
write_lock_bh(&__ip_vs_svc_lock);
ip_vs_svc_unhash(svc);
/*
@@ -1341,7 +1341,7 @@
}

for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
- list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+ list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], ipvs_f_list) {
ip_vs_zero_service(svc);
}
}
@@ -1666,7 +1666,7 @@

/* keep looking in fwmark */
for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
- list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+ list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], ipvs_f_list) {
if (pos-- == 0) {
iter->table = ip_vs_svc_fwm_table;
iter->bucket = idx;
@@ -1718,13 +1718,13 @@
}

/* next service in hashed by fwmark */
- if ((e = svc->f_list.next) != &ip_vs_svc_fwm_table[iter->bucket])
- return list_entry(e, struct ip_vs_service, f_list);
+ if ((e = svc->ipvs_f_list.next) != &ip_vs_svc_fwm_table[iter->bucket])
+ return list_entry(e, struct ip_vs_service, ipvs_f_list);

scan_fwmark:
while (++iter->bucket < IP_VS_SVC_TAB_SIZE) {
list_for_each_entry(svc, &ip_vs_svc_fwm_table[iter->bucket],
- f_list)
+ ipvs_f_list)
return svc;
}

@@ -2095,7 +2095,7 @@
}

for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
- list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+ list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], ipvs_f_list) {
if (count >= get->num_services)
goto out;
memset(&entry, 0, sizeof(entry));


Attachments:
shrink_struct_file (3.89 kB)

2005-09-23 10:05:50

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

On Fri, Sep 23, 2005 at 12:02:18PM +0200, Eric Dumazet wrote:
> Hi all
>
> Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
> in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
> file_free_rcu) time, to reduce the size of this critical kernel object.
>
> The trick I used is to move f_rcuhead and f_list in an union and defining
> macros to access f_list and f_rcuhead
>
> Unfortunatly f_list was also used in IPVS so I had to change
> include/net/ip_vs.h and net/ipv4/ipvs/ip_vs_ctl.c, changing their f_list to
> ipvs_f_list to avoid name clash.
>
> (This is why I send this mail to IPVS maintainers)

Please just change all callers to use the union, there's not very many
of them.

2005-09-23 23:30:08

by J.A. Magallon

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

On Fri, 23 Sep 2005 11:05:41 +0100, Christoph Hellwig <[email protected]> wrote:

> On Fri, Sep 23, 2005 at 12:02:18PM +0200, Eric Dumazet wrote:
> > Hi all
> >
> > Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
> > in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
> > file_free_rcu) time, to reduce the size of this critical kernel object.
> >
> > The trick I used is to move f_rcuhead and f_list in an union and defining
> > macros to access f_list and f_rcuhead
> >
> > Unfortunatly f_list was also used in IPVS so I had to change
> > include/net/ip_vs.h and net/ipv4/ipvs/ip_vs_ctl.c, changing their f_list to
> > ipvs_f_list to avoid name clash.
> >
> > (This is why I send this mail to IPVS maintainers)
>
> Please just change all callers to use the union, there's not very many
> of them.
>

How about anonymous unions ? gcc-3.3.3 and above support them.
Is 2.6 supposed to be built with 2.95 ?

--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.0 (2006 rc2) for i586
Linux 2.6.13-jam6 (gcc 4.0.1 (4.0.1-5mdk for Mandriva Linux release 2006.0))


Attachments:
(No filename) (189.00 B)

2005-09-24 00:09:16

by Jesper Juhl

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

On 9/24/05, J.A. Magallon <[email protected]> wrote:
> On Fri, 23 Sep 2005 11:05:41 +0100, Christoph Hellwig <[email protected]> wrote:
>
> > On Fri, Sep 23, 2005 at 12:02:18PM +0200, Eric Dumazet wrote:
> > > Hi all
> > >
> > > Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
> > > in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
> > > file_free_rcu) time, to reduce the size of this critical kernel object.
> > >
> > > The trick I used is to move f_rcuhead and f_list in an union and defining
> > > macros to access f_list and f_rcuhead
> > >
> > > Unfortunatly f_list was also used in IPVS so I had to change
> > > include/net/ip_vs.h and net/ipv4/ipvs/ip_vs_ctl.c, changing their f_list to
> > > ipvs_f_list to avoid name clash.
> > >
> > > (This is why I send this mail to IPVS maintainers)
> >
> > Please just change all callers to use the union, there's not very many
> > of them.
> >
>
> How about anonymous unions ? gcc-3.3.3 and above support them.
> Is 2.6 supposed to be built with 2.95 ?
>
Yes, 2.6.x is expected to build with gcc 2.95.3+

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2005-09-24 00:19:59

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

On Sat, Sep 24, 2005 at 01:30:21AM +0200, J.A. Magallon wrote:
> How about anonymous unions ? gcc-3.3.3 and above support them.
> Is 2.6 supposed to be built with 2.95 ?

NAK. For one thing, yes, it is supposed to be built with 2.95. For
another, they do not buy you anything - few enough places use either
field, so there is no point in hiding that union.

2005-09-24 03:43:10

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

--- linux-2.6.14-rc2-orig/include/linux/fs.h 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/include/linux/fs.h 2005-09-24 04:52:20.000000000 +0200
@@ -574,7 +574,13 @@
#define RA_FLAG_INCACHE 0x02 /* file is already in cache */

struct file {
- struct list_head f_list;
+/*
+ * f_list and f_rcuhead can share the same memory location
+ */
+ union {
+ struct list_head fu_list;
+ struct rcu_head fu_rcuhead;
+ } f_u;
struct dentry *f_dentry;
struct vfsmount *f_vfsmnt;
struct file_operations *f_op;
@@ -598,7 +604,6 @@
spinlock_t f_ep_lock;
#endif /* #ifdef CONFIG_EPOLL */
struct address_space *f_mapping;
- struct rcu_head f_rcuhead;
};
extern spinlock_t files_lock;
#define file_list_lock() spin_lock(&files_lock);
--- linux-2.6.14-rc2-orig/fs/file_table.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/fs/file_table.c 2005-09-24 05:02:35.000000000 +0200
@@ -56,13 +56,13 @@

static inline void file_free_rcu(struct rcu_head *head)
{
- struct file *f = container_of(head, struct file, f_rcuhead);
+ struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
kmem_cache_free(filp_cachep, f);
}

static inline void file_free(struct file *f)
{
- call_rcu(&f->f_rcuhead, file_free_rcu);
+ call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
}

/* Find an unused file structure and return a pointer to it.
@@ -95,7 +95,7 @@
f->f_gid = current->fsgid;
rwlock_init(&f->f_owner.lock);
/* f->f_version: 0 */
- INIT_LIST_HEAD(&f->f_list);
+ INIT_LIST_HEAD(&f->f_u.fu_list);
return f;

over:
@@ -225,15 +225,15 @@
if (!list)
return;
file_list_lock();
- list_move(&file->f_list, list);
+ list_move(&file->f_u.fu_list, list);
file_list_unlock();
}

void file_kill(struct file *file)
{
- if (!list_empty(&file->f_list)) {
+ if (!list_empty(&file->f_u.fu_list)) {
file_list_lock();
- list_del_init(&file->f_list);
+ list_del_init(&file->f_u.fu_list);
file_list_unlock();
}
}
@@ -245,7 +245,7 @@
/* Check that no files are currently opened for writing. */
file_list_lock();
list_for_each(p, &sb->s_files) {
- struct file *file = list_entry(p, struct file, f_list);
+ struct file *file = list_entry(p, struct file, f_u.fu_list);
struct inode *inode = file->f_dentry->d_inode;

/* File with pending delete? */
--- linux-2.6.14-rc2-orig/drivers/char/tty_io.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/drivers/char/tty_io.c 2005-09-24 05:02:35.000000000 +0200
@@ -809,7 +809,7 @@
check_tty_count(tty, "do_tty_hangup");
file_list_lock();
/* This breaks for file handles being sent over AF_UNIX sockets ? */
- list_for_each_entry(filp, &tty->tty_files, f_list) {
+ list_for_each_entry(filp, &tty->tty_files, f_u.fu_list) {
if (filp->f_op->write == redirected_tty_write)
cons_filp = filp;
if (filp->f_op->write != tty_write)
--- linux-2.6.14-rc2-orig/fs/dquot.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/fs/dquot.c 2005-09-24 05:02:35.000000000 +0200
@@ -662,7 +662,7 @@
restart:
file_list_lock();
list_for_each(p, &sb->s_files) {
- struct file *filp = list_entry(p, struct file, f_list);
+ struct file *filp = list_entry(p, struct file, f_u.fu_list);
struct inode *inode = filp->f_dentry->d_inode;
if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
struct dentry *dentry = dget(filp->f_dentry);
--- linux-2.6.14-rc2-orig/fs/proc/generic.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/fs/proc/generic.c 2005-09-24 05:02:35.000000000 +0200
@@ -533,7 +533,7 @@
*/
file_list_lock();
list_for_each(p, &sb->s_files) {
- struct file * filp = list_entry(p, struct file, f_list);
+ struct file * filp = list_entry(p, struct file, f_u.fu_list);
struct dentry * dentry = filp->f_dentry;
struct inode * inode;
struct file_operations *fops;
--- linux-2.6.14-rc2-orig/fs/super.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/fs/super.c 2005-09-24 05:02:35.000000000 +0200
@@ -513,7 +513,7 @@
struct file *f;

file_list_lock();
- list_for_each_entry(f, &sb->s_files, f_list) {
+ list_for_each_entry(f, &sb->s_files, f_u.fu_list) {
if (S_ISREG(f->f_dentry->d_inode->i_mode) && file_count(f))
f->f_mode &= ~FMODE_WRITE;
}
--- linux-2.6.14-rc2-orig/security/selinux/hooks.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/security/selinux/hooks.c 2005-09-24 05:02:35.000000000 +0200
@@ -1599,7 +1599,7 @@

if (tty) {
file_list_lock();
- file = list_entry(tty->tty_files.next, typeof(*file), f_list);
+ file = list_entry(tty->tty_files.next, typeof(*file), f_u.fu_list);
if (file) {
/* Revalidate access to controlling tty.
Use inode_has_perm on the tty inode directly rather
--- linux-2.6.14-rc2-orig/security/selinux/selinuxfs.c 2005-09-20 05:00:41.000000000 +0200
+++ linux-2.6.14-rc2/security/selinux/selinuxfs.c 2005-09-24 05:02:35.000000000 +0200
@@ -924,7 +924,7 @@

file_list_lock();
list_for_each(p, &sb->s_files) {
- struct file * filp = list_entry(p, struct file, f_list);
+ struct file * filp = list_entry(p, struct file, f_u.fu_list);
struct dentry * dentry = filp->f_dentry;

if (dentry->d_parent != de) {


Attachments:
shrink_struct_file (5.05 kB)

2005-09-25 01:43:57

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

Eric Dumazet wrote:

> --- linux-2.6.14-rc2-orig/include/linux/fs.h 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/include/linux/fs.h 2005-09-24 04:52:20.000000000 +0200
> @@ -574,7 +574,13 @@
> #define RA_FLAG_INCACHE 0x02 /* file is already in cache */
>
> struct file {
> - struct list_head f_list;
> +/*
> + * f_list and f_rcuhead can share the same memory location
> + */
> + union {
> + struct list_head fu_list;
> + struct rcu_head fu_rcuhead;
> + } f_u;

I don't think you need to explain this basic C semantic to
the reader if they have gotten this far ;)

Instead, explain when fu_list and fu_rcuhead are used.
Something along the lines of

/*
* fu_list becomes invalid after file_free is called
* and queued via fu_rcuhead for RCU freeing
*/

Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com

2005-09-30 00:31:58

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] reduce sizeof(struct file)

On Sat, Sep 24, 2005 at 05:43:05AM +0200, Eric Dumazet wrote:
> Christoph Hellwig a ?crit :
> >
> >Please just change all callers to use the union, there's not very many
> >of them.
>
> Yes it's better, thanks Christoph.
>
> What about this version then ?

After a momentary panic attack where I was worried that f_list might
be accessed by one CPU while another was sending the same struct file
to call_rcu(), I realized that all accesses to f_list do file_list_lock()
first, thus preventing any other CPU from doing call_rcu() concurrently
on that struct file.

So it looks OK to me.

But you did have me going there for a bit! ;-)

Thanx, Paul

> Hi all
>
> Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
> in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
> file_free_rcu) time, to reduce the size of this critical kernel object.
>
> The trick I used is to move f_rcuhead and f_list in an union called f_u
>
> The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
> becomes f_u.f_list
>
> Tested on allyesconfig, diffed against 2.6.14-rc2
>
> drivers/char/tty_io.c | 2 +-
> fs/dquot.c | 2 +-
> fs/file_table.c | 14 +++++++-------
> fs/proc/generic.c | 2 +-
> fs/super.c | 2 +-
> include/linux/fs.h | 9 +++++++--
> security/selinux/hooks.c | 2 +-
> security/selinux/selinuxfs.c | 2 +-
> 8 files changed, 20 insertions(+), 15 deletions(-)
>
>
> Thank you
>
> Signed-off-by: Eric Dumazet <[email protected]>
>

> --- linux-2.6.14-rc2-orig/include/linux/fs.h 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/include/linux/fs.h 2005-09-24 04:52:20.000000000 +0200
> @@ -574,7 +574,13 @@
> #define RA_FLAG_INCACHE 0x02 /* file is already in cache */
>
> struct file {
> - struct list_head f_list;
> +/*
> + * f_list and f_rcuhead can share the same memory location
> + */
> + union {
> + struct list_head fu_list;
> + struct rcu_head fu_rcuhead;
> + } f_u;
> struct dentry *f_dentry;
> struct vfsmount *f_vfsmnt;
> struct file_operations *f_op;
> @@ -598,7 +604,6 @@
> spinlock_t f_ep_lock;
> #endif /* #ifdef CONFIG_EPOLL */
> struct address_space *f_mapping;
> - struct rcu_head f_rcuhead;
> };
> extern spinlock_t files_lock;
> #define file_list_lock() spin_lock(&files_lock);
> --- linux-2.6.14-rc2-orig/fs/file_table.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/fs/file_table.c 2005-09-24 05:02:35.000000000 +0200
> @@ -56,13 +56,13 @@
>
> static inline void file_free_rcu(struct rcu_head *head)
> {
> - struct file *f = container_of(head, struct file, f_rcuhead);
> + struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
> kmem_cache_free(filp_cachep, f);
> }
>
> static inline void file_free(struct file *f)
> {
> - call_rcu(&f->f_rcuhead, file_free_rcu);
> + call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> }
>
> /* Find an unused file structure and return a pointer to it.
> @@ -95,7 +95,7 @@
> f->f_gid = current->fsgid;
> rwlock_init(&f->f_owner.lock);
> /* f->f_version: 0 */
> - INIT_LIST_HEAD(&f->f_list);
> + INIT_LIST_HEAD(&f->f_u.fu_list);
> return f;
>
> over:
> @@ -225,15 +225,15 @@
> if (!list)
> return;
> file_list_lock();
> - list_move(&file->f_list, list);
> + list_move(&file->f_u.fu_list, list);
> file_list_unlock();
> }
>
> void file_kill(struct file *file)
> {
> - if (!list_empty(&file->f_list)) {
> + if (!list_empty(&file->f_u.fu_list)) {
> file_list_lock();
> - list_del_init(&file->f_list);
> + list_del_init(&file->f_u.fu_list);
> file_list_unlock();
> }
> }
> @@ -245,7 +245,7 @@
> /* Check that no files are currently opened for writing. */
> file_list_lock();
> list_for_each(p, &sb->s_files) {
> - struct file *file = list_entry(p, struct file, f_list);
> + struct file *file = list_entry(p, struct file, f_u.fu_list);
> struct inode *inode = file->f_dentry->d_inode;
>
> /* File with pending delete? */
> --- linux-2.6.14-rc2-orig/drivers/char/tty_io.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/drivers/char/tty_io.c 2005-09-24 05:02:35.000000000 +0200
> @@ -809,7 +809,7 @@
> check_tty_count(tty, "do_tty_hangup");
> file_list_lock();
> /* This breaks for file handles being sent over AF_UNIX sockets ? */
> - list_for_each_entry(filp, &tty->tty_files, f_list) {
> + list_for_each_entry(filp, &tty->tty_files, f_u.fu_list) {
> if (filp->f_op->write == redirected_tty_write)
> cons_filp = filp;
> if (filp->f_op->write != tty_write)
> --- linux-2.6.14-rc2-orig/fs/dquot.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/fs/dquot.c 2005-09-24 05:02:35.000000000 +0200
> @@ -662,7 +662,7 @@
> restart:
> file_list_lock();
> list_for_each(p, &sb->s_files) {
> - struct file *filp = list_entry(p, struct file, f_list);
> + struct file *filp = list_entry(p, struct file, f_u.fu_list);
> struct inode *inode = filp->f_dentry->d_inode;
> if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
> struct dentry *dentry = dget(filp->f_dentry);
> --- linux-2.6.14-rc2-orig/fs/proc/generic.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/fs/proc/generic.c 2005-09-24 05:02:35.000000000 +0200
> @@ -533,7 +533,7 @@
> */
> file_list_lock();
> list_for_each(p, &sb->s_files) {
> - struct file * filp = list_entry(p, struct file, f_list);
> + struct file * filp = list_entry(p, struct file, f_u.fu_list);
> struct dentry * dentry = filp->f_dentry;
> struct inode * inode;
> struct file_operations *fops;
> --- linux-2.6.14-rc2-orig/fs/super.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/fs/super.c 2005-09-24 05:02:35.000000000 +0200
> @@ -513,7 +513,7 @@
> struct file *f;
>
> file_list_lock();
> - list_for_each_entry(f, &sb->s_files, f_list) {
> + list_for_each_entry(f, &sb->s_files, f_u.fu_list) {
> if (S_ISREG(f->f_dentry->d_inode->i_mode) && file_count(f))
> f->f_mode &= ~FMODE_WRITE;
> }
> --- linux-2.6.14-rc2-orig/security/selinux/hooks.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/security/selinux/hooks.c 2005-09-24 05:02:35.000000000 +0200
> @@ -1599,7 +1599,7 @@
>
> if (tty) {
> file_list_lock();
> - file = list_entry(tty->tty_files.next, typeof(*file), f_list);
> + file = list_entry(tty->tty_files.next, typeof(*file), f_u.fu_list);
> if (file) {
> /* Revalidate access to controlling tty.
> Use inode_has_perm on the tty inode directly rather
> --- linux-2.6.14-rc2-orig/security/selinux/selinuxfs.c 2005-09-20 05:00:41.000000000 +0200
> +++ linux-2.6.14-rc2/security/selinux/selinuxfs.c 2005-09-24 05:02:35.000000000 +0200
> @@ -924,7 +924,7 @@
>
> file_list_lock();
> list_for_each(p, &sb->s_files) {
> - struct file * filp = list_entry(p, struct file, f_list);
> + struct file * filp = list_entry(p, struct file, f_u.fu_list);
> struct dentry * dentry = filp->f_dentry;
>
> if (dentry->d_parent != de) {