2002-10-01 12:37:49

by Eitan Ben-Nun

Subject: Adapter card reads old memory value

This seems like a cache coherency problem:
An adapter card on the PCI bus sends a message to the PC (i386 Linux) asking it to update a memory address.
The adapter then reads the address and sees the old value, even though the PC's CPU has already performed the update.

Environment:
SKA4 (koa board) with two Intel Pentium III Xeon processors, running Linux kernel 2.4.18-5 SMP.
The server has 1 GB of physical memory, but Linux is aware of only the lower 512 MB.
lilo.conf tells the kernel to use only 512M (from lilo.conf: append="mem=512M, nousb").
The upper 512 MB is managed by specialized software
and accessed from the kernel using __ioremap(phys_addr, PAGE_SIZE, _PAGE_PWT | _PAGE_PCD).

I would have preferred not to map this memory as write-through, and instead to define this physical memory as snoopable for PCI reads at the memory bus controller, but I haven't figured out how to do that.
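
As a rough illustration of the mapping described above, here is a user-space C sketch of the two i386 page-table cache-control bits passed to __ioremap, plus a sanity check that a page lies inside the reserved upper 512 MB. The bit values match the i386 kernel headers. The names PTE_PWT, PTE_PCD, SKETCH_PAGE_SIZE, RESERVED_BASE, RESERVED_END, reserved_page_ok and cache_policy are hypothetical, and the region bounds are assumed from the 1 GB / mem=512M setup. The effective memory type on real hardware also depends on the MTRRs, which this sketch ignores.

```c
#include <stdint.h>

/* i386 page-table entry cache-control bits (same values as the kernel's
 * _PAGE_PWT and _PAGE_PCD). */
#define PTE_PWT 0x008u /* page write-through */
#define PTE_PCD 0x010u /* page cache disable */

#define SKETCH_PAGE_SIZE 4096u

/* Assumption: the reserved region is the upper 512 MB of a 1 GB machine. */
#define RESERVED_BASE (512u << 20)
#define RESERVED_END  (1024u << 20)

/* Return 1 if a whole page starting at phys_addr is page-aligned and
 * lies inside the reserved region, 0 otherwise. */
static int reserved_page_ok(uint32_t phys_addr)
{
    if (phys_addr & (SKETCH_PAGE_SIZE - 1))
        return 0; /* not on a page boundary */
    return phys_addr >= RESERVED_BASE &&
           phys_addr <= RESERVED_END - SKETCH_PAGE_SIZE;
}

/* Name the caching policy requested by the PCD/PWT bits alone. */
static const char *cache_policy(uint32_t flags)
{
    if (flags & PTE_PCD)
        return "uncached";
    if (flags & PTE_PWT)
        return "write-through";
    return "write-back";
}
```

With both bits set, as in the __ioremap call above, the page-table entry asks for an uncached mapping rather than a write-through one, because PCD takes precedence over PWT.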

What has been done so far to debug this problem:
0. How we verified that this is the problem:
The adapter reads this address and saves the value. It then sends an update message to the PC. After the update reply message arrives, it reads the value again and sees the old value. The adapter developer has verified that no caching is done on the adapter side.

1. Testing:
We ran the machine with one CPU and with two CPUs, and saw the problem in both cases.
We ran the machine without cache (the cache was disabled in the machine's BIOS), and did not see the problem.

2. Code changes:
At the beginning we used only ioremap, then we moved to ioremap_nocache, then to __ioremap with the above flags.
We have verified that the adapter and the PC use the same physical address, and that the PC actually performed the update through the virtual address that Linux allocated for the physical address the adapter is reading from.

3. Investigations:
I searched and read material on the web: I went over the lkml FAQ, searched the net for items regarding cache coherency problems, and read all the items on the linux-smp mailing list. I reviewed the kernel code of __ioremap, vmalloc, get_pte, etc. I read chapters 2, 6 and 13 of Understanding the Linux Kernel, and other books, some of them on the web, such as Linux Kernel Hacking. I also searched /usr/src/Documentation for help. As you can see, I'm pretty much desperate about this issue.


Regards,
Eitan.




2002-10-01 12:48:32

by Arjan van de Ven

Subject: Re: Adapter card reads old memory value

On Tue, 2002-10-01 at 14:43, Eitan Ben-Nun wrote:
> This seems like a cache coherency problem:
> An adapter card on the PCI bus sends a message to the PC (i386 Linux) asking it to update a memory address.
> The adapter then reads the address and sees the old value, even though the PC's CPU has already performed the update.

uhm your report is missing a pointer to the source of the driver so
nobody can help you by looking at what's going on....



2002-10-01 14:40:32

by Arjan van de Ven

Subject: Re: Adapter card reads old memory value


On Tue, Oct 01, 2002 at 05:44:07PM +0300, Eitan Ben-Nun wrote:
> ok thanks,
> An adapter card on the PCI bus sends a message to the PC (i386 Linux) asking it to update a memory address.
> The adapter then reads the address and sees the old value, even though the PC's CPU has already performed the update. Here is a reference to my driver code:
> int update_cluster_operation_mode(unsigned long new_mode,
>                                   unsigned long phys_addr)
> {
>         unsigned long *vir_addr = 0;
>         vir_addr = __ioremap(phys_addr, PAGE_SIZE, _PAGE_PWT | _PAGE_PCD);
>         *vir_addr = new_mode;
>         return 0;
> }
> phys_addr is always on a page boundary, and the address is between 512M and 640M.

this is only a part of the source, is there a full source available?

also your code is already buggy; you should use writel(); and also read up
on PCI posting....
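
To make the writel() suggestion concrete, here is a minimal user-space sketch of the store-then-read-back pattern. sketch_writel, sketch_readl and update_mode are hypothetical stand-ins, not the kernel's own functions. The sketch assumes that on i386 an access through writel()/readl() behaves like a volatile 32-bit load or store, and it only models the ordering idea, not real PCI traffic. Whether posting is actually the cause of the stale read is not established in this thread.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's writel()/readl().
 * Assumption: each one is a single volatile 32-bit access. */
static inline void sketch_writel(uint32_t value, volatile void *addr)
{
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t sketch_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Store new_mode through the mapping, then read it back.
 * On a real PCI mapping, a read from the device does not complete until
 * earlier posted writes to it have been delivered, which is why a
 * read-back is the usual way to flush them.
 * Returns 0 if the read-back matches, -1 otherwise. */
static int update_mode(volatile void *mapped, uint32_t new_mode)
{
    sketch_writel(new_mode, mapped);
    return sketch_readl(mapped) == new_mode ? 0 : -1;
}
```

In a driver, the pointer would come from ioremap_nocache() and be released with iounmap() once it is no longer needed; the function quoted above never unmaps what it maps.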

Greetings,
Arjan van de Ven