2007-01-12 12:15:14

by Philipp Beyer

[permalink] [raw]
Subject: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace

Hi,

I'm investigating an unwanted behaviour of our firewire devices in
connection with the ieee1394 kernel module.

The problem is caused by a non standard-conform behaviour of our
devices. Anyway, changes on the device-side dont seem to be the
best solution, so I'm looking for a workaround in terms of a
kernel patch.

The problem:
Our devices exceed the SPLIT_TIMEOUT for write requests in some
situations, where write accesses to the devices flash memory are
triggered. The SPLIT_TIMEOUT could be adjusted as it's part of the
CSR layout, but the longest interval possible is 8 seconds. We need
a substantial longer interval to assure failure-free operation.
(the maximum timeout needed may be around 120 seconds)

The presumed solution:
These long timeouts are only needed in a few rare situations like
writing user presets to flash or firmware updates. As far as I've
examined the kernel code it would be the best thing to have a
function (ioctl?) accessible from userspace that overwrites the
stored SPLIT_TIMEOUT for a certain connected device. This way
there should not be any interferences in case of normal operation.
Until (rare) write accesses to the flash memory are performed, a
reasonable short timeout could be used.

Since I don't have any real experience in kernel hacking yet,
this should be interpreted as a feature request at first:
If the described feature is easy to implement I would appreciate
if someone could do this.

Otherwise I'm confident that I'm able to write a patch on my own.
In this case the critical part would be to meet the standards
of the kernel community, since we would like to have the patch
included in the mainline.

Therefore I'm also interested in any kind of advices about how
to realize an appropriate patch.

Thanks,

Philipp Beyer

Software Development
Allied Vision Technologies


2007-01-13 18:15:55

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace

(full quote for linux1394-devel)

On 12 Jan, Philipp Beyer wrote to linux-kernel:
> Hi,
>
> I'm investigating an unwanted behaviour of our firewire devices in
> connection with the ieee1394 kernel module.
>
> The problem is caused by a non standard-conform behaviour of our
> devices. Anyway, changes on the device-side dont seem to be the
> best solution, so I'm looking for a workaround in terms of a
> kernel patch.
>
> The problem:
> Our devices exceed the SPLIT_TIMEOUT for write requests in some
> situations, where write accesses to the devices flash memory are
> triggered.

There are certainly a number of ways to implement this in your
device in a conforming way. For example, if it is too costly to
avoid the transaction timeout, you could add a register to your
device to be polled by the requester after it initiated a lengthy
operation. The extra register would become responsive when the
operation finished and could even show whether the operation
succeeded.

But then, why not support lengthier timeouts in Linux if it can be
done with minimum overhead.

> The SPLIT_TIMEOUT could be adjusted as it's part of the
> CSR layout, but the longest interval possible is 8 seconds. We need
> a substantial longer interval to assure failure-free operation.
> (the maximum timeout needed may be around 120 seconds)
>
> The presumed solution:
> These long timeouts are only needed in a few rare situations like
> writing user presets to flash or firmware updates. As far as I've
> examined the kernel code it would be the best thing to have a
> function (ioctl?) accessible from userspace that overwrites the
> stored SPLIT_TIMEOUT for a certain connected device. This way
> there should not be any interferences in case of normal operation.
> Until (rare) write accesses to the flash memory are performed, a
> reasonable short timeout could be used.

I have an alternative suggestion:

- Keep a global timeout for split transactions for all nodes.
Tracking different timeouts per node would add significant code
footprint.
- Control the timeout like before via a write request to the
SPLIT_TIMEOUT CSR.
- Allow the local node to write a nonstandard value of >7 to
SPLIT_TIMEOUT_HI. This would not be compliant with IEEE 1394(a)
but at least with IEEE 1212.

This suggestion may fall short if a bus manager is present. Also,
I have concerns to add such a non-conforming feature to mainline.
(Not that our drivers were fully compliant to the spec now or that
100% by-the-book behavior would be desirable in the first place...)

> Since I don't have any real experience in kernel hacking yet,
> this should be interpreted as a feature request at first:
> If the described feature is easy to implement I would appreciate
> if someone could do this.

I could post a patch which works as I outlined if it fits your
requirements.

> Otherwise I'm confident that I'm able to write a patch on my own.
> In this case the critical part would be to meet the standards
> of the kernel community, since we would like to have the patch
> included in the mainline.
>
> Therefore I'm also interested in any kind of advices about how
> to realize an appropriate patch.
>
> Thanks,
>
> Philipp Beyer
>
> Software Development
> Allied Vision Technologies

See Documentation/SubmittingPatches in the Linux kernel source
distribution for advice on code submission.
--
Stefan Richter
-=====-=-=== ---= -==-=
http://arcgraph.de/sr/

2007-01-15 13:20:27

by Philipp Beyer

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace


Thanks for your input. My post was based on the (wrong) idea that
the kernel already uses different timeout values per node.

Therefore, having read your answer, I have a different opinion about
how to solve this now.

About your suggestions:
Unfortunately sending an early response and using a secondary register
as indication for completed flash writes doesnt work. In short, the
device isn't able to process packets while writing to flash and an early
answer followed by a period of non-responsiveness might lead to problems
on the windows side.

Also I dont like the idea of having such a big timeout for every bus
transaction. In case of 'normal' operation the device runs fine with
a standard timeout value.



I will now try to work around this problem in userspace basically by
ignoring the timeout error. The correct transmission of the write
request will already be confirmed by the acknowledge packet, after all.


Philipp Beyer


2007-01-15 14:06:25

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace

Philipp Beyer wrote:
> Thanks for your input. My post was based on the (wrong) idea that
> the kernel already uses different timeout values per node.
>
> Therefore, having read your answer, I have a different opinion about
> how to solve this now.
>
> About your suggestions:
> Unfortunately sending an early response and using a secondary register
> as indication for completed flash writes doesnt work. In short, the
> device isn't able to process packets while writing to flash and an early
> answer followed by a period of non-responsiveness might lead to problems
> on the windows side.

You don't need the early response if you would have the extra register.
(The config ROM instead of a register might work too.)

01) PC: sends write request (or whatever)
02) Dev: starts to process request, doesn't send response
03) PC: transaction times out
04) PC: now knows that either the request failed or the Dev is now busy
05) PC: sends read request to special register or to config ROM
<if Dev ready goto 08>
06) Dev: doesn't respond or even ack
07) PC: transaction times out
<goto 05>
08) Dev: responds to read request
09) PC: now assumes or knows that the request 01 was actually successful

If you used a special register, you can show by the contents of the read
response whether the action started in 02 was successful or failed.

> Also I dont like the idea of having such a big timeout for every bus
> transaction.

Me neither.

> In case of 'normal' operation the device runs fine with
> a standard timeout value.

If you don't want or can implement something like the above protocol, we
could patch the ieee1394 driver for non-standard time-outs. Then a
possible protocol could look like this:

0a) PC: reads current SPLIT_TIMEOUT on local node
0b) PC: writes large SPLIT_TIMEOUT to local node
0c) PC: sends write request (or whatever) to Dev
0d) Dev: starts to process request
<aeons pass>
0e) Dev: sends response
0f) PC: restores previous SPLIT_TIMEOUT on local node

As mentioned in the other post, a bus manager could interfere with the
manipulation of the SPLIT_TIMEOUT value. A bus manager is meant to
ensure that all nodes have the same SPLIT_TIMEOUT. I don't remember if
the spec also says when and how a bus manager is supposed to do this.
But this is irrelevant if there is only your device and the Linux PC.

> I will now try to work around this problem in userspace basically by
> ignoring the timeout error.

I.e. something similar but less sophisticated than the protocol 01...09?

> The correct transmission of the write
> request will already be confirmed by the acknowledge packet, after all.

In case of a split transaction, the ack indeed confirms proper reception
of the request. In case of a unified transaction, it is even possible to
encapsulate a write response in the ack, i.e. to complete the
transaction with the ack to the request. Needless to say, unified
transactions require special support by the link layer hardware on the
responder's side.
--
Stefan Richter
-=====-=-=== ---= -====
http://arcgraph.de/sr/

2007-01-15 14:53:45

by Philipp Beyer

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace


> > I will now try to work around this problem in userspace basically by
> > ignoring the timeout error.
>
> I.e. something similar but less sophisticated than the protocol 01...09?
>

In fact steps 01-09 describe what I had in mind pretty well :-)

2007-01-15 18:30:26

by Stefan Richter

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace

On 15 Jan, Philipp Beyer wrote:
> In fact steps 01-09 describe what I had in mind pretty well :-)

Of course there are many more possibilities. In case of SBP-2 for
example, there are
- huge command + data buffers on the PC (or a 3rd node),
- a status register on the PC,
- a notification register on the device (actually some more registers
but that's not important).
The PC writes to the notification register of the device to inform it
about new a command and data. The device then fetches the command and,
depending on the command, fetches or sends data from/to the buffers on
the PC. When it is finished, it writes status to the PC's status
register. Between notification and status write, the device's fetch
agent may read/write commands and data in any order and with as many
transactions it likes.

Of course this isn't exactly the simplest protocol for your kind of
application.

Anyway. Do you still want a patch like the following one? Should apply
to 2.6.20-rcX but you should have no difficulties to adapt it to 2.6.19
or older. (The first hunk won't apply before 2.6.20-rc1.)


From: Stefan Richter <[email protected]>
Subject: ieee1394: allow nonstandard SPLIT_TIMEOUT

Relax the rule that SPLIT_TIMEOUT_HI cannot be bigger than 7 (seconds).
Longer split transaction timeouts are outside the IEEE 1394 spec but may
be useful with custom hardware. Now we allow up to 255 (seconds, plus
the subseconds part in SPLIT_TIMEOUT_LO) but only if the local node was
writing to SPLIT_TIMEOUT_HI.

Any nonstandard contents of SPLIT_TIMEOUT_HI will be shown as-is to any
node which reads the register, e.g. a bus manager.

Signed-off-by: Stefan Richter <[email protected]>
---

Only compile-tested.


Index: linux/drivers/ieee1394/csr.c
===================================================================
--- linux.orig/drivers/ieee1394/csr.c
+++ linux/drivers/ieee1394/csr.c
@@ -158,7 +158,7 @@ static void host_reset(struct hpsb_host
*/
static inline void calculate_expire(struct csr_control *csr)
{
- unsigned int usecs = (csr->split_timeout_hi & 7) * 1000000 +
+ unsigned int usecs = csr->split_timeout_hi * 1000000 +
(csr->split_timeout_lo >> 19) * 125;

csr->expire = usecs_to_jiffies(usecs > 100000 ? usecs : 100000);
@@ -480,8 +480,11 @@ static int write_regs(struct hpsb_host *
return RCODE_ADDRESS_ERROR;

case CSR_SPLIT_TIMEOUT_HI:
- host->csr.split_timeout_hi =
- be32_to_cpu(*(data++)) & 0x00000007;
+ /* We allow the local node to write up to 255 seconds as a
+ * hack for custom purposes. This is OK with IEEE 1212 but
+ * IEEE 1394 allows only 7 here. */
+ host->csr.split_timeout_hi = be32_to_cpu(*(data++)) &
+ (nodeid == host->node_id ? 0x000000ff : 0x00000007);
calculate_expire(&host->csr);
out;
case CSR_SPLIT_TIMEOUT_LO:



--
Stefan Richter
-=====-=-=== ---= -====
http://arcgraph.de/sr/


2007-01-15 18:54:56

by Kristian H. Kristensen

[permalink] [raw]
Subject: Re: ieee1394 feature needed: overwrite SPLIT_TIMEOUT from userspace

On 1/15/07, Philipp Beyer <[email protected]> wrote:
>
> Thanks for your input. My post was based on the (wrong) idea that
> the kernel already uses different timeout values per node.
>
> Therefore, having read your answer, I have a different opinion about
> how to solve this now.
>
> About your suggestions:
> Unfortunately sending an early response and using a secondary register
> as indication for completed flash writes doesnt work. In short, the
> device isn't able to process packets while writing to flash and an early
> answer followed by a period of non-responsiveness might lead to problems
> on the windows side.
>
> Also I dont like the idea of having such a big timeout for every bus
> transaction. In case of 'normal' operation the device runs fine with
> a standard timeout value.

I read the thread briefly, so I may be off here, but another solution
is to implement an FCP-style protocol. That is, instead of trying to
cram a long operation into the SPLIT_TIMEOUT window, just use two
write transactions to device specific address areas: one for the
request from the PC and one for the response from the device. Or you
could even use a vendor specific FCP frame.

If you use unified transactions (i.e. your devices sends ACK_COMPLETE
when it receives the write) it doesn't even generate more traffic on
the bus. And since the device will send the response write request
once it has completed programming the flash, it doesn't need to
respond to packets while it is programming. But even if the write
transactions themselves are split transactions, it is still a low
overheads solution to your problem that avoids messing with
SPLIT_TIMEOUT.

cheers,
Kristian