Hello everybody!
I hope you are keeping safe against Covid-19 a.k.a. Coronavirus!
Now my issue:
I have serious trouble with data loss when using two internal serial
ports of my boards, which are based on the NXP/Freescale i.MX28 SoC
(ARMv5TE architecture, ARM926EJ-S core).
It runs at 454 MHz.
The kernel used is 4.9.x.
When I run my test software between two serial ports connected to
each other by a null modem cable, it fails when the speed rates are
different, and the data loss increases as the speed rate gets higher.
I suspect overruns (I am now modifying my software to check for them
too), and I think they are due to the way the ISR is called and all data is
passed to the UART circular buffer within the interrupt routine.
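One way to check for overruns from user space, sketched here under assumptions: on Linux the TIOCGICOUNT ioctl returns the per-port counters kept by the serial core, including `overrun` (UART FIFO overflowed) and `buf_overrun` (tty buffer full). The helper name `read_overruns` is hypothetical, and whether a given driver maintains every counter should be verified against that driver.

```c
#include <linux/serial.h>   /* struct serial_icounter_struct */
#include <sys/ioctl.h>      /* ioctl(), TIOCGICOUNT */

/*
 * Read the kernel's overrun counters for an open serial port.
 * Returns 0 on success, -1 if the ioctl fails (for example when fd is
 * not a serial tty); on failure the output values are left untouched.
 *   *hw_overrun  : characters lost because the UART receiver overflowed
 *   *buf_overrun : characters lost because the tty buffer was full
 */
int read_overruns(int fd, int *hw_overrun, int *buf_overrun)
{
    struct serial_icounter_struct ic;

    if (ioctl(fd, TIOCGICOUNT, &ic) < 0)
        return -1;

    *hw_overrun = ic.overrun;
    *buf_overrun = ic.buf_overrun;
    return 0;
}
```

Calling this before and after a test run on each port and comparing the two readings shows whether characters were lost in the UART itself or further up in the tty layer.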
I am talking about the high latency between the IRQ and the service
routine that drains the FIFO when another IRQ is raised by another UART
at the same time at a different speed.
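To put a rough number on that latency budget, here is a minimal sketch that computes how long an RX FIFO takes to fill from empty at a given baud rate. The 16-character FIFO depth and the 8N1 framing (10 bits per character) are assumptions used only for illustration, and the function names are hypothetical. The real budget is smaller, because the RX interrupt is only raised once the FIFO has reached its trigger level.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Time, in microseconds, for an RX FIFO of `fifo_depth` characters to fill
 * from empty when characters arrive back to back at `baud` bits per second.
 * `bits_per_char` counts start, data, parity and stop bits (10 for 8N1).
 * If the interrupt handler is delayed for roughly longer than this,
 * incoming characters are lost and an overrun is flagged.
 */
double fifo_fill_time_us(unsigned int baud, unsigned int bits_per_char,
                         unsigned int fifo_depth)
{
    if (baud == 0)
        return 0.0;
    return 1e6 * (double)bits_per_char * (double)fifo_depth / (double)baud;
}

/* Print the budget for a few baud rates (16-deep FIFO, 8N1: assumptions). */
void print_latency_budget(void)
{
    static const unsigned int rates[] = { 9600, 38400, 115200 };

    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        printf("%6u baud: %.0f us\n", rates[i],
               fifo_fill_time_us(rates[i], 10, 16));
}
```

Under these assumptions the budget is about 1.4 ms at 115200 baud, and without a FIFO (depth 1) it drops to a single character time, roughly 87 us at the same speed.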
The code I was looking at is drivers/tty/serial/mxs-auart.c, __but__ all
other serial drivers act in the same way: they read one
character at a time from the FIFO (if it exists) and put it into the
circular buffer so the serial/tty driver can pass it to the user read routine.
Each function call has some overhead and is time-consuming, and if
another interrupt is raised by the same UART core but from another
serial port (different context), the FIFO, which the hardware UART keeps
filling continuously, cannot be serviced fast enough to avoid an overrun. I
think this applies to __almost__ every serial driver, as they are
written in the same way.
And it is __NOT__ an issue with the CPU and its speed! Using two
USB serial converters (FTDI and Prolific PL2303 based) on each board, the
problem does not appear at all, even after 24 hours running at more than
115200 baud!
It also works fine if I use two different serial devices: one
internal UART (mxs-auart) and one external UART (ttyUSB).
So I would say it is related to how the hardware manages the interrupt
context and to the small size of the FIFO/buffer.
Are those assumptions correct?
Would a shared FIQ driver for the UARTs solve the issue?
Regards,
--
Eurek s.r.l. |
Electronic Engineering | http://www.eurek.it
via Celletta 8/B, 40026 Imola, Italy | Phone: +39-(0)542-609120
p.iva 00690621206 - c.f. 04020030377 | Fax: +39-(0)542-609212
On Tue, Apr 07, 2020 at 09:30:21AM +0200, gianluca wrote:
> I have a BIG trouble having dataloss when using two internal serial ports of
> my boards based on NXP/FreeScale iMX28 SoC ARMv5Te ARM920ej-s architecture.
>
> It runs at 454Mhz.
>
> Kernel used 4.9.x
That's a very old kernel, you are going to have to get support for that
from the vendor you bought it from :(
> When using my test case unit software between two serial ports connect each
> other by a null modem cable, it fails when the speed rate are different,
Of course, how would that work?
> and
> dataloss is increasing higher the speed rate.
What type of flow control are you using?
> I suppose to have overruns (now I am modifying my software to check them
> too), but I think it is due the way the ISR is called and all data are
> passed to the uart circular buffer within the interrupt routine.
Are you using flow control?
> I am talking about the high latency from the IRQ up to the service routine
> when flushing the FIFO and another IRQ is called by another uart in the same
> time at different speed.
>
> The code I was looking is: drivers/tty/serial/mxs-auart.c __but__ all other
> serial drivers are acting in the same way: they are reading one character at
> time from the FIFO (if it exists) and put it into the circular buffer so
> serial/tty driver can pass them to the user read routine.
>
> Each function call has some overhead and it is time-consuming, and if
> another interrupt is invoked by the same UART Core but from another serial
> port (different context) the continuos insertion done by hardware UART into
> the FIFO cannot be served fast enough to have an overrun. I think this can
> be applied __almost__ to every serial driver as they are written in the same
> way.
>
> And it is __NOT__ an issue because of the CPU and its speed! Using two
> serial converter (FTDI and Prolific PL2303 based) on each board, the problem
> does not appear at all even after 24 hours running at more than 115200!!!
usb-serial devices are totally different and send data to the host in a
completely different way.
Your hardware might just not be able to handle really high baud rates in
a continuous stream. What baud rate were you using?
And again, this is what flow control was designed for, please use it.
> It does work fine if I am using two different serial devices: one internal
> uart (mxs-auart) and an external uart (ttyUSB).
Again, different interrupt and protocols being used for the USB stuff.
thanks,
greg k-h
Hello,
I am very pleased that Mr. Greg Kroah-Hartman is writing to me in person!
I appreciate it a lot, sir!
On 04/07/2020 10:24 AM, Greg Kroah-Hartman wrote:
> On Tue, Apr 07, 2020 at 09:30:21AM +0200, gianluca wrote:
>> I have a BIG trouble having dataloss when using two internal serial ports of
>> my boards based on NXP/FreeScale iMX28 SoC ARMv5Te ARM920ej-s architecture.
>>
>> It runs at 454Mhz.
>>
>> Kernel used 4.9.x
>
> That's a very old kernel, you are going to have to get support for that
> from the vendor you bought it from :(
>
We are the vendor. ;-)
Joking aside, I can try the latest kernel, 5.6, and see how it
goes there, but at first glance the driver seems exactly the same
as in kernel 4.9.
>> When using my test case unit software between two serial ports connect each
>> other by a null modem cable, it fails when the speed rate are different,
>
> Of course, how would that work?
>
I am not a native English speaker, so my wording caused a
misunderstanding: my test case is a program with two pthreads, in which the
main thread works at a different baud rate from the other
pthread. Running the same software on two different machines, with
the same baud rate on each corresponding port, it should work.
E.g. /dev/ttyAPP1 runs at 9600 and /dev/ttyAPP2 runs at 38400.
The same on the other machine. Both ports are null-modem connected:
9600 /dev/ttyAPP1 <----> /dev/ttyAPP1 9600
38400 /dev/ttyAPP2 <----> /dev/ttyAPP2 38400
I hope that is clear now. ;-)
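For reference, a minimal sketch of how each port in such a test could be configured: raw mode, 8N1, no flow control, one speed per port. This is not the actual test program. The helper name `setup_raw_8n1` is hypothetical, and `cfmakeraw()` is a glibc/BSD extension rather than plain POSIX.

```c
#define _DEFAULT_SOURCE     /* for cfmakeraw() and CRTSCTS on glibc */
#include <termios.h>

/*
 * Fill `tio` with raw 8N1 settings at the given speed (B9600, B38400, ...),
 * with the receiver enabled, modem-control lines ignored and no hardware
 * or software flow control.
 * Returns 0 on success, -1 if the speed is rejected.
 */
int setup_raw_8n1(struct termios *tio, speed_t speed)
{
    cfmakeraw(tio);                               /* no echo, no line editing, 8 data bits */
    tio->c_cflag &= ~(PARENB | CSTOPB | CRTSCTS); /* no parity, 1 stop bit, no RTS/CTS */
    tio->c_cflag |= CLOCAL | CREAD;               /* ignore modem lines, enable receiver */
    tio->c_iflag &= ~(IXON | IXOFF);              /* no XON/XOFF */
    tio->c_cc[VMIN] = 1;                          /* read() blocks for at least 1 byte */
    tio->c_cc[VTIME] = 0;                         /* no inter-byte timeout */

    if (cfsetispeed(tio, speed) < 0 || cfsetospeed(tio, speed) < 0)
        return -1;
    return 0;
}
```

In a real program the structure would first be read with `tcgetattr(fd, &tio)`, passed through this helper (B9600 for /dev/ttyAPP1, B38400 for /dev/ttyAPP2), and applied with `tcsetattr(fd, TCSANOW, &tio)`.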
>> and
>> dataloss is increasing higher the speed rate.
>
> What type of flow control are you using?
>
Unfortunately, no flow control, because I cannot use it. On
the real hardware those two ports are connected to a
microcontroller unit which does not have flow control; only RX and TX are
connected (i.e. no RTS/CTS/DTE/DCE lines).
>> I suppose to have overruns (now I am modifying my software to check them
>> too), but I think it is due the way the ISR is called and all data are
>> passed to the uart circular buffer within the interrupt routine.
>
> Are you using flow control?
>
As above, no [ unfortunately ]
>> I am talking about the high latency from the IRQ up to the service routine
>> when flushing the FIFO and another IRQ is called by another uart in the same
>> time at different speed.
>>
>> The code I was looking is: drivers/tty/serial/mxs-auart.c __but__ all other
>> serial drivers are acting in the same way: they are reading one character at
>> time from the FIFO (if it exists) and put it into the circular buffer so
>> serial/tty driver can pass them to the user read routine.
>>
>> Each function call has some overhead and it is time-consuming, and if
>> another interrupt is invoked by the same UART Core but from another serial
>> port (different context) the continuos insertion done by hardware UART into
>> the FIFO cannot be served fast enough to have an overrun. I think this can
>> be applied __almost__ to every serial driver as they are written in the same
>> way.
>>
>> And it is __NOT__ an issue because of the CPU and its speed! Using two
>> serial converter (FTDI and Prolific PL2303 based) on each board, the problem
>> does not appear at all even after 24 hours running at more than 115200!!!
>
> usb-serial devices are totally different and send data to the host in a
> completly different way.
>
> Your hardware might just not be able to handle really high baud rates at
> a continous stream, what baud rate were you using?
>
I suppose so, but the same issue can be reproduced on all single-core
processors (with no UART FIFO) using two ports on the same UART core, running
the Linux kernel at 450 MHz or less.
The IRQ latency is the same.
> And again, this is what flow control was designed for, please use it.
>
I know, and usually I use a kind of protocol which checks the
correctness of each packet and, if it is wrong, has the packet requested and
sent again.
In this case the microcontroller board I am connected to is not built by
us, and the software uses a custom protocol (and I do not know whether a
transfer error can be recovered by sending another request).
So flow control __CANNOT_BE_USED_AT_ALL__...
>> It does work fine if I am using two different serial devices: one internal
>> uart (mxs-auart) and an external uart (ttyUSB).
>
> Again, different interrupt and protocols being used for the USB stuff.
>
...and in our case it works better than the internal UART driver on
the same board. It is a real pity...
> thanks,
>
Thanks to you, mr. greg k-h!
> greg k-h
P.S.: I am a very close friend of Andrea Arcangeli; we grew up in the
same place and went to the same school here in Italy (Imola, Bologna).
We talked about you last Christmas holidays, when Andrea came to
Italy from NY.
Regards,
Gianluca Renzi
On Tue, Apr 07, 2020 at 11:01:08AM +0200, gianluca wrote:
> On 04/07/2020 10:24 AM, Greg Kroah-Hartman wrote:
> > On Tue, Apr 07, 2020 at 09:30:21AM +0200, gianluca wrote:
> > > I have a BIG trouble having dataloss when using two internal serial ports of
> > > my boards based on NXP/FreeScale iMX28 SoC ARMv5Te ARM920ej-s architecture.
> > >
> > > It runs at 454Mhz.
> > >
> > > Kernel used 4.9.x
> >
> > That's a very old kernel, you are going to have to get support for that
> > from the vendor you bought it from :(
> >
>
> We are the vendor. ;-)
Good luck! :)
> Jokes apart, I can try to use the latest kernel 5.6, and see how is going on
> them, but at the first check the driver seems exactly the same as in kernel
> 4.9.
>
> > > When using my test case unit software between two serial ports connect each
> > > other by a null modem cable, it fails when the speed rate are different,
> >
> > Of course, how would that work?
> >
>
> I am not native english speaker so I am misleading to a misunderstanding: my
> test case is a software with two pthreads which the main thread is working
> with a differnet baud rate than the other pthread. Using the same software
> in two different machines, and using the same baudrate for each
> corrispondant port it should work.
>
> i.e. /dev/ttyAPP1 is running at 9600 and /dev/ttyAPP2 is running at 38400
>
> The same in the other machine. Both ports are null-modem connected:
>
> 9600 /dev/ttyAPP1 <----> /dev/ttyAPP1 9600
> 38400 /dev/ttyAPP2 <----> /dev/ttyAPP2 38400
>
> I hope to be clear now. ;-)
Ok, yes, that makes more sense now, thank you.
> > > and
> > > dataloss is increasing higher the speed rate.
> >
> > What type of flow control are you using?
> >
>
> Unfortunately no flow control. Because the I cannot use it. When connected
> to the real-hardware those two ports are connected to a microcontroller unit
> which does not have flow control, only RX & TX connected (i.e. no
> RTS/CTS/DTE/DCE lines)
Then you are going to have problems, that is exactly what flow control
was designed for. To ignore that is to have problems.
Also, there is software flow control when you do not have any control
lines. This "issue" was solved decades ago :)
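As an illustration of software flow control, here is a minimal sketch that enables XON/XOFF through termios. The helper name `enable_xonxoff` is hypothetical. It only helps if the device on the other end honours XON/XOFF and the payload never contains the control bytes 0x11 and 0x13, which is an assumption that has to be checked against the peer's protocol.

```c
#define _DEFAULT_SOURCE     /* for CRTSCTS on glibc */
#include <termios.h>

/*
 * Turn on software (XON/XOFF) flow control in a termios settings block.
 *   IXOFF: the local tty sends XOFF/XON to throttle the peer when its
 *          input buffer fills up
 *   IXON : the local tty pauses its own output when it receives XOFF
 * CRTSCTS is cleared because no RTS/CTS lines are wired.
 */
void enable_xonxoff(struct termios *tio)
{
    tio->c_iflag |= IXON | IXOFF;
    tio->c_iflag &= ~IXANY;       /* only XON restarts output, not any byte */
    tio->c_cflag &= ~CRTSCTS;     /* no hardware flow control */
    tio->c_cc[VSTART] = 0x11;     /* XON  (DC1, Ctrl-Q) */
    tio->c_cc[VSTOP]  = 0x13;     /* XOFF (DC3, Ctrl-S) */
}
```

Typical use would be `tcgetattr(fd, &tio)`, then `enable_xonxoff(&tio)`, then `tcsetattr(fd, TCSANOW, &tio)`.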
> > > I am talking about the high latency from the IRQ up to the service routine
> > > when flushing the FIFO and another IRQ is called by another uart in the same
> > > time at different speed.
> > >
> > > The code I was looking is: drivers/tty/serial/mxs-auart.c __but__ all other
> > > serial drivers are acting in the same way: they are reading one character at
> > > time from the FIFO (if it exists) and put it into the circular buffer so
> > > serial/tty driver can pass them to the user read routine.
> > >
> > > Each function call has some overhead and it is time-consuming, and if
> > > another interrupt is invoked by the same UART Core but from another serial
> > > port (different context) the continuos insertion done by hardware UART into
> > > the FIFO cannot be served fast enough to have an overrun. I think this can
> > > be applied __almost__ to every serial driver as they are written in the same
> > > way.
> > >
> > > And it is __NOT__ an issue because of the CPU and its speed! Using two
> > > serial converter (FTDI and Prolific PL2303 based) on each board, the problem
> > > does not appear at all even after 24 hours running at more than 115200!!!
> >
> > usb-serial devices are totally different and send data to the host in a
> > completly different way.
> >
> > Your hardware might just not be able to handle really high baud rates at
> > a continous stream, what baud rate were you using?
> >
>
> I suppose that, but the same issue can be proven with all single core (NO
> FIFO UART) processors using two ports on the same uart core, running Linux
> kernel @ 450 Mhz or less.
>
> The irq latency it is the same.
Again, usb-serial devices do not use a uart on the host, so they have a
totally different design and code flow.
> > And again, this is what flow control was designed for, please use it.
> >
>
> I know and usually I am using a sort of protocol which can check correctness
> of packet, and if not, the packet has to be reasked/resent.
> In this case the microcontroller board I am connected to is not built by us,
> and the software is a custom protocol (and I do not know if an error on
> transfer can be accomplished by another request).
>
> So the flow control __CANNOT_BE_USED_AT_ALL__...
Then that is a design mistake, please fix that.
good luck!
greg k-h