Hello,
doing a quick review... I realized a few issues...
Am 17.01.2017 um 20:22 schrieb Akshay Bhat:
> This patch adds support for the Holt HI-311x CAN controller. The HI311x
> CAN controller is capable of transmitting and receiving standard data
> frames, extended data frames and remote frames. The HI311x interfaces
> with the host over SPI.
>
> Datasheet: http://www.holtic.com/documents/371-hi-3110_v-rev-jpdf.do
>
> Signed-off-by: Akshay Bhat <[email protected]>
> ---
>
> v1 -> v2:
> Address comments from Marc Kleine-Budde:
> - use u8 instead of uint8_t
> - alphabetically sort Makefile and Kconfig
> - copy over copyright information from mcp251x
> - use single space after each marco
> - add missing HI3110_ namespace in defines
> - remove magic number for IDE & SRR bits
> - simplify logic to populate extended CAN ID
> - remove unused parameters in hi3110_setup function
> - remove redundant frame->can_dlc length check
> - simplify error handling in hi3110_open function
> Address comments from Julia Lawall:
> - remove unnecessary semicolon after while loop in hi3110_can_ist
>
> drivers/net/can/spi/Kconfig | 6 +
> drivers/net/can/spi/Makefile | 1 +
> drivers/net/can/spi/hi311x.c | 1069 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 1076 insertions(+)
> create mode 100644 drivers/net/can/spi/hi311x.c
>
> diff --git a/drivers/net/can/spi/Kconfig b/drivers/net/can/spi/Kconfig
> index 148cae5..8f2e0dd 100644
> --- a/drivers/net/can/spi/Kconfig
> +++ b/drivers/net/can/spi/Kconfig
> @@ -1,6 +1,12 @@
> menu "CAN SPI interfaces"
> depends on SPI
>
> +config CAN_HI311X
> + tristate "Holt HI311x SPI CAN controllers"
> + depends on CAN_DEV && SPI && HAS_DMA
> + ---help---
> + Driver for the Holt HI311x SPI CAN controllers.
> +
> config CAN_MCP251X
> tristate "Microchip MCP251x SPI CAN controllers"
> depends on HAS_DMA
> diff --git a/drivers/net/can/spi/Makefile b/drivers/net/can/spi/Makefile
> index 0e86040..f59fa37 100644
> --- a/drivers/net/can/spi/Makefile
> +++ b/drivers/net/can/spi/Makefile
> @@ -3,4 +3,5 @@
> #
>
>
> +obj-$(CONFIG_CAN_HI311X) += hi311x.o
> obj-$(CONFIG_CAN_MCP251X) += mcp251x.o
> diff --git a/drivers/net/can/spi/hi311x.c b/drivers/net/can/spi/hi311x.c
> new file mode 100644
> index 0000000..cccfe2d
> --- /dev/null
> +++ b/drivers/net/can/spi/hi311x.c
> @@ -0,0 +1,1069 @@
> +/* CAN bus driver for Holt HI3110 CAN Controller with SPI Interface
> + *
> + * Copyright(C) Timesys Corporation 2016
> + *
> + * Based on Microchip 251x CAN Controller (mcp251x) Linux kernel driver
> + * Copyright 2009 Christian Pellegrin EVOL S.r.l.
> + * Copyright 2007 Raymarine UK, Ltd. All Rights Reserved.
> + * Copyright 2006 Arcom Control Systems Ltd.
> + *
> + * Based on CAN bus driver for the CCAN controller written by
> + * - Sascha Hauer, Marc Kleine-Budde, Pengutronix
> + * - Simon Kallweit, intefo AG
> + * Copyright 2007
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/can/core.h>
> +#include <linux/can/dev.h>
> +#include <linux/can/led.h>
> +#include <linux/clk.h>
> +#include <linux/completion.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/freezer.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netdevice.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/platform_device.h>
> +#include <linux/regulator/consumer.h>
> +#include <linux/slab.h>
> +#include <linux/spi/spi.h>
> +#include <linux/uaccess.h>
> +
> +#define HI3110_MASTER_RESET 0x56
> +#define HI3110_READ_CTRL0 0xD2
> +#define HI3110_READ_CTRL1 0xD4
> +#define HI3110_READ_STATF 0xE2
> +#define HI3110_WRITE_CTRL0 0x14
> +#define HI3110_WRITE_CTRL1 0x16
> +#define HI3110_WRITE_INTE 0x1C
> +#define HI3110_WRITE_BTR0 0x18
> +#define HI3110_WRITE_BTR1 0x1A
> +#define HI3110_READ_BTR0 0xD6
> +#define HI3110_READ_BTR1 0xD8
> +#define HI3110_READ_INTF 0xDE
> +#define HI3110_READ_ERR 0xDC
> +#define HI3110_READ_FIFO_WOTIME 0x48
> +#define HI3110_WRITE_FIFO 0x12
> +#define HI3110_READ_MESSTAT 0xDA
> +#define HI3110_READ_TEC 0xEC
> +
> +#define HI3110_CTRL0_MODE_MASK (7 << 5)
> +#define HI3110_CTRL0_NORMAL_MODE (0 << 5)
> +#define HI3110_CTRL0_LOOPBACK_MODE (1 << 5)
> +#define HI3110_CTRL0_MONITOR_MODE (2 << 5)
> +#define HI3110_CTRL0_SLEEP_MODE (3 << 5)
> +#define HI3110_CTRL0_INIT_MODE (4 << 5)
> +
> +#define HI3110_CTRL1_TXEN BIT(7)
> +
> +#define HI3110_INT_RXTMP BIT(7)
> +#define HI3110_INT_RXFIFO BIT(6)
> +#define HI3110_INT_TXCPLT BIT(5)
> +#define HI3110_INT_BUSERR BIT(4)
> +#define HI3110_INT_MCHG BIT(3)
> +#define HI3110_INT_WAKEUP BIT(2)
> +#define HI3110_INT_F1MESS BIT(1)
> +#define HI3110_INT_F0MESS BIT(0)
> +
> +#define HI3110_ERR_BUSOFF BIT(7)
> +#define HI3110_ERR_TXERRP BIT(6)
> +#define HI3110_ERR_RXERRP BIT(5)
> +#define HI3110_ERR_BITERR BIT(4)
> +#define HI3110_ERR_FRMERR BIT(3)
> +#define HI3110_ERR_CRCERR BIT(2)
> +#define HI3110_ERR_ACKERR BIT(1)
> +#define HI3110_ERR_STUFERR BIT(0)
> +#define HI3110_ERR_PROTOCOL_MASK (0x1F)
> +
> +#define HI3110_STAT_RXFMTY BIT(1)
> +
> +#define HI3110_BTR0_SJW_SHIFT 6
> +#define HI3110_BTR0_BRP_SHIFT 0
> +
> +#define HI3110_BTR1_SAMP_3PERBIT (1 << 7)
> +#define HI3110_BTR1_SAMP_1PERBIT (0 << 7)
> +#define HI3110_BTR1_TSEG2_SHIFT 4
> +#define HI3110_BTR1_TSEG1_SHIFT 0
> +
> +#define HI3110_FIFO_WOTIME_TAG_OFF 0
> +#define HI3110_FIFO_WOTIME_ID_OFF 1
> +#define HI3110_FIFO_WOTIME_DLC_OFF 5
> +#define HI3110_FIFO_WOTIME_DAT_OFF 6
> +
> +#define HI3110_FIFO_WOTIME_TAG_IDE BIT(7)
> +#define HI3110_FIFO_WOTIME_ID_RTR BIT(0)
> +
> +#define HI3110_FIFO_TAG_OFF 0
> +#define HI3110_FIFO_ID_OFF 1
> +#define HI3110_FIFO_STD_DLC_OFF 3
> +#define HI3110_FIFO_STD_DATA_OFF 4
> +#define HI3110_FIFO_EXT_DLC_OFF 5
> +#define HI3110_FIFO_EXT_DATA_OFF 6
> +
> +#define HI3110_CAN_FRAME_MAX_DATA_LEN 8
> +#define HI3110_RX_BUF_LEN 15
> +#define HI3110_TX_STD_BUF_LEN 12
> +#define HI3110_TX_EXT_BUF_LEN 14
> +#define HI3110_CAN_FRAME_MAX_BITS 128
> +#define HI3110_EFF_FLAGS 0x18 /* IDE + SRR */
> +
> +#define HI3110_TX_ECHO_SKB_MAX 1
> +
> +#define HI3110_OST_DELAY_MS (10)
> +
> +#define DEVICE_NAME "hi3110"
> +
> +static int hi3110_enable_dma = 1; /* Enable SPI DMA. Default: 1 (On) */
> +module_param(hi3110_enable_dma, int, 0444);
> +MODULE_PARM_DESC(hi3110_enable_dma, "Enable SPI DMA. Default: 1 (On)");
> +
> +static const struct can_bittiming_const hi3110_bittiming_const = {
> + .name = DEVICE_NAME,
> + .tseg1_min = 2,
> + .tseg1_max = 16,
> + .tseg2_min = 2,
> + .tseg2_max = 8,
> + .sjw_max = 4,
> + .brp_min = 1,
> + .brp_max = 64,
> + .brp_inc = 1,
> +};
> +
> +enum hi3110_model {
> + CAN_HI3110_HI3110 = 0x3110,
> +};
> +
> +struct hi3110_priv {
> + struct can_priv can;
> + struct net_device *net;
> + struct spi_device *spi;
> + enum hi3110_model model;
> +
> + struct mutex hi3110_lock; /* SPI device lock */
> +
> + u8 *spi_tx_buf;
> + u8 *spi_rx_buf;
> + dma_addr_t spi_tx_dma;
> + dma_addr_t spi_rx_dma;
> +
> + struct sk_buff *tx_skb;
> + int tx_len;
> +
> + struct workqueue_struct *wq;
> + struct work_struct tx_work;
> + struct work_struct restart_work;
> +
> + int force_quit;
> + int after_suspend;
> +#define HI3110_AFTER_SUSPEND_UP 1
> +#define HI3110_AFTER_SUSPEND_DOWN 2
> +#define HI3110_AFTER_SUSPEND_POWER 4
> +#define HI3110_AFTER_SUSPEND_RESTART 8
> + int restart_tx;
> + struct regulator *power;
> + struct regulator *transceiver;
> + struct clk *clk;
> +};
> +
> +static void hi3110_clean(struct net_device *net)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> +
> + if (priv->tx_skb || priv->tx_len)
> + net->stats.tx_errors++;
> + if (priv->tx_skb)
> + dev_kfree_skb(priv->tx_skb);
> + if (priv->tx_len)
> + can_free_echo_skb(priv->net, 0);
> + priv->tx_skb = NULL;
> + priv->tx_len = 0;
> +}
> +
> +/* Note about handling of error return of hi3110_spi_trans: accessing
> + * registers via SPI is not really different conceptually than using
> + * normal I/O assembler instructions, although it's much more
> + * complicated from a practical POV. So it's not advisable to always
> + * check the return value of this function. Imagine that every
> + * read{b,l}, write{b,l} and friends would be bracketed in "if ( < 0)
> + * error();", it would be a great mess (well there are some situation
> + * when exception handling C++ like could be useful after all). So we
> + * just check that transfers are OK at the beginning of our
> + * conversation with the chip and to avoid doing really nasty things
> + * (like injecting bogus packets in the network stack).
> + */
> +static int hi3110_spi_trans(struct spi_device *spi, int len)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + struct spi_transfer t = {
> + .tx_buf = priv->spi_tx_buf,
> + .rx_buf = priv->spi_rx_buf,
> + .len = len,
> + .cs_change = 0,
> + };
> + struct spi_message m;
> + int ret;
> +
> + spi_message_init(&m);
> +
> + if (hi3110_enable_dma) {
> + t.tx_dma = priv->spi_tx_dma;
> + t.rx_dma = priv->spi_rx_dma;
> + m.is_dma_mapped = 1;
> + }
> +
> + spi_message_add_tail(&t, &m);
> +
> + ret = spi_sync(spi, &m);
> +
> + if (ret)
> + dev_err(&spi->dev, "spi transfer failed: ret = %d\n", ret);
> + return ret;
> +}
> +
> +static u8 hi3110_cmd(struct spi_device *spi, u8 command)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> +
> + priv->spi_tx_buf[0] = command;
> + dev_dbg(&spi->dev, "hi3110_cmd: %02X\n", command);
> +
> + return hi3110_spi_trans(spi, 1);
> +}
> +
> +static u8 hi3110_read(struct spi_device *spi, u8 command)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + u8 val = 0;
> +
> + priv->spi_tx_buf[0] = command;
> + hi3110_spi_trans(spi, 2);
> + val = priv->spi_rx_buf[1];
> + dev_dbg(&spi->dev, "hi3110_read: %02X, %02X\n", command, val);
This produces a lot of output which is not useful for the normal user.
> +
> + return val;
> +}
> +
> +static void hi3110_write(struct spi_device *spi, u8 reg, u8 val)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> +
> + priv->spi_tx_buf[0] = reg;
> + priv->spi_tx_buf[1] = val;
> + dev_dbg(&spi->dev, "hi3110_write: %02X, %02X\n", reg, val);
Ditto.
> +
> + hi3110_spi_trans(spi, 2);
> +}
> +
> +static void hi3110_hw_tx_frame(struct spi_device *spi, u8 *buf, int len)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> +
> + priv->spi_tx_buf[0] = HI3110_WRITE_FIFO;
> + memcpy(priv->spi_tx_buf + 1, buf, len);
> + hi3110_spi_trans(spi, len + 1);
> +}
> +
> +static void hi3110_hw_tx(struct spi_device *spi, struct can_frame *frame)
> +{
> + u8 buf[HI3110_TX_EXT_BUF_LEN];
> +
> + buf[HI3110_FIFO_TAG_OFF] = 0;
> +
> + if (frame->can_id & CAN_EFF_FLAG) {
> + /* Extended frame */
> + buf[HI3110_FIFO_ID_OFF] = (frame->can_id & CAN_EFF_MASK) >> 21;
> + buf[HI3110_FIFO_ID_OFF + 1] =
> + (((frame->can_id & CAN_EFF_MASK) >> 13) & 0xe0) |
> + HI3110_EFF_FLAGS |
> + (((frame->can_id & CAN_EFF_MASK) >> 15) & 0x07);
> + buf[HI3110_FIFO_ID_OFF + 2] =
> + (frame->can_id & CAN_EFF_MASK) >> 7;
> + buf[HI3110_FIFO_ID_OFF + 3] =
> + ((frame->can_id & CAN_EFF_MASK) << 1) |
> + ((frame->can_id & CAN_RTR_FLAG) ? 1 : 0);
> +
> + buf[HI3110_FIFO_EXT_DLC_OFF] = frame->can_dlc;
> +
> + memcpy(buf + HI3110_FIFO_EXT_DATA_OFF,
> + frame->data, frame->can_dlc);
> +
> + hi3110_hw_tx_frame(spi, buf, HI3110_TX_EXT_BUF_LEN -
> + (HI3110_CAN_FRAME_MAX_DATA_LEN - frame->can_dlc));
> + } else {
> + /* Standard frame */
> + buf[HI3110_FIFO_ID_OFF] = (frame->can_id & CAN_SFF_MASK) >> 3;
> + buf[HI3110_FIFO_ID_OFF + 1] =
> + ((frame->can_id & CAN_SFF_MASK) << 5) |
> + ((frame->can_id & CAN_RTR_FLAG) ? (1 << 4) : 0);
> +
> + buf[HI3110_FIFO_STD_DLC_OFF] = frame->can_dlc;
> +
> + memcpy(buf + HI3110_FIFO_STD_DATA_OFF,
> + frame->data, frame->can_dlc);
> +
> + hi3110_hw_tx_frame(spi, buf, HI3110_TX_STD_BUF_LEN -
> + (HI3110_CAN_FRAME_MAX_DATA_LEN - frame->can_dlc));
> + }
> +}
> +
> +static void hi3110_hw_rx_frame(struct spi_device *spi, u8 *buf)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> +
> + priv->spi_tx_buf[0] = HI3110_READ_FIFO_WOTIME;
> + hi3110_spi_trans(spi, HI3110_RX_BUF_LEN);
> + memcpy(buf, priv->spi_rx_buf + 1, HI3110_RX_BUF_LEN - 1);
> +}
> +
> +static void hi3110_hw_rx(struct spi_device *spi)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + struct sk_buff *skb;
> + struct can_frame *frame;
> + u8 buf[HI3110_RX_BUF_LEN - 1];
> +
> + skb = alloc_can_skb(priv->net, &frame);
> + if (!skb) {
> + dev_err(&spi->dev, "cannot allocate RX skb\n");
Please return silenty! Otherwise it will make the situation worse.
> + priv->net->stats.rx_dropped++;
> + return;
> + }
> +
> + hi3110_hw_rx_frame(spi, buf);
> + if (buf[HI3110_FIFO_WOTIME_TAG_OFF] & HI3110_FIFO_WOTIME_TAG_IDE) {
> + /* IDE is recessive (1), indicating extended 29-bit frame */
> + frame->can_id = CAN_EFF_FLAG;
> + frame->can_id |=
> + (buf[HI3110_FIFO_WOTIME_ID_OFF] << 21) |
> + (((buf[HI3110_FIFO_WOTIME_ID_OFF + 1] & 0xE0) >> 5) << 18) |
> + ((buf[HI3110_FIFO_WOTIME_ID_OFF + 1] & 0x07) << 15) |
> + (buf[HI3110_FIFO_WOTIME_ID_OFF + 2] << 7) |
> + (buf[HI3110_FIFO_WOTIME_ID_OFF + 3] >> 1);
> + } else {
> + /* IDE is dominant (0), frame indicating standard 11-bit */
> + frame->can_id =
> + (buf[HI3110_FIFO_WOTIME_ID_OFF] << 3) |
> + ((buf[HI3110_FIFO_WOTIME_ID_OFF + 1] & 0xE0) >> 5);
> + }
> +
> + if (buf[HI3110_FIFO_WOTIME_ID_OFF + 3] & HI3110_FIFO_WOTIME_ID_RTR) {
> + /* RTR is recessive (1), indicating remote request frame */
> + frame->can_id |= CAN_RTR_FLAG;
> + }
> +
> + /* Data length */
> + frame->can_dlc = get_can_dlc(buf[HI3110_FIFO_WOTIME_DLC_OFF] & 0x0F);
> + memcpy(frame->data, buf + HI3110_FIFO_WOTIME_DAT_OFF, frame->can_dlc);
No data bytes should not be copied for RTR messages.
> +
> + priv->net->stats.rx_packets++;
> + priv->net->stats.rx_bytes += frame->can_dlc;
> +
> + can_led_event(priv->net, CAN_LED_EVENT_RX);
> +
> + netif_rx_ni(skb);
> +}
> +
> +static void hi3110_hw_sleep(struct spi_device *spi)
> +{
> + hi3110_write(spi, HI3110_WRITE_CTRL0, HI3110_CTRL0_SLEEP_MODE);
> +}tx_skb
> +static netdev_tx_t hi3110_hard_start_xmit(struct sk_buff *skb,
> + struct net_device *net)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> + struct spi_device *spi = priv->spi;
> +
> + if (priv->tx_skb || priv->tx_len) {
> + dev_warn(&spi->dev, "hard_xmit called while tx busy\n");
s/warn/err/? This should never happen; IIUC.
> + return NETDEV_TX_BUSY;
> + }
> +
> + if (can_dropped_invalid_skb(net, skb))
> + return NETDEV_TX_OK;
> +
> + netif_stop_queue(net);
The driver transfers the packets in sequence. Any chance to queue them?
At least there is a TX FIFO for 8 messages. That's bad for RT but would
increase the throughput.
> + priv->tx_skb = skb;
> + queue_work(priv->wq, &priv->tx_work);
> +
> + return NETDEV_TX_OK;
> +}
> +
> +static int hi3110_do_set_mode(struct net_device *net, enum can_mode mode)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> +
> + switch (mode) {
> + case CAN_MODE_START:
> + hi3110_clean(net);
> + /* We have to delay work since SPI I/O may sleep */
> + priv->can.state = CAN_STATE_ERROR_ACTIVE;
> + priv->restart_tx = 1;
> + if (priv->can.restart_ms == 0)
> + priv->after_suspend = HI3110_AFTER_SUSPEND_RESTART;
> + queue_work(priv->wq, &priv->restart_work);
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static int hi3110_set_normal_mode(struct spi_device *spi)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + u8 reg;
> +
> + hi3110_write(spi, HI3110_WRITE_INTE, HI3110_INT_BUSERR |
> + HI3110_INT_RXFIFO | HI3110_INT_TXCPLT);
> +
> + /* Enable TX */
> + hi3110_write(spi, HI3110_WRITE_CTRL1, HI3110_CTRL1_TXEN);
> +
> + if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK) {
> + /* Put device into loopback mode */
> + hi3110_write(spi, HI3110_WRITE_CTRL0,
> + HI3110_CTRL0_LOOPBACK_MODE);
> + } else if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY) {
> + /* Put device into listen-only mode */
> + hi3110_write(spi, HI3110_WRITE_CTRL0,
> + HI3110_CTRL0_MONITOR_MODE);
> + } else {
> + /* Put device into normal mode */
> + hi3110_write(spi, HI3110_WRITE_CTRL0,
> + HI3110_CTRL0_NORMAL_MODE);
"mode = x" and just one write is more compact.
> +
> + /* Wait for the device to enter normal mode */
> + mdelay(HI3110_OST_DELAY_MS);
> + reg = hi3110_read(spi, HI3110_READ_CTRL0);
> + if ((reg & HI3110_CTRL0_MODE_MASK) != HI3110_CTRL0_NORMAL_MODE)
> + return -EBUSY;
Is this not necesary for listen or loopbcak only mode?
> + }
> + priv->can.state = CAN_STATE_ERROR_ACTIVE;
> + return 0;
> +}
> +
> +static int hi3110_do_set_bittiming(struct net_device *net)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> + struct can_bittiming *bt = &priv->can.bittiming;
> + struct spi_device *spi = priv->spi;
> +
> + hi3110_write(spi, HI3110_WRITE_BTR0,
> + ((bt->sjw - 1) << HI3110_BTR0_SJW_SHIFT) |
> + ((bt->brp - 1) << HI3110_BTR0_BRP_SHIFT));
> +
> + hi3110_write(spi, HI3110_WRITE_BTR1,
> + (priv->can.ctrlmode &
> + CAN_CTRLMODE_3_SAMPLES ?
> + HI3110_BTR1_SAMP_3PERBIT : HI3110_BTR1_SAMP_1PERBIT) |
> + ((bt->phase_seg1 + bt->prop_seg - 1)
> + << HI3110_BTR1_TSEG1_SHIFT) |
> + ((bt->phase_seg2 - 1) << HI3110_BTR1_TSEG2_SHIFT));
> +
> + dev_dbg(&spi->dev, "BT: 0x%02x 0x%02x\n",
> + hi3110_read(spi, HI3110_READ_BTR0),
> + hi3110_read(spi, HI3110_READ_BTR1));
> +
> + return 0;
> +}
> +
> +static int hi3110_setup(struct net_device *net)
> +{
> + hi3110_do_set_bittiming(net);
> + return 0;
> +}
> +
> +static int hi3110_hw_reset(struct spi_device *spi)
> +{
> + u8 reg;
> + int ret;
> +
> + /* Wait for oscillator startup timer after power up */
> + mdelay(HI3110_OST_DELAY_MS);
> +
> + ret = hi3110_cmd(spi, HI3110_MASTER_RESET);
> + if (ret)
> + return ret;
> +
> + /* Wait for oscillator startup timer after reset */
> + mdelay(HI3110_OST_DELAY_MS);
> +
> + reg = hi3110_read(spi, HI3110_READ_CTRL0);
> + if ((reg & HI3110_CTRL0_MODE_MASK) != HI3110_CTRL0_INIT_MODE)
> + return -ENODEV;
> +
> + /* As per the datasheet it appears the error flags are
> + * not cleared on reset. Explicitly clear them by performing a read
> + */
> + hi3110_read(spi, HI3110_READ_ERR);
> +
> + return 0;
> +}
> +
> +static int hi3110_hw_probe(struct spi_device *spi)
> +{
> + u8 statf;
> +
> + hi3110_hw_reset(spi);
> +
> + /* Confirm correct operation by checking against reset values
> + * in datasheet
> + */
> + statf = hi3110_read(spi, HI3110_READ_STATF);
> +
> + dev_dbg(&spi->dev, "statf: %02X\n", statf);
> +
> + if (statf != 0x82)
> + return -ENODEV;
> +
> + return 0;
> +}
> +
> +static int hi3110_power_enable(struct regulator *reg, int enable)
> +{
> + if (IS_ERR_OR_NULL(reg))
> + return 0;
> +
> + if (enable)
> + return regulator_enable(reg);
> + else
> + return regulator_disable(reg);
> +}
> +
> +static int hi3110_stop(struct net_device *net)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> + struct spi_device *spi = priv->spi;
> +
> + close_candev(net);
> +
> + priv->force_quit = 1;
> + free_irq(spi->irq, priv);
> + destroy_workqueue(priv->wq);
> + priv->wq = NULL;
> +
> + mutex_lock(&priv->hi3110_lock);
> +
> + /* Disable transmit, interrupts and clear flags */
> + hi3110_write(spi, HI3110_WRITE_CTRL1, 0x0);
> + hi3110_write(spi, HI3110_WRITE_INTE, 0x0);
> + hi3110_read(spi, HI3110_READ_INTF);
> +
> + hi3110_clean(net);
> +
> + hi3110_hw_sleep(spi);
> +
> + hi3110_power_enable(priv->transceiver, 0);
> +
> + priv->can.state = CAN_STATE_STOPPED;
> +
> + mutex_unlock(&priv->hi3110_lock);
> +
> + can_led_event(net, CAN_LED_EVENT_STOP);
> +
> + return 0;
> +}
> +
> +static void hi3110_error_skb(struct net_device *net, int can_id,
> + int data1, int data2)
> +{
> + struct sk_buff *skb;
> + struct can_frame *frame;
> +
> + skb = alloc_can_err_skb(net, &frame);
> + if (skb) {
> + frame->can_id |= can_id;
> + frame->data[1] = data1;
> + frame->data[2] = data2;
> + netif_rx_ni(skb);
> + } else {
> + netdev_err(net, "cannot allocate error skb\n");
Please remove the error message. Not a good at low memory situations.
> + }
> +}
> +
> +static void hi3110_tx_work_handler(struct work_struct *ws)
> +{
> + struct hi3110_priv *priv = container_of(ws, struct hi3110_priv,
> + tx_work);
> + struct spi_device *spi = priv->spi;
> + struct net_device *net = priv->net;
> + struct can_frame *frame;
> +
> + mutex_lock(&priv->hi3110_lock);
> + if (priv->tx_skb) {
> + if (priv->can.state == CAN_STATE_BUS_OFF) {
> + hi3110_clean(net);
> + } else {
> + frame = (struct can_frame *)priv->tx_skb->data;
> + hi3110_hw_tx(spi, frame);
> + priv->tx_len = 1 + frame->can_dlc;
> + can_put_echo_skb(priv->tx_skb, net, 0);
> + priv->tx_skb = NULL;
> + }
> + }
> + mutex_unlock(&priv->hi3110_lock);
> +}
> +
> +static void hi3110_restart_work_handler(struct work_struct *ws)
> +{
> + struct hi3110_priv *priv = container_of(ws, struct hi3110_priv,
> + restart_work);
> + struct spi_device *spi = priv->spi;
> + struct net_device *net = priv->net;
> +
> + mutex_lock(&priv->hi3110_lock);
> + if (priv->after_suspend) {
> + hi3110_hw_reset(spi);
> + hi3110_setup(net);
> + if (priv->after_suspend & HI3110_AFTER_SUSPEND_RESTART) {
> + hi3110_set_normal_mode(spi);
> + } else if (priv->after_suspend & HI3110_AFTER_SUSPEND_UP) {
> + netif_device_attach(net);
> + hi3110_clean(net);
> + hi3110_set_normal_mode(spi);
> + netif_wake_queue(net);
> + } else {
> + hi3110_hw_sleep(spi);
> + }
> + priv->after_suspend = 0;
> + priv->force_quit = 0;
> + }
> +
> + if (priv->restart_tx) {
> + priv->restart_tx = 0;
> + hi3110_clean(net);
> + netif_wake_queue(net);
> + hi3110_error_skb(net, CAN_ERR_RESTARTED, 0, 0);
> + }
> + mutex_unlock(&priv->hi3110_lock);
> +}
> +
> +static irqreturn_t hi3110_can_ist(int irq, void *dev_id)
> +{
> + struct hi3110_priv *priv = dev_id;
> + struct spi_device *spi = priv->spi;
> + struct net_device *net = priv->net;
> +
> + mutex_lock(&priv->hi3110_lock);
> +
> + while (!priv->force_quit) {
> + enum can_state new_state;
> + u8 intf;
> + u8 eflag;
> + int can_id = 0, data1 = 0, data2 = 0;
> +
> + while (!(HI3110_STAT_RXFMTY &
> + hi3110_read(spi, HI3110_READ_STATF))) {
> + hi3110_hw_rx(spi);
> + }
> +
> + intf = hi3110_read(spi, HI3110_READ_INTF);
> + eflag = hi3110_read(spi, HI3110_READ_ERR);
> + /* Update can state */
> + if (eflag & HI3110_ERR_BUSOFF) {
> + new_state = CAN_STATE_BUS_OFF;
> + can_id |= CAN_ERR_BUSOFF;
> + } else if (eflag & HI3110_ERR_TXERRP) {
> + new_state = CAN_STATE_ERROR_PASSIVE;
> + can_id |= CAN_ERR_CRTL;
> + data1 |= CAN_ERR_CRTL_TX_PASSIVE;
> + } else if (eflag & HI3110_ERR_RXERRP) {
> + new_state = CAN_STATE_ERROR_PASSIVE;
> + can_id |= CAN_ERR_CRTL;
> + data1 |= CAN_ERR_CRTL_RX_PASSIVE;
> + } else {
> + new_state = CAN_STATE_ERROR_ACTIVE;
> + }
> +
> + /* Check for protocol errors */
> + if (eflag & HI3110_ERR_PROTOCOL_MASK) {
> + can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
> + priv->can.can_stats.bus_error++;
> + priv->net->stats.rx_errors++;
> + if (eflag & HI3110_ERR_BITERR)
> + data2 |= CAN_ERR_PROT_BIT;
> + else if (eflag & HI3110_ERR_FRMERR)
> + data2 |= CAN_ERR_PROT_FORM;
> + else if (eflag & HI3110_ERR_STUFERR)
> + data2 |= CAN_ERR_PROT_STUFF;
> + else
> + data2 |= CAN_ERR_PROT_UNSPEC;
And what about the ACK and CRC error defines at the beginning?
It's also comon to use netdev_dbg() on error interrupts.
> + }
Bus error reporting can flood the system with interrupts. Any chance to
implement CAN_CTRLMODE_BERR_REPORTING. I think the bus error interrupt
can be enabled/disabled.
> + /* Update can state statistics */
> + switch (priv->can.state) {
> + case CAN_STATE_ERROR_ACTIVE:
> + if (new_state >= CAN_STATE_ERROR_WARNING &&
> + new_state <= CAN_STATE_BUS_OFF)
> + priv->can.can_stats.error_warning++;
> + /* fallthrough */
> + case CAN_STATE_ERROR_WARNING:
> + if (new_state >= CAN_STATE_ERROR_PASSIVE &&
> + new_state <= CAN_STATE_BUS_OFF)
> + priv->can.can_stats.error_passive++;
> + break;
> + default:
> + break;
> + }
> + priv->can.state = new_state;
> +
> + if (intf & HI3110_INT_BUSERR) {
> + /* Note: HI3110 Does report overflow errors */
> + hi3110_error_skb(net, can_id, data1, data2);
> + }
Usually the bus error counts are filled in the error message frame as
well. It the counts are available, I would also be nice to have the
"do_get_berr_counter" callback as well.
> +
> + if (priv->can.state == CAN_STATE_BUS_OFF) {
> + if (priv->can.restart_ms == 0) {
> + priv->force_quit = 1;
> + priv->can.can_stats.bus_off++;
> + can_bus_off(net);
> + hi3110_hw_sleep(spi);
> + break;
> + }
> + }
> +
> + if (intf == 0)
> + break;
> +
> + if (intf & HI3110_INT_TXCPLT) {
> + net->stats.tx_packets++;
> + net->stats.tx_bytes += priv->tx_len - 1;
> + can_led_event(net, CAN_LED_EVENT_TX);
> + if (priv->tx_len) {
> + can_get_echo_skb(net, 0);
> + priv->tx_len = 0;
> + }
> + netif_wake_queue(net);
> + }
> + }
> + mutex_unlock(&priv->hi3110_lock);
> + return IRQ_HANDLED;
> +}
> +
> +static int hi3110_open(struct net_device *net)
> +{
> + struct hi3110_priv *priv = netdev_priv(net);
> + struct spi_device *spi = priv->spi;
> + unsigned long flags = IRQF_ONESHOT | IRQF_TRIGGER_RISING;
> + int ret;
> +
> + ret = open_candev(net);
> + if (ret) {
> + dev_err(&spi->dev, "unable to set initial baudrate!\n");
open_candev() does already print an error message.
> + return ret;
> + }
> +
> + mutex_lock(&priv->hi3110_lock);
> + hi3110_power_enable(priv->transceiver, 1);
> +
> + priv->force_quit = 0;
> + priv->tx_skb = NULL;
> + priv->tx_len = 0;
> +
> + ret = request_threaded_irq(spi->irq, NULL, hi3110_can_ist,
> + flags, DEVICE_NAME, priv);
> + if (ret) {
> + dev_err(&spi->dev, "failed to acquire irq %d\n", spi->irq);
> + goto out_close;
> + }
> +
> + priv->wq = alloc_workqueue("hi3110_wq", WQ_FREEZABLE | WQ_MEM_RECLAIM,
> + 0);
> + INIT_WORK(&priv->tx_work, hi3110_tx_work_handler);
> + INIT_WORK(&priv->restart_work, hi3110_restart_work_handler);
> +
> + ret = hi3110_hw_reset(spi);
> + if (ret)
> + goto out_free_irq;
> +
> + ret = hi3110_setup(net);
> + if (ret)
> + goto out_free_irq;
> +
> + ret = hi3110_set_normal_mode(spi);
> + if (ret)
> + goto out_free_irq;
> +
> + can_led_event(net, CAN_LED_EVENT_OPEN);
> + netif_wake_queue(net);
> + mutex_unlock(&priv->hi3110_lock);
> +
> + return 0;
> +
> +out_free_irq:
> + free_irq(spi->irq, priv);
> + hi3110_hw_sleep(spi);
> +
> +out_close:
> + hi3110_power_enable(priv->transceiver, 0);
> + close_candev(net);
> + mutex_unlock(&priv->hi3110_lock);
> + return ret;
> +}
> +
> +static const struct net_device_ops hi3110_netdev_ops = {
> + .ndo_open = hi3110_open,
> + .ndo_stop = hi3110_stop,
> + .ndo_start_xmit = hi3110_hard_start_xmit,
> +};
> +
> +static const struct of_device_id hi3110_of_match[] = {
> + {
> + .compatible = "holt,hi3110",
> + .data = (void *)CAN_HI3110_HI3110,
> + },
> + { }
> +};
> +MODULE_DEVICE_TABLE(of, hi3110_of_match);
> +
> +static const struct spi_device_id hi3110_id_table[] = {
> + {
> + .name = "hi3110",
> + .driver_data = (kernel_ulong_t)CAN_HI3110_HI3110,
> + },
> + { }
> +};
> +MODULE_DEVICE_TABLE(spi, hi3110_id_table);
> +
> +static int hi3110_can_probe(struct spi_device *spi)
> +{
> + const struct of_device_id *of_id = of_match_device(hi3110_of_match,
> + &spi->dev);
> + struct net_device *net;
> + struct hi3110_priv *priv;
> + struct clk *clk;
> + int freq, ret;
> +
> + clk = devm_clk_get(&spi->dev, NULL);
> + if (IS_ERR(clk)) {
> + dev_err(&spi->dev, "no CAN clock source defined\n");
> + return PTR_ERR(clk);
> + }
> + freq = clk_get_rate(clk);
> +
> + /* Sanity check */
> + if (freq > 40000000)
> + return -ERANGE;
> +
> + /* Allocate can/net device */
> + net = alloc_candev(sizeof(struct hi3110_priv), HI3110_TX_ECHO_SKB_MAX);
> + if (!net)
> + return -ENOMEM;
> +
> + if (!IS_ERR(clk)) {
> + ret = clk_prepare_enable(clk);
> + if (ret)
> + goto out_free;
> + }
> +
> + net->netdev_ops = &hi3110_netdev_ops;
> + net->flags |= IFF_ECHO;
> +
> + priv = netdev_priv(net);
> + priv->can.bittiming_const = &hi3110_bittiming_const;
> + priv->can.do_set_mode = hi3110_do_set_mode;
> + priv->can.clock.freq = freq / 2;
> + priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES |
> + CAN_CTRLMODE_LOOPBACK | CAN_CTRLMODE_LISTENONLY;
> + if (of_id)
> + priv->model = (enum hi3110_model)of_id->data;
> + else
> + priv->model = spi_get_device_id(spi)->driver_data;
> + priv->net = net;
> + priv->clk = clk;
> +
> + spi_set_drvdata(spi, priv);
> +
> + /* Configure the SPI bus */
> + spi->bits_per_word = 8;
> + ret = spi_setup(spi);
> + if (ret)
> + goto out_clk;
> +
> + priv->power = devm_regulator_get_optional(&spi->dev, "vdd");
> + priv->transceiver = devm_regulator_get_optional(&spi->dev, "xceiver");
> + if ((PTR_ERR(priv->power) == -EPROBE_DEFER) ||
> + (PTR_ERR(priv->transceiver) == -EPROBE_DEFER)) {
> + ret = -EPROBE_DEFER;
> + goto out_clk;
> + }
> +
> + ret = hi3110_power_enable(priv->power, 1);
> + if (ret)
> + goto out_clk;
> +
> + priv->spi = spi;
> + mutex_init(&priv->hi3110_lock);
> +
> + /* If requested, allocate DMA buffers */
> + if (hi3110_enable_dma) {
> + spi->dev.coherent_dma_mask = ~0;
> +
> + /* Minimum coherent DMA allocation is PAGE_SIZE, so allocate
> + * that much and share it between Tx and Rx DMA buffers.
> + */
> + priv->spi_tx_buf = dmam_alloc_coherent(&spi->dev,
> + PAGE_SIZE,
> + &priv->spi_tx_dma,
> + GFP_DMA);
> +
> + if (priv->spi_tx_buf) {
> + priv->spi_rx_buf = (priv->spi_tx_buf + (PAGE_SIZE / 2));
> + priv->spi_rx_dma = (dma_addr_t)(priv->spi_tx_dma +
> + (PAGE_SIZE / 2));
> + } else {
> + /* Fall back to non-DMA */
> + hi3110_enable_dma = 0;
> + }
> + }
> +
> + /* Allocate non-DMA buffers */
> + if (!hi3110_enable_dma) {
> + priv->spi_tx_buf = devm_kzalloc(&spi->dev, HI3110_RX_BUF_LEN,
> + GFP_KERNEL);
> + if (!priv->spi_tx_buf) {
> + ret = -ENOMEM;
> + goto error_probe;
> + }
> + priv->spi_rx_buf = devm_kzalloc(&spi->dev, HI3110_RX_BUF_LEN,
> + GFP_KERNEL);
> +
> + if (!priv->spi_rx_buf) {
> + ret = -ENOMEM;
> + goto error_probe;
> + }
> + }
> +
> + SET_NETDEV_DEV(net, &spi->dev);
> +
> + ret = hi3110_hw_probe(spi);
> + if (ret) {
> + if (ret == -ENODEV)
> + dev_err(&spi->dev, "Cannot initialize %x. Wrong wiring?\n",
> + priv->model);
> + goto error_probe;
> + }
> + hi3110_hw_sleep(spi);
> +
> + ret = register_candev(net);
> + if (ret)
> + goto error_probe;
> +
> + devm_can_led_init(net);
> + netdev_info(net, "%x successfully initialized.\n", priv->model);
> +
> + return 0;
> +
> +error_probe:
> + hi3110_power_enable(priv->power, 0);
> +
> +out_clk:
> + if (!IS_ERR(clk))
> + clk_disable_unprepare(clk);
> +
> +out_free:
> + free_candev(net);
> +
> + dev_err(&spi->dev, "Probe failed, err=%d\n", -ret);
> + return ret;
> +}
> +
> +static int hi3110_can_remove(struct spi_device *spi)
> +{
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + struct net_device *net = priv->net;
> +
> + unregister_candev(net);
> +
> + hi3110_power_enable(priv->power, 0);
> +
> + if (!IS_ERR(priv->clk))
> + clk_disable_unprepare(priv->clk);
> +
> + free_candev(net);
> +
> + return 0;
> +}
> +
> +static int __maybe_unused hi3110_can_suspend(struct device *dev)
> +{
> + struct spi_device *spi = to_spi_device(dev);
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> + struct net_device *net = priv->net;
> +
> + priv->force_quit = 1;
> + disable_irq(spi->irq);
> +
> + /* Note: at this point neither IST nor workqueues are running.
> + * open/stop cannot be called anyway so locking is not needed
> + */
> + if (netif_running(net)) {
> + netif_device_detach(net);
> +
> + hi3110_hw_sleep(spi);
> + hi3110_power_enable(priv->transceiver, 0);
> + priv->after_suspend = HI3110_AFTER_SUSPEND_UP;
> + } else {
> + priv->after_suspend = HI3110_AFTER_SUSPEND_DOWN;
> + }
> +
> + if (!IS_ERR_OR_NULL(priv->power)) {
> + regulator_disable(priv->power);
> + priv->after_suspend |= HI3110_AFTER_SUSPEND_POWER;
> + }
> +
> + return 0;
> +}
> +
> +static int __maybe_unused hi3110_can_resume(struct device *dev)
> +{
> + struct spi_device *spi = to_spi_device(dev);
> + struct hi3110_priv *priv = spi_get_drvdata(spi);
> +
> + if (priv->after_suspend & HI3110_AFTER_SUSPEND_POWER)
> + hi3110_power_enable(priv->power, 1);
> +
> + if (priv->after_suspend & HI3110_AFTER_SUSPEND_UP) {
> + hi3110_power_enable(priv->transceiver, 1);
> + queue_work(priv->wq, &priv->restart_work);
> + } else {
> + priv->after_suspend = 0;
> + }
> +
> + priv->force_quit = 0;
> + enable_irq(spi->irq);
> + return 0;
> +}
> +
> +static SIMPLE_DEV_PM_OPS(hi3110_can_pm_ops, hi3110_can_suspend,
> + hi3110_can_resume);
> +
> +static struct spi_driver hi3110_can_driver = {
> + .driver = {
> + .name = DEVICE_NAME,
> + .of_match_table = hi3110_of_match,
> + .pm = &hi3110_can_pm_ops,
> + },
> + .id_table = hi3110_id_table,
> + .probe = hi3110_can_probe,
> + .remove = hi3110_can_remove,
> +};
> +
> +module_spi_driver(hi3110_can_driver);
> +
> +MODULE_AUTHOR("Akshay Bhat <[email protected]>");
> +MODULE_AUTHOR("Casey Fitzpatrick <[email protected]>");
> +MODULE_DESCRIPTION("Holt HI-3110 CAN driver");
> +MODULE_LICENSE("GPL v2");
A few other things to check:
Run "cangen" and monitor the message with "candump -e any,0:0,#FFFFFFF".
Then 1) disconnect the cable or 2) short-circuit CAN low and high at the
connector. You should see error messages. After reconnection or removing
the short-circuit (and bus-off recovery) the state should go back to
"active".
Run "canfdtest" to check for out-of-order messages. I do not expect
problems with your driver, though.
Thanks,
Wolfgang.
Hi Wolfgang,
On 03/09/2017 12:36 PM, Wolfgang Grandegger wrote:
> Hello,
>
> doing a quick review... I realized a few issues...
>
> Am 17.01.2017 um 20:22 schrieb Akshay Bhat:
>> +static u8 hi3110_read(struct spi_device *spi, u8 command)
>> +{
>> + struct hi3110_priv *priv = spi_get_drvdata(spi);
>> + u8 val = 0;
>> +
>> + priv->spi_tx_buf[0] = command;
>> + hi3110_spi_trans(spi, 2);
>> + val = priv->spi_rx_buf[1];
>> + dev_dbg(&spi->dev, "hi3110_read: %02X, %02X\n", command, val);
>
> This produces a lot of output which is not useful for the normal user.
>
Fixed in v3 patch.
>> +static void hi3110_write(struct spi_device *spi, u8 reg, u8 val)
>> +{
>> + struct hi3110_priv *priv = spi_get_drvdata(spi);
>> +
>> + priv->spi_tx_buf[0] = reg;
>> + priv->spi_tx_buf[1] = val;
>> + dev_dbg(&spi->dev, "hi3110_write: %02X, %02X\n", reg, val);
>
> Ditto.
>
Fixed in v3 patch.
>> +
>> +static void hi3110_hw_rx(struct spi_device *spi)
>> +{
>> + struct hi3110_priv *priv = spi_get_drvdata(spi);
>> + struct sk_buff *skb;
>> + struct can_frame *frame;
>> + u8 buf[HI3110_RX_BUF_LEN - 1];
>> +
>> + skb = alloc_can_skb(priv->net, &frame);
>> + if (!skb) {
>> + dev_err(&spi->dev, "cannot allocate RX skb\n");
>
> Please return silenty! Otherwise it will make the situation worse.
>
Fixed in v3 patch.
>> +
>> + /* Data length */
>> + frame->can_dlc = get_can_dlc(buf[HI3110_FIFO_WOTIME_DLC_OFF] &
>> 0x0F);
>> + memcpy(frame->data, buf + HI3110_FIFO_WOTIME_DAT_OFF,
>> frame->can_dlc);
>
> No data bytes should not be copied for RTR messages.
>
Fixed in v3 patch.
>> +
>> + if (priv->tx_skb || priv->tx_len) {
>> + dev_warn(&spi->dev, "hard_xmit called while tx busy\n");
>
> s/warn/err/? This should never happen; IIUC.
>
Fixed in v3 patch.
>> + return NETDEV_TX_BUSY;
>> + }
>> +
>> + if (can_dropped_invalid_skb(net, skb))
>> + return NETDEV_TX_OK;
>> +
>> + netif_stop_queue(net);
>
> The driver transfers the packets in sequence. Any chance to queue them?
> At least there is a TX FIFO for 8 messages. That's bad for RT but would
> increase the throughput.
>
I initially did not use the TX FIFO for the reason you mentioned above.
Queuing should be possible but since it requires lot more additional
logic, I can work on it a later time.
>> +
>> + if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK) {
>> + /* Put device into loopback mode */
>> + hi3110_write(spi, HI3110_WRITE_CTRL0,
>> + HI3110_CTRL0_LOOPBACK_MODE);
>> + } else if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY) {
>> + /* Put device into listen-only mode */
>> + hi3110_write(spi, HI3110_WRITE_CTRL0,
>> + HI3110_CTRL0_MONITOR_MODE);
>> + } else {
>> + /* Put device into normal mode */
>> + hi3110_write(spi, HI3110_WRITE_CTRL0,
>> + HI3110_CTRL0_NORMAL_MODE);
>
> "mode = x" and just one write is more compact.
>
Fixed in v3 patch.
>> +
>> + /* Wait for the device to enter normal mode */
>> + mdelay(HI3110_OST_DELAY_MS);
>> + reg = hi3110_read(spi, HI3110_READ_CTRL0);
>> + if ((reg & HI3110_CTRL0_MODE_MASK) != HI3110_CTRL0_NORMAL_MODE)
>> + return -EBUSY;
>
> Is this not necesary for listen or loopbcak only mode?
>
It is necessary, fixed in v3 patch.
>> +
>> +static void hi3110_error_skb(struct net_device *net, int can_id,
>> + int data1, int data2)
>> +{
>> + struct sk_buff *skb;
>> + struct can_frame *frame;
>> +
>> + skb = alloc_can_err_skb(net, &frame);
>> + if (skb) {
>> + frame->can_id |= can_id;
>> + frame->data[1] = data1;
>> + frame->data[2] = data2;
>> + netif_rx_ni(skb);
>> + } else {
>> + netdev_err(net, "cannot allocate error skb\n");
>
> Please remove the error message. Not a good at low memory situations.
>
Fixed in v3 patch.
>> + /* Check for protocol errors */
>> + if (eflag & HI3110_ERR_PROTOCOL_MASK) {
>> + can_id |= CAN_ERR_PROT | CAN_ERR_BUSERROR;
>> + priv->can.can_stats.bus_error++;
>> + priv->net->stats.rx_errors++;
>> + if (eflag & HI3110_ERR_BITERR)
>> + data2 |= CAN_ERR_PROT_BIT;
>> + else if (eflag & HI3110_ERR_FRMERR)
>> + data2 |= CAN_ERR_PROT_FORM;
>> + else if (eflag & HI3110_ERR_STUFERR)
>> + data2 |= CAN_ERR_PROT_STUFF;
>> + else
>> + data2 |= CAN_ERR_PROT_UNSPEC;
>
> And what about the ACK and CRC error defines at the beginning?
> It's also comon to use netdev_dbg() on error interrupts.
>
Good catch, I missed it. Fixed in v3 patch.
>> + }
>
> Bus error reporting can flood the system with interrupts. Any chance to
> implement CAN_CTRLMODE_BERR_REPORTING. I think the bus error interrupt
> can be enabled/disabled.
>
Thanks, was not aware of this feature. Added it in v3 patch.
>> + /* Update can state statistics */
>> + switch (priv->can.state) {
>> + case CAN_STATE_ERROR_ACTIVE:
>> + if (new_state >= CAN_STATE_ERROR_WARNING &&
>> + new_state <= CAN_STATE_BUS_OFF)
>> + priv->can.can_stats.error_warning++;
>> + /* fallthrough */
>> + case CAN_STATE_ERROR_WARNING:
>> + if (new_state >= CAN_STATE_ERROR_PASSIVE &&
>> + new_state <= CAN_STATE_BUS_OFF)
>> + priv->can.can_stats.error_passive++;
>> + break;
>> + default:
>> + break;
>> + }
>> + priv->can.state = new_state;
>> +
>> + if (intf & HI3110_INT_BUSERR) {
>> + /* Note: HI3110 Does report overflow errors */
>> + hi3110_error_skb(net, can_id, data1, data2);
>> + }
>
> Usually the bus error counts are filled in the error message frame as
> well. It the counts are available, I would also be nice to have the
> "do_get_berr_counter" callback as well.
>
Added it in v3 patch.
>> +static int hi3110_open(struct net_device *net)
>> +{
>> + struct hi3110_priv *priv = netdev_priv(net);
>> + struct spi_device *spi = priv->spi;
>> + unsigned long flags = IRQF_ONESHOT | IRQF_TRIGGER_RISING;
>> + int ret;
>> +
>> + ret = open_candev(net);
>> + if (ret) {
>> + dev_err(&spi->dev, "unable to set initial baudrate!\n");
>
> open_candev() does already print an error message.
>
Fixed in v3 patch.
> A few other things to check:
>
> Run "cangen" and monitor the message with "candump -e any,0:0,#FFFFFFF".
> Then 1) disconnect the cable or 2) short-circuit CAN low and high at the
> connector. You should see error messages. After reconnection or removing
> the short-circuit (and bus-off recovery) the state should go back to
> "active".
>
With the above sequence, candump reports "ERRORFRAME" with
protocol-violation{{}{acknowledge-slot}}, bus-error. On re-connecting
the cable the can state goes back to ACTIVE and I see the messages that
were in the queue being sent.
> Run "canfdtest" to check for out-of-order messages. I do not expect
> problems with your driver, though.
>
Ran canfdtest for 100k loop (@1M can bit rate), did not see any issues.
Thanks for your valuable suggests and tests :)
Akshay.
Hallo Akshay,
Am 13.03.2017 um 16:38 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On 03/09/2017 12:36 PM, Wolfgang Grandegger wrote:
>> Hello,
>>
>> doing a quick review... I realized a few issues...
>>
>> Am 17.01.2017 um 20:22 schrieb Akshay Bhat:
... snip ...
>> A few other things to check:
>>
>> Run "cangen" and monitor the message with "candump -e any,0:0,#FFFFFFF".
>> Then 1) disconnect the cable or 2) short-circuit CAN low and high at the
>> connector. You should see error messages. After reconnection or removing
>> the short-circuit (and bus-off recovery) the state should go back to
>> "active".
>>
>
> With the above sequence, candump reports "ERRORFRAME" with
> protocol-violation{{}{acknowledge-slot}}, bus-error. On re-connecting
> the cable the can state goes back to ACTIVE and I see the messages that
> were in the queue being sent.
Do you get the ACK error also with berr-reporting off? Would be nice if
you could show a candump log here.
Also, any error message should show the bus error counts in data[7,8]:
http://lxr.free-electrons.com/source/drivers/net/can/sja1000/sja1000.c#L408
And please check bus-off as well (short-circuiting CAN low and high).
Wolfgang.
Hi Wolfgang,
On 03/14/2017 08:11 AM, Wolfgang Grandegger wrote:
> ... snip ...
>>> A few other things to check:
>>>
>>> Run "cangen" and monitor the message with "candump -e any,0:0,#FFFFFFF".
>>> Then 1) disconnect the cable or 2) short-circuit CAN low and high at the
>>> connector. You should see error messages. After reconnection or removing
>>> the short-circuit (and bus-off recovery) the state should go back to
>>> "active".
>>>
>>
>> With the above sequence, candump reports "ERRORFRAME" with
>> protocol-violation{{}{acknowledge-slot}}, bus-error. On re-connecting
>> the cable the can state goes back to ACTIVE and I see the messages that
>> were in the queue being sent.
>
> Do you get the ACK error also with berr-reporting off? Would be nice if
> you could show a candump log here.
>
Below is a log for disconnecting and re-connecting CAN cable scenario:
(Note this is on a 4.1.18 kernel with RT patch)
root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
berr-reporting on
root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
[1] 768
root@imx6qrom5420b1:~# cangen can0
can0 21C [8] 35 98 C0 7A 95 03 E6 2A
can0 6E6 [1] F2
can0 5C7 [2] 42 50
can0 57C [8] 83 7A E4 0C 03 8B 90 45
can0 55C [8] B9 74 87 52 D8 F4 64 04
can0 014 [8] 28 CB 96 57 3B 80 67 4F
can0 6AF [1] 35
can0 51E [8] B6 C8 6C 1D 3A 87 ED 2E
can0 527 [8] D0 8A D3 59 0E 34 40 78
can0 30C [2] 6A 12
can0 145 [8] CB 6E FF 55 C1 BE C3 22
can0 5A5 [8] C4 49 54 68 02 63 F9 35
can0 0BA [8] DA 57 5E 3A CE 88 20 1C
can0 516 [2] 09 09
can0 743 [8] 7C 4D 25 47 61 4C 56 3D
can0 31D [2] 9C D3
can0 71E [8] 53 7C 97 2A 2A F2 9F 56
can0 52E [8] FE DA 2D 51 73 96 DF 79
/////disconnect cable
can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{40}{0}}
can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{88}{0}}
can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{128}{0}}
can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{128}{0}}
write: No buffer space available
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 6 0 1 1 0
RX: bytes packets errors dropped overrun mcast
0 0 6 0 0 0
TX: bytes packets errors dropped carrier collsns
106 18 0 0 0 0
root@imx6qrom5420b1:~#
/////re-connect cable
can0 169 [8] 35 55 A3 1C 0F 47 2E 5B
can0 318 [8] 11 AA 27 11 D2 1B CE 34
can0 577 [8] A0 A4 EE 50 8D A2 E1 3E
can0 4ED [8] 52 96 17 7E 31 FC 7D 7C
can0 2E7 [8] 92 48 D4 39 05 1E 9F 50
can0 200 [8] 4A 66 F6 02 1E 71 8E 26
can0 29A [8] 49 63 2E 7D C9 77 85 7A
can0 15A [7] 3C 0E 65 74 C3 62 80
can0 011 [1] D2
can0 26B [3] FC D6 68
can0 5CE [8] 6F 02 B5 14 BC 7A D7 02
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 117 rx 0)
restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 7 0 1 1 0
RX: bytes packets errors dropped overrun mcast
0 0 7 0 0 0
TX: bytes packets errors dropped carrier collsns
181 29 0 0 0 0
//Reboot the board and test with bus error reporting off
root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
berr-reporting off
root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
[1] 782
root@imx6qrom5420b1:~# cangen can0
can0 1FA [3] C9 FE C2
can0 3E2 [5] 85 37 03 5B 6F
can0 289 [8] A4 F6 BF 4A 3F 70 65 1B
can0 12D [8] B2 72 10 33 AB B4 68 64
can0 054 [2] 01 D7
can0 4A6 [8] 29 7D 76 56 CA C1 60 00
can0 768 [8] 97 3D 92 08 61 C1 D9 03
can0 098 [6] A4 A8 5A 60 92 1A
can0 3C9 [8] 71 78 0D 25 AB 27 8B 51
/////disconnect cable
write: No buffer space available
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 128 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0
RX: bytes packets errors dropped overrun mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
56 9 0 0 0 0
root@imx6qrom5420b1:~#
/////re-connect cable
can0 20000088 [8] 00 00 00 19 00 00 7F 00 ERRORFRAME
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{127}{0}}
can0 553 [6] 1A E4 60 6B DC 07
can0 7E3 [8] 1C 78 95 6E 10 81 AA 40
can0 20C [8] BB 35 13 25 60 0A 56 57
can0 1D0 [8] 48 4A 39 64 76 E6 57 08
can0 43A [1] 40
can0 2CF [7] 03 45 5E 0F 67 33 4C
can0 1CD [8] F9 4D AB 1D 96 A5 67 0E
can0 515 [8] 41 CD F2 5F 68 92 43 16
can0 661 [8] 45 9A 73 69 45 EE 8B 42
can0 41B [1] 55
can0 52F [1] 87
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 117 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 1 0 0 0 0
RX: bytes packets errors dropped overrun mcast
0 0 1 0 0 0
TX: bytes packets errors dropped carrier collsns
120 20 0 0 0 0
> Also, any error message should show the bus error counts in data[7,8]:
>
> http://lxr.free-electrons.com/source/drivers/net/can/sja1000/sja1000.c#L408
>
I can add this in v4 version of the patch (Above log has this patch
applied).
> And please check bus-off as well (short-circuiting CAN low and high).
>
I have not been able to check the bus-off condition by (short-circuiting
CAN low and high). The tec error count remains at 128 when I short the
CAN low and high pins and the status never goes BUSOFF.
If I short the CAN high pin to ground, then the TEC goes to 136 but
thats the highest I have been able to get the TEC to go.
The setup I have is:
HI-3111 can chip -> ADM3054 can transceiver
//CAN_H and CAN_L shorted
<snip>
can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{128}{0}}
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 7 0 1 1 0
RX: bytes packets errors dropped overrun mcast
0 0 7 0 0 0
TX: bytes packets errors dropped carrier collsns
626 109 0 0 0 0
//CAN_H shorted to GND
<snip>
can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
protocol-violation{{}{acknowledge-slot}}
bus-error
error-counter-tx-rx{{128}{0}}
can0 2000008C [8] 00 20 01 00 00 00 88 00 ERRORFRAME
controller-problem{tx-error-passive}
protocol-violation{{single-bit-error}{}}
bus-error
error-counter-tx-rx{{136}{0}}
root@imx6qrom5420b1:~# ip -s -d link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
mode DEFAULT group default qlen 10
link/can promiscuity 0
can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 136 rx 0)
restart-ms 0
bitrate 1000000 sample-point 0.750
tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
clock 16000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 9 0 1 1 0
RX: bytes packets errors dropped overrun mcast
0 0 9 0 0 0
TX: bytes packets errors dropped carrier collsns
626 109 0 0 0 0
Thanks,
Akshay
Hello Akshay,
Am 14.03.2017 um 17:20 schrieb Akshay Bhat:
>
> Hi Wolfgang,
>
> On 03/14/2017 08:11 AM, Wolfgang Grandegger wrote:
>> ... snip ...
>>>> A few other things to check:
>>>>
>>>> Run "cangen" and monitor the message with "candump -e any,0:0,#FFFFFFF".
>>>> Then 1) disconnect the cable or 2) short-circuit CAN low and high at the
>>>> connector. You should see error messages. After reconnection or removing
>>>> the short-circuit (and bus-off recovery) the state should go back to
>>>> "active".
>>>>
>>>
>>> With the above sequence, candump reports "ERRORFRAME" with
>>> protocol-violation{{}{acknowledge-slot}}, bus-error. On re-connecting
>>> the cable the can state goes back to ACTIVE and I see the messages that
>>> were in the queue being sent.
>>
>> Do you get the ACK error also with berr-reporting off? Would be nice if
>> you could show a candump log here.
>>
>
> Below is a log for disconnecting and re-connecting CAN cable scenario:
> (Note this is on a 4.1.18 kernel with RT patch)
>
> root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
> berr-reporting on
> root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
Please add "-td" ...
> [1] 768
> root@imx6qrom5420b1:~# cangen can0
and "-i" here.
> can0 21C [8] 35 98 C0 7A 95 03 E6 2A
> can0 6E6 [1] F2
> can0 5C7 [2] 42 50
> can0 57C [8] 83 7A E4 0C 03 8B 90 45
> can0 55C [8] B9 74 87 52 D8 F4 64 04
> can0 014 [8] 28 CB 96 57 3B 80 67 4F
> can0 6AF [1] 35
> can0 51E [8] B6 C8 6C 1D 3A 87 ED 2E
> can0 527 [8] D0 8A D3 59 0E 34 40 78
> can0 30C [2] 6A 12
> can0 145 [8] CB 6E FF 55 C1 BE C3 22
> can0 5A5 [8] C4 49 54 68 02 63 F9 35
> can0 0BA [8] DA 57 5E 3A CE 88 20 1C
> can0 516 [2] 09 09
> can0 743 [8] 7C 4D 25 47 61 4C 56 3D
> can0 31D [2] 9C D3
> can0 71E [8] 53 7C 97 2A 2A F2 9F 56
> can0 52E [8] FE DA 2D 51 73 96 DF 79
> /////disconnect cable
> can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
> protocol-violation{{}{acknowledge-slot}}
> bus-error
> error-counter-tx-rx{{40}{0}}
> can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
> protocol-violation{{}{acknowledge-slot}}
> bus-error
> error-counter-tx-rx{{88}{0}}
> can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
> protocol-violation{{}{acknowledge-slot}}
> bus-error
> error-counter-tx-rx{{128}{0}}
TX error warning is missing.
> can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
> controller-problem{tx-error-passive}
> protocol-violation{{}{acknowledge-slot}}
> bus-error
> error-counter-tx-rx{{128}{0}}
Here "tx-error-passiv" is packed with a bus error. What I'm looking for
are state change messages similar to:
can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
controller-problem{tx-error-warning}
state-change{tx-error-warning}
error-counter-tx-rx{{96}{0}}
can0 20000204 [8] 00 30 00 00 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
state-change{tx-error-passive}
error-counter-tx-rx{{128}{0}
They should always come, even with "berr-reporting off".
> write: No buffer space available
> root@imx6qrom5420b1:~# ip -s -d link show can0
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> mode DEFAULT group default qlen 10
> link/can promiscuity 0
> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
> restart-ms 0
> bitrate 1000000 sample-point 0.750
> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 16000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 6 0 1 1 0
The error warning and passive counter increased , though. Also the bus
error should come in at a rather hight rate. Looking to the code, maybe
you need to test STATF to check for state changes (and not ERR).
> RX: bytes packets errors dropped overrun mcast
> 0 0 6 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 106 18 0 0 0 0
> root@imx6qrom5420b1:~#
> /////re-connect cable
> can0 169 [8] 35 55 A3 1C 0F 47 2E 5B
> can0 318 [8] 11 AA 27 11 D2 1B CE 34
> can0 577 [8] A0 A4 EE 50 8D A2 E1 3E
> can0 4ED [8] 52 96 17 7E 31 FC 7D 7C
> can0 2E7 [8] 92 48 D4 39 05 1E 9F 50
> can0 200 [8] 4A 66 F6 02 1E 71 8E 26
> can0 29A [8] 49 63 2E 7D C9 77 85 7A
> can0 15A [7] 3C 0E 65 74 C3 62 80
> can0 011 [1] D2
> can0 26B [3] FC D6 68
> can0 5CE [8] 6F 02 B5 14 BC 7A D7 02
>
> root@imx6qrom5420b1:~# ip -s -d link show can0
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> mode DEFAULT group default qlen 10
> link/can promiscuity 0
> can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 117 rx 0)
> restart-ms 0
> bitrate 1000000 sample-point 0.750
> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 16000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 7 0 1 1 0
> RX: bytes packets errors dropped overrun mcast
> 0 0 7 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 181 29 0 0 0 0
>
>
> //Reboot the board and test with bus error reporting off
>
> root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
> berr-reporting off
> root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
> [1] 782
> root@imx6qrom5420b1:~# cangen can0
> can0 1FA [3] C9 FE C2
> can0 3E2 [5] 85 37 03 5B 6F
> can0 289 [8] A4 F6 BF 4A 3F 70 65 1B
> can0 12D [8] B2 72 10 33 AB B4 68 64
> can0 054 [2] 01 D7
> can0 4A6 [8] 29 7D 76 56 CA C1 60 00
> can0 768 [8] 97 3D 92 08 61 C1 D9 03
> can0 098 [6] A4 A8 5A 60 92 1A
> can0 3C9 [8] 71 78 0D 25 AB 27 8B 51
> /////disconnect cable
> write: No buffer space available
> root@imx6qrom5420b1:~# ip -s -d link show can0
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> mode DEFAULT group default qlen 10
> link/can promiscuity 0
> can state ERROR-ACTIVE (berr-counter tx 128 rx 0) restart-ms 0
> bitrate 1000000 sample-point 0.750
> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 16000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 0 0 0 0 0
> RX: bytes packets errors dropped overrun mcast
> 0 0 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 56 9 0 0 0 0
> root@imx6qrom5420b1:~#
> /////re-connect cable
> can0 20000088 [8] 00 00 00 19 00 00 7F 00 ERRORFRAME
> protocol-violation{{}{acknowledge-slot}}
> bus-error
> error-counter-tx-rx{{127}{0}}
> can0 553 [6] 1A E4 60 6B DC 07
> can0 7E3 [8] 1C 78 95 6E 10 81 AA 40
> can0 20C [8] BB 35 13 25 60 0A 56 57
> can0 1D0 [8] 48 4A 39 64 76 E6 57 08
> can0 43A [1] 40
> can0 2CF [7] 03 45 5E 0F 67 33 4C
> can0 1CD [8] F9 4D AB 1D 96 A5 67 0E
> can0 515 [8] 41 CD F2 5F 68 92 43 16
> can0 661 [8] 45 9A 73 69 45 EE 8B 42
> can0 41B [1] 55
> can0 52F [1] 87
After some more messages there should be also:
can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
state-change{back-to-error-active}
error-counter-tx-rx{{95}{0}}
For each message sent, the error counter decreases by 8.
>
> root@imx6qrom5420b1:~# ip -s -d link show can0
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
> mode DEFAULT group default qlen 10
> link/can promiscuity 0
> can state ERROR-ACTIVE (berr-counter tx 117 rx 0) restart-ms 0
> bitrate 1000000 sample-point 0.750
> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 16000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 1 0 0 0 0
Strange, some counters got lost.
> RX: bytes packets errors dropped overrun mcast
> 0 0 1 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 120 20 0 0 0 0
>
>
>> Also, any error message should show the bus error counts in data[7,8]:
>>
>> http://lxr.free-electrons.com/source/drivers/net/can/sja1000/sja1000.c#L408
>>
>
> I can add this in v4 version of the patch (Above log has this patch
> applied).
Looks good.
>> And please check bus-off as well (short-circuiting CAN low and high).
>>
>
> I have not been able to check the bus-off condition by (short-circuiting
> CAN low and high). The tec error count remains at 128 when I short the
> CAN low and high pins and the status never goes BUSOFF.
You also need to send a message and the short-circuit should be at the
connector of the sending host. What tranceiver is used? Do you know?
Wolfgang.
Am 14.03.2017 um 19:08 schrieb Wolfgang Grandegger:
> Hello Akshay,
>
> Am 14.03.2017 um 17:20 schrieb Akshay Bhat:
>>
>> Hi Wolfgang,
>>
>> On 03/14/2017 08:11 AM, Wolfgang Grandegger wrote:
>>> ... snip ...
>>>>> A few other things to check:
>>>>>
>>>>> Run "cangen" and monitor the message with "candump -e
>>>>> any,0:0,#FFFFFFF".
>>>>> Then 1) disconnect the cable or 2) short-circuit CAN low and high
>>>>> at the
>>>>> connector. You should see error messages. After reconnection or
>>>>> removing
>>>>> the short-circuit (and bus-off recovery) the state should go back to
>>>>> "active".
>>>>>
>>>>
>>>> With the above sequence, candump reports "ERRORFRAME" with
>>>> protocol-violation{{}{acknowledge-slot}}, bus-error. On re-connecting
>>>> the cable the can state goes back to ACTIVE and I see the messages that
>>>> were in the queue being sent.
>>>
>>> Do you get the ACK error also with berr-reporting off? Would be nice if
>>> you could show a candump log here.
>>>
>>
>> Below is a log for disconnecting and re-connecting CAN cable scenario:
>> (Note this is on a 4.1.18 kernel with RT patch)
>>
>> root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
>> berr-reporting on
>> root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
>
> Please add "-td" ...
>
>> [1] 768
>> root@imx6qrom5420b1:~# cangen can0
>
> and "-i" here.
>
>> can0 21C [8] 35 98 C0 7A 95 03 E6 2A
>> can0 6E6 [1] F2
>> can0 5C7 [2] 42 50
>> can0 57C [8] 83 7A E4 0C 03 8B 90 45
>> can0 55C [8] B9 74 87 52 D8 F4 64 04
>> can0 014 [8] 28 CB 96 57 3B 80 67 4F
>> can0 6AF [1] 35
>> can0 51E [8] B6 C8 6C 1D 3A 87 ED 2E
>> can0 527 [8] D0 8A D3 59 0E 34 40 78
>> can0 30C [2] 6A 12
>> can0 145 [8] CB 6E FF 55 C1 BE C3 22
>> can0 5A5 [8] C4 49 54 68 02 63 F9 35
>> can0 0BA [8] DA 57 5E 3A CE 88 20 1C
>> can0 516 [2] 09 09
>> can0 743 [8] 7C 4D 25 47 61 4C 56 3D
>> can0 31D [2] 9C D3
>> can0 71E [8] 53 7C 97 2A 2A F2 9F 56
>> can0 52E [8] FE DA 2D 51 73 96 DF 79
>> /////disconnect cable
>> can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{40}{0}}
>> can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{88}{0}}
>> can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{128}{0}}
>
> TX error warning is missing.
>
>> can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
>> controller-problem{tx-error-passive}
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{128}{0}}
>
> Here "tx-error-passiv" is packed with a bus error. What I'm looking for
> are state change messages similar to:
>
> can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> controller-problem{tx-error-warning}
> state-change{tx-error-warning}
> error-counter-tx-rx{{96}{0}}
> can0 20000204 [8] 00 30 00 00 00 00 80 00 ERRORFRAME
> controller-problem{tx-error-passive}
> state-change{tx-error-passive}
> error-counter-tx-rx{{128}{0}
>
> They should always come, even with "berr-reporting off".
>
>> write: No buffer space available
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
>> restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 6 0 1 1 0
>
> The error warning and passive counter increased , though. Also the bus
> error should come in at a rather hight rate. Looking to the code, maybe
> you need to test STATF to check for state changes (and not ERR).
Likely the ERR bits are only valid if the BUSERR bit in INTF is set.
>> RX: bytes packets errors dropped overrun mcast
>> 0 0 6 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 106 18 0 0 0 0
>> root@imx6qrom5420b1:~#
>> /////re-connect cable
>> can0 169 [8] 35 55 A3 1C 0F 47 2E 5B
>> can0 318 [8] 11 AA 27 11 D2 1B CE 34
>> can0 577 [8] A0 A4 EE 50 8D A2 E1 3E
>> can0 4ED [8] 52 96 17 7E 31 FC 7D 7C
>> can0 2E7 [8] 92 48 D4 39 05 1E 9F 50
>> can0 200 [8] 4A 66 F6 02 1E 71 8E 26
>> can0 29A [8] 49 63 2E 7D C9 77 85 7A
>> can0 15A [7] 3C 0E 65 74 C3 62 80
>> can0 011 [1] D2
>> can0 26B [3] FC D6 68
>> can0 5CE [8] 6F 02 B5 14 BC 7A D7 02
>>
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 117 rx 0)
>> restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 7 0 1 1 0
>> RX: bytes packets errors dropped overrun mcast
>> 0 0 7 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 181 29 0 0 0 0
>>
>>
>> //Reboot the board and test with bus error reporting off
>>
>> root@imx6qrom5420b1:~# ip link set can0 up type can bitrate 1000000
>> berr-reporting off
>> root@imx6qrom5420b1:~# candump -e any,0:0,#FFFFFFF &
>> [1] 782
>> root@imx6qrom5420b1:~# cangen can0
>> can0 1FA [3] C9 FE C2
>> can0 3E2 [5] 85 37 03 5B 6F
>> can0 289 [8] A4 F6 BF 4A 3F 70 65 1B
>> can0 12D [8] B2 72 10 33 AB B4 68 64
>> can0 054 [2] 01 D7
>> can0 4A6 [8] 29 7D 76 56 CA C1 60 00
>> can0 768 [8] 97 3D 92 08 61 C1 D9 03
>> can0 098 [6] A4 A8 5A 60 92 1A
>> can0 3C9 [8] 71 78 0D 25 AB 27 8B 51
>> /////disconnect cable
>> write: No buffer space available
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can state ERROR-ACTIVE (berr-counter tx 128 rx 0) restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 0 0 0 0 0
>> RX: bytes packets errors dropped overrun mcast
>> 0 0 0 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 56 9 0 0 0 0
>> root@imx6qrom5420b1:~#
>> /////re-connect cable
>> can0 20000088 [8] 00 00 00 19 00 00 7F 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{127}{0}}
>> can0 553 [6] 1A E4 60 6B DC 07
>> can0 7E3 [8] 1C 78 95 6E 10 81 AA 40
>> can0 20C [8] BB 35 13 25 60 0A 56 57
>> can0 1D0 [8] 48 4A 39 64 76 E6 57 08
>> can0 43A [1] 40
>> can0 2CF [7] 03 45 5E 0F 67 33 4C
>> can0 1CD [8] F9 4D AB 1D 96 A5 67 0E
>> can0 515 [8] 41 CD F2 5F 68 92 43 16
>> can0 661 [8] 45 9A 73 69 45 EE 8B 42
>> can0 41B [1] 55
>> can0 52F [1] 87
>
> After some more messages there should be also:
>
> can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
> state-change{back-to-error-active}
> error-counter-tx-rx{{95}{0}}
>
> For each message sent, the error counter decreases by 8.
>
>
>>
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can state ERROR-ACTIVE (berr-counter tx 117 rx 0) restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 1 0 0 0 0
>
> Strange, some counters got lost.
>
>> RX: bytes packets errors dropped overrun mcast
>> 0 0 1 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 120 20 0 0 0 0
>>
>>
>>> Also, any error message should show the bus error counts in data[7,8]:
>>>
>>> http://lxr.free-electrons.com/source/drivers/net/can/sja1000/sja1000.c#L408
>>>
>>>
>>
>> I can add this in v4 version of the patch (Above log has this patch
>> applied).
>
> Looks good.
>
>>> And please check bus-off as well (short-circuiting CAN low and high).
>>>
>>
>> I have not been able to check the bus-off condition by (short-circuiting
>> CAN low and high). The tec error count remains at 128 when I short the
>> CAN low and high pins and the status never goes BUSOFF.
>
> You also need to send a message and the short-circuit should be at the
> connector of the sending host. What tranceiver is used? Do you know?
You could try to set a different bit-rate on the other CAN controller.
Then try to send or receive messages.
Wolfgang.
Hi Wolfgang,
On Tue, Mar 14, 2017 at 2:08 PM, Wolfgang Grandegger <[email protected]> wrote:
...snip....
>> /////disconnect cable
>> can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{40}{0}}
>> can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{88}{0}}
>> can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{128}{0}}
>
>
> TX error warning is missing.
>
This support was missing in the driver, added in V4 patch.
>> can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
>> controller-problem{tx-error-passive}
>> protocol-violation{{}{acknowledge-slot}}
>> bus-error
>> error-counter-tx-rx{{128}{0}}
>
>
> Here "tx-error-passiv" is packed with a bus error. What I'm looking for are
> state change messages similar to:
>
> can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> controller-problem{tx-error-warning}
> state-change{tx-error-warning}
> error-counter-tx-rx{{96}{0}}
> can0 20000204 [8] 00 30 00 00 00 00 80 00 ERRORFRAME
> controller-problem{tx-error-passive}
> state-change{tx-error-passive}
> error-counter-tx-rx{{128}{0}
>
> They should always come, even with "berr-reporting off".
>
HI-3110 has only 1 bus error interrupt. There is no dedicated state
change interrupts like other controllers.
So here is my plan:
- Have the bus error interrupt always enabled
- If berr-reporting off, then have the isr checks/reports state changes
- if berr-reporting on, then have the isr checks/reports bus errors
and state changes (Does it make sense packing the error message, if
the ISR finds both bus and state changes?)
>> write: No buffer space available
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
>> restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 6 0 1 1 0
>
>
> The error warning and passive counter increased , though. Also the bus error
> should come in at a rather hight rate. Looking to the code, maybe
> you need to test STATF to check for state changes (and not ERR).
>
Apologize, just realized In the above case some error packets were
lost, because I forgot to set the CPU frequency to max. Will resend
the log.
..snip...
>
> After some more messages there should be also:
>
> can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
> state-change{back-to-error-active}
> error-counter-tx-rx{{95}{0}}
>
> For each message sent, the error counter decreases by 8.
>
The HI-3110 controller decrements the error counter by 1 for every message sent.
The error count increments by 8 when there is an error.
>
>>
>> root@imx6qrom5420b1:~# ip -s -d link show can0
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>> mode DEFAULT group default qlen 10
>> link/can promiscuity 0
>> can state ERROR-ACTIVE (berr-counter tx 117 rx 0) restart-ms 0
>> bitrate 1000000 sample-point 0.750
>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 16000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 1 0 0 0 0
>
>
> Strange, some counters got lost.
>
This was a bug introduced when adding berr-reporting, have fixed in v4 patch.
>>
>> I have not been able to check the bus-off condition by (short-circuiting
>> CAN low and high). The tec error count remains at 128 when I short the
>> CAN low and high pins and the status never goes BUSOFF.
>
>
> You also need to send a message and the short-circuit should be at the
> connector of the sending host. What tranceiver is used? Do you know?
>
ADM3054 transceiver is used with HI-3111. I connected the
HI-3111/ADM3054 board to kvaser leaf and ran "cangen -i can0" and
"candump -e any,0:0,#FFFFFFF" on the board. Removed the cable and
shorted the CAN_H/L pins coming out of ADM3054. I will try your
suggestion of using a different bit-rate on the Kvaser leaf instead.
I appreciate your continued feedback, it has helped significantly
improve the error handling of the driver. Looking back I should have
based it on sja1000 or flexcan driver.
Thanks,
Akshay
Hello Akshay,
Am 15.03.2017 um 05:44 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On Tue, Mar 14, 2017 at 2:08 PM, Wolfgang Grandegger <[email protected]> wrote:
> ...snip....
>>> /////disconnect cable
>>> can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{40}{0}}
>>> can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{88}{0}}
>>> can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{128}{0}}
>>
>>
>> TX error warning is missing.
>>
>
> This support was missing in the driver, added in V4 patch.
>
>>> can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
>>> controller-problem{tx-error-passive}
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{128}{0}}
>>
>>
>> Here "tx-error-passiv" is packed with a bus error. What I'm looking for are
>> state change messages similar to:
>>
>> can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
>> controller-problem{tx-error-warning}
>> state-change{tx-error-warning}
>> error-counter-tx-rx{{96}{0}}
>> can0 20000204 [8] 00 30 00 00 00 00 80 00 ERRORFRAME
>> controller-problem{tx-error-passive}
>> state-change{tx-error-passive}
>> error-counter-tx-rx{{128}{0}
>>
>> They should always come, even with "berr-reporting off".
>>
>
> HI-3110 has only 1 bus error interrupt. There is no dedicated state
> change interrupts like other controllers.
To double check: Could you please read INTF, ERR and STATF at the
beginning of the ISR and print it out (using dev_dbg and fiends). Then
run a test with no cable connected and bus error reporting off.
Wolfgang.
Hello Akshay,
Am 15.03.2017 um 05:44 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On Tue, Mar 14, 2017 at 2:08 PM, Wolfgang Grandegger <[email protected]> wrote:
> ...snip....
>>> /////disconnect cable
>>> can0 20000088 [8] 00 00 00 19 00 00 28 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{40}{0}}
>>> can0 20000088 [8] 00 00 00 19 00 00 58 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{88}{0}}
>>> can0 20000088 [8] 00 00 00 19 00 00 80 00 ERRORFRAME
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{128}{0}}
>>
>>
>> TX error warning is missing.
>>
>
> This support was missing in the driver, added in V4 patch.
>
>>> can0 2000008C [8] 00 20 00 19 00 00 80 00 ERRORFRAME
>>> controller-problem{tx-error-passive}
>>> protocol-violation{{}{acknowledge-slot}}
>>> bus-error
>>> error-counter-tx-rx{{128}{0}}
>>
>>
>> Here "tx-error-passiv" is packed with a bus error. What I'm looking for are
>> state change messages similar to:
>>
>> can0 20000204 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
>> controller-problem{tx-error-warning}
>> state-change{tx-error-warning}
>> error-counter-tx-rx{{96}{0}}
>> can0 20000204 [8] 00 30 00 00 00 00 80 00 ERRORFRAME
>> controller-problem{tx-error-passive}
>> state-change{tx-error-passive}
>> error-counter-tx-rx{{128}{0}
>>
>> They should always come, even with "berr-reporting off".
>>
>
> HI-3110 has only 1 bus error interrupt. There is no dedicated state
> change interrupts like other controllers.
I have little hope! Some revision of the flexcan controller have the
same problem
>
> So here is my plan:
> - Have the bus error interrupt always enabled
> - If berr-reporting off, then have the isr checks/reports state changes
Error state change messages should always be there. These are the
important one.
> - if berr-reporting on, then have the isr checks/reports bus errors
> and state changes (Does it make sense packing the error message, if
> the ISR finds both bus and state changes?)
If berr-reporting is off, simply do not create an error message for bus
errors, and only if the state changed. If it's "on" create an additional
bus error message.
http://lxr.free-electrons.com/source/drivers/net/can/flexcan.c#L334
>>> write: No buffer space available
>>> root@imx6qrom5420b1:~# ip -s -d link show can0
>>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>>> mode DEFAULT group default qlen 10
>>> link/can promiscuity 0
>>> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0)
>>> restart-ms 0
>>> bitrate 1000000 sample-point 0.750
>>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 16000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 6 0 1 1 0
>>
>>
>> The error warning and passive counter increased , though. Also the bus error
>> should come in at a rather hight rate. Looking to the code, maybe
>> you need to test STATF to check for state changes (and not ERR).
>>
>
> Apologize, just realized In the above case some error packets were
> lost, because I forgot to set the CPU frequency to max. Will resend
> the log.
>
> ..snip...
>>
>> After some more messages there should be also:
>>
>> can0 20000200 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
>> state-change{back-to-error-active}
>> error-counter-tx-rx{{95}{0}}
>>
>> For each message sent, the error counter decreases by 8.
>>
>
> The HI-3110 controller decrements the error counter by 1 for every message sent.
> The error count increments by 8 when there is an error.
Seems OK according to:
http://electronics.stackexchange.com/questions/220596/can-error-counters-behaviour
>>
>>>
>>> root@imx6qrom5420b1:~# ip -s -d link show can0
>>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN
>>> mode DEFAULT group default qlen 10
>>> link/can promiscuity 0
>>> can state ERROR-ACTIVE (berr-counter tx 117 rx 0) restart-ms 0
>>> bitrate 1000000 sample-point 0.750
>>> tq 62 prop-seg 5 phase-seg1 6 phase-seg2 4 sjw 1
>>> hi3110: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 16000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 1 0 0 0 0
>>
>>
>> Strange, some counters got lost.
>>
>
> This was a bug introduced when adding berr-reporting, have fixed in v4 patch.
>
>>>
>>> I have not been able to check the bus-off condition by (short-circuiting
>>> CAN low and high). The tec error count remains at 128 when I short the
>>> CAN low and high pins and the status never goes BUSOFF.
>>
>>
>> You also need to send a message and the short-circuit should be at the
>> connector of the sending host. What tranceiver is used? Do you know?
>>
>
> ADM3054 transceiver is used with HI-3111. I connected the
> HI-3111/ADM3054 board to kvaser leaf and ran "cangen -i can0" and
> "candump -e any,0:0,#FFFFFFF" on the board. Removed the cable and
> shorted the CAN_H/L pins coming out of ADM3054. I will try your
> suggestion of using a different bit-rate on the Kvaser leaf instead.
>
> I appreciate your continued feedback, it has helped significantly
> improve the error handling of the driver. Looking back I should have
> based it on sja1000 or flexcan driver.
Well, the SJA1000 is the reference concerning error reporting. It's very
detailed. Most of the error cases from
http://lxr.free-electrons.com/source/include/uapi/linux/can/error.h
are SJA1000 related. Most other CAN controllers report much less. And
the Flexcan driver handles a lot of different chip revisions and uses
mail boxes. It's not a good base for the Hi-311x.
Wolfgang.
Wolfgang.
Hi Wolfgang,
On 03/15/2017 05:42 AM, Wolfgang Grandegger wrote:
> Hello Akshay,
>
..snip..
>>
>> So here is my plan:
>> - Have the bus error interrupt always enabled
>> - If berr-reporting off, then have the isr checks/reports state changes
>
> Error state change messages should always be there. These are the
> important one.
>
>> - if berr-reporting on, then have the isr checks/reports bus errors
>> and state changes (Does it make sense packing the error message, if
>> the ISR finds both bus and state changes?)
>
> If berr-reporting is off, simply do not create an error message for bus
> errors, and only if the state changed. If it's "on" create an additional
> bus error message.
>
> http://lxr.free-electrons.com/source/drivers/net/can/flexcan.c#L334
>
>
I have fixed the driver to handle the error reporting. Also thanks for
your tip for generating bus-off by setting the host device at a
different CAN bit rate! Below are logs with the updated driver. Let me
know if you have any concerns, if not I will submit the v4 patch.
berr-reporting on case:
http://pastebin.com/qDRLERmW
berr-reporting off case:
http://pastebin.com/fUn3j7qU
Thanks,
Akshay
Hello Akshay,
Am 16.03.2017 um 18:06 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On 03/15/2017 05:42 AM, Wolfgang Grandegger wrote:
>> Hello Akshay,
>>
> ..snip..
>>>
>>> So here is my plan:
>>> - Have the bus error interrupt always enabled
>>> - If berr-reporting off, then have the isr checks/reports state changes
>>
>> Error state change messages should always be there. These are the
>> important one.
>>
>>> - if berr-reporting on, then have the isr checks/reports bus errors
>>> and state changes (Does it make sense packing the error message, if
>>> the ISR finds both bus and state changes?)
>>
>> If berr-reporting is off, simply do not create an error message for bus
>> errors, and only if the state changed. If it's "on" create an additional
>> bus error message.
>>
>> http://lxr.free-electrons.com/source/drivers/net/can/flexcan.c#L334
>>
>>
>
> I have fixed the driver to handle the error reporting. Also thanks for
> your tip for generating bus-off by setting the host device at a
> different CAN bit rate! Below are logs with the updated driver. Let me
> know if you have any concerns, if not I will submit the v4 patch.
>
> berr-reporting on case:
> http://pastebin.com/qDRLERmW
Looks much better now! There are message for state changes to error
warning and passive. Just the following message is not correct:
(000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19 ERRORFRAME
controller-problem{}
error-counter-tx-rx{{95}{25}}
Sorry, forgot to mention... the function can_change_state() [1]
should be used for that purpose, if possible. It fixes the issue
above as well.
> berr-reporting off case:
> http://pastebin.com/fUn3j7qU
Ditto.
I just had another look to the manual and there is this undocumented
STATFE register at offset 0x1E. It's mentioned in some other parts of
the doc as interrupt enable register for STATF events. I would assume
the same bit layout than STATF. If you set bit 2 (BUSOFF), 3 (ERRP)
and 4 (ERRW), you may get interrupts. It's worth a try, I think. If
it works, it's the much better solution.
Wolfgang.
[1] http://lxr.free-electrons.com/ident?i=can_change_state
Wolfgang.
Hi Wolfgang,
On 03/16/2017 04:02 PM, Wolfgang Grandegger wrote:
>
> Looks much better now! There are message for state changes to error
> warning and passive. Just the following message is not correct:
>
> (000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19 ERRORFRAME
> controller-problem{}
> error-counter-tx-rx{{95}{25}}
>
> Sorry, forgot to mention... the function can_change_state() [1]
> should be used for that purpose, if possible. It fixes the issue
> above as well.
>
The updated driver (the one used to create the above log) is using
can_change_state() function. data[1] 40 corresponds to
CAN_ERR_CRTL_ACTIVE, so looks correct? Could it be that the can-utils I
am using is old and not reporting state change?
Tentative v4 driver for reference:
http://pastebin.com/4xFVL1Sj
>> berr-reporting off case:
>> http://pastebin.com/fUn3j7qU
>
> Ditto.
>
> I just had another look to the manual and there is this undocumented
> STATFE register at offset 0x1E. It's mentioned in some other parts of
> the doc as interrupt enable register for STATF events. I would assume
> the same bit layout than STATF. If you set bit 2 (BUSOFF), 3 (ERRP)
> and 4 (ERRW), you may get interrupts. It's worth a try, I think. If
> it works, it's the much better solution.
>
The HI-311x has a INT pin and a STAT pin. The hardware I have has only
the INT pin connected to the processor. If the STAT pin was also
connected, then like you mentioned it could be a much better solution to
use STAT for state changes.
Enabling BUSOFF/ERRP/ERRW bits in STATFE did not generate any interrupts
on the INT pin. Should we make it a requirement that both INT and STAT
pins need to be connected in hardware for the driver to do the error
reporting?
Thanks,
Akshay
> Wolfgang.
>
> [1] http://lxr.free-electrons.com/ident?i=can_change_state
>
> Wolfgang.
>
Hello Akshay,
Am 16.03.2017 um 23:29 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On 03/16/2017 04:02 PM, Wolfgang Grandegger wrote:
>>
>> Looks much better now! There are message for state changes to error
>> warning and passive. Just the following message is not correct:
>>
>> (000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19 ERRORFRAME
>> controller-problem{}
>> error-counter-tx-rx{{95}{25}}
>>
>> Sorry, forgot to mention... the function can_change_state() [1]
>> should be used for that purpose, if possible. It fixes the issue
>> above as well.
>>
>
> The updated driver (the one used to create the above log) is using
> can_change_state() function. data[1] 40 corresponds to
> CAN_ERR_CRTL_ACTIVE, so looks correct? Could it be that the can-utils I
> am using is old and not reporting state change?
Hm, yes. The raw data looks correct. You could download and build a
recent version from "https://github.com/linux-can/can-utils" to check.
It could also be a bug.
> Tentative v4 driver for reference:
> http://pastebin.com/4xFVL1Sj
>
>>> berr-reporting off case:
>>> http://pastebin.com/fUn3j7qU
>>
>> Ditto.
>>
>> I just had another look to the manual and there is this undocumented
>> STATFE register at offset 0x1E. It's mentioned in some other parts of
>> the doc as interrupt enable register for STATF events. I would assume
>> the same bit layout than STATF. If you set bit 2 (BUSOFF), 3 (ERRP)
>> and 4 (ERRW), you may get interrupts. It's worth a try, I think. If
>> it works, it's the much better solution.
>>
>
> The HI-311x has a INT pin and a STAT pin. The hardware I have has only
> the INT pin connected to the processor. If the STAT pin was also
> connected, then like you mentioned it could be a much better solution to
> use STAT for state changes.
OK, I understand.
> Enabling BUSOFF/ERRP/ERRW bits in STATFE did not generate any interrupts
> on the INT pin. Should we make it a requirement that both INT and STAT
> pins need to be connected in hardware for the driver to do the error
> reporting?
As I said, it's the better solution, especially if interrupt flooding
does harm. How does your system behave when bus errors come in due to no
cable connected?
So far using NAPI was mandatory. There is the problem of out-of-order
message reception if handled in the isr on multi processor systems.
Marc, what is the current policy?
Wolfgang.
Hello Akshay,
Am 17.03.2017 um 08:39 schrieb Wolfgang Grandegger:
> Hello Akshay,
>
> Am 16.03.2017 um 23:29 schrieb Akshay Bhat:
>> Hi Wolfgang,
>>
>> On 03/16/2017 04:02 PM, Wolfgang Grandegger wrote:
>>>
>>> Looks much better now! There are message for state changes to error
>>> warning and passive. Just the following message is not correct:
>>>
>>> (000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19
>>> ERRORFRAME
>>> controller-problem{}
>>> error-counter-tx-rx{{95}{25}}
>>>
>>> Sorry, forgot to mention... the function can_change_state() [1]
>>> should be used for that purpose, if possible. It fixes the issue
>>> above as well.
>>>
>>
>> The updated driver (the one used to create the above log) is using
>> can_change_state() function. data[1] 40 corresponds to
>> CAN_ERR_CRTL_ACTIVE, so looks correct? Could it be that the can-utils I
>> am using is old and not reporting state change?
>
> Hm, yes. The raw data looks correct. You could download and build a
> recent version from "https://github.com/linux-can/can-utils" to check.
> It could also be a bug.
Support for that flags has been added in December 2014.
Wolfgang-
Hi Wolfgang,
On 03/17/2017 03:39 AM, Wolfgang Grandegger wrote:
> Hello Akshay,
>
> Am 16.03.2017 um 23:29 schrieb Akshay Bhat:
>> Hi Wolfgang,
>>
>> On 03/16/2017 04:02 PM, Wolfgang Grandegger wrote:
>>>
>>> Looks much better now! There are message for state changes to error
>>> warning and passive. Just the following message is not correct:
>>>
>>> (000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19
>>> ERRORFRAME
>>> controller-problem{}
>>> error-counter-tx-rx{{95}{25}}
>>>
>>> Sorry, forgot to mention... the function can_change_state() [1]
>>> should be used for that purpose, if possible. It fixes the issue
>>> above as well.
>>>
>>
>> The updated driver (the one used to create the above log) is using
>> can_change_state() function. data[1] 40 corresponds to
>> CAN_ERR_CRTL_ACTIVE, so looks correct? Could it be that the can-utils I
>> am using is old and not reporting state change?
>
> Hm, yes. The raw data looks correct. You could download and build a
> recent version from "https://github.com/linux-can/can-utils" to check.
> It could also be a bug.
>
Turned out to be a old version of can-utils. Using the above git tree
reports the flag.
(000.200308) can0 20000004 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
controller-problem{back-to-error-active}
error-counter-tx-rx{{95}{0}}
>> Enabling BUSOFF/ERRP/ERRW bits in STATFE did not generate any interrupts
>> on the INT pin. Should we make it a requirement that both INT and STAT
>> pins need to be connected in hardware for the driver to do the error
>> reporting?
>
> As I said, it's the better solution, especially if interrupt flooding
> does harm. How does your system behave when bus errors come in due to no
> cable connected?
>
I did not see any issues on the system with the cable disconnected. In
my particular setup with the cable disconnected the system goes to
tx-error-passive and does not get any further interrupts until a state
change occurs.
> So far using NAPI was mandatory. There is the problem of out-of-order
> message reception if handled in the isr on multi processor systems.
> Marc, what is the current policy?
>
Since this is a SPI based CAN, I am wary for any additional latencies
NAPI might introduce. The RX handling is being done at the very
beginning of the ISR for this reason.
Can we go ahead with the existing implementation and re-visit this at a
later time?
Thanks again for all your help in reviewing/improving the driver :)
Akshay
> Wolfgang.
Hi Akshay,
Am 17.03.2017 um 17:00 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On 03/17/2017 03:39 AM, Wolfgang Grandegger wrote:
>> Hello Akshay,
>>
>> Am 16.03.2017 um 23:29 schrieb Akshay Bhat:
>>> Hi Wolfgang,
>>>
>>> On 03/16/2017 04:02 PM, Wolfgang Grandegger wrote:
>>>>
>>>> Looks much better now! There are message for state changes to error
>>>> warning and passive. Just the following message is not correct:
>>>>
>>>> (000.200824) can0 20000004 [8] 00 40 00 00 00 00 5F 19
>>>> ERRORFRAME
>>>> controller-problem{}
>>>> error-counter-tx-rx{{95}{25}}
>>>>
>>>> Sorry, forgot to mention... the function can_change_state() [1]
>>>> should be used for that purpose, if possible. It fixes the issue
>>>> above as well.
>>>>
>>>
>>> The updated driver (the one used to create the above log) is using
>>> can_change_state() function. data[1] 40 corresponds to
>>> CAN_ERR_CRTL_ACTIVE, so looks correct? Could it be that the can-utils I
>>> am using is old and not reporting state change?
>>
>> Hm, yes. The raw data looks correct. You could download and build a
>> recent version from "https://github.com/linux-can/can-utils" to check.
>> It could also be a bug.
>>
>
> Turned out to be a old version of can-utils. Using the above git tree
> reports the flag.
>
> (000.200308) can0 20000004 [8] 00 40 00 00 00 00 5F 00 ERRORFRAME
> controller-problem{back-to-error-active}
> error-counter-tx-rx{{95}{0}}
>
>>> Enabling BUSOFF/ERRP/ERRW bits in STATFE did not generate any interrupts
>>> on the INT pin. Should we make it a requirement that both INT and STAT
>>> pins need to be connected in hardware for the driver to do the error
>>> reporting?
>>
>> As I said, it's the better solution, especially if interrupt flooding
>> does harm. How does your system behave when bus errors come in due to no
>> cable connected?
>>
>
> I did not see any issues on the system with the cable disconnected. In
> my particular setup with the cable disconnected the system goes to
> tx-error-passive and does not get any further interrupts until a state
> change occurs.
Hm, that's unusual. Cable disconnected and then send a message:
$ grep /proc/interrupts; sleep 10; /proc/interrupts
should make things clear. But maybe it's a clever chip and it does stop
sending error messages if the error counter does not change any more.
After bus-off, the chip is quiet, of course. Should have a closer look
to the CAN standard.
>> So far using NAPI was mandatory. There is the problem of out-of-order
>> message reception if handled in the isr on multi processor systems.
>> Marc, what is the current policy?
>>
>
> Since this is a SPI based CAN, I am wary for any additional latencies
> NAPI might introduce. The RX handling is being done at the very
> beginning of the ISR for this reason.
>
> Can we go ahead with the existing implementation and re-visit this at a
> later time?
Likely yes, as Marc has already reviewed the driver once.
BTW: what system board/processor are you using?
> Thanks again for all your help in reviewing/improving the driver :)
You are welcome!
Wolfgang.
Hi Wolfgang,
On 03/17/2017 01:04 PM, Wolfgang Grandegger wrote:
>
> Hm, that's unusual. Cable disconnected and then send a message:
>
> $ grep /proc/interrupts; sleep 10; /proc/interrupts
>
> should make things clear. But maybe it's a clever chip and it does stop
> sending error messages if the error counter does not change any more.
> After bus-off, the chip is quiet, of course. Should have a closer look
> to the CAN standard.
>
The interrupt count does not increment after device reaches
tx-error-passive (with cable disconnected).
# while true; do grep -i hi3110 /proc/interrupts; sleep 10; done &
[1] 793
#
111: 0 0 gpio-mxc 12 Edge hi3110
# candump -t d -e any,0:0,#FFFFFFF &
[2] 798
# cansend can0 123#
#
(000.000000) can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
controller-problem{tx-error-warning}
error-counter-tx-rx{{96}{0}}
(000.002122) can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
controller-problem{tx-error-passive}
error-counter-tx-rx{{128}{0}}
111: 10 0 gpio-mxc 12 Edge hi3110
111: 10 0 gpio-mxc 12 Edge hi3110
111: 10 0 gpio-mxc 12 Edge hi3110
>>> So far using NAPI was mandatory. There is the problem of out-of-order
>>> message reception if handled in the isr on multi processor systems.
>>> Marc, what is the current policy?
>>>
>>
>> Since this is a SPI based CAN, I am wary for any additional latencies
>> NAPI might introduce. The RX handling is being done at the very
>> beginning of the ISR for this reason.
>>
>> Can we go ahead with the existing implementation and re-visit this at a
>> later time?
>
> Likely yes, as Marc has already reviewed the driver once.
>
Thanks, I will go ahead and submit v4 patch.
> BTW: what system board/processor are you using?
>
It is a custom board using Phytec phyFLEX-i.MX6 Dual SOM.
Hello Akshay,
Am 17.03.2017 um 19:28 schrieb Akshay Bhat:
> Hi Wolfgang,
>
> On 03/17/2017 01:04 PM, Wolfgang Grandegger wrote:
>>
>> Hm, that's unusual. Cable disconnected and then send a message:
>>
>> $ grep /proc/interrupts; sleep 10; /proc/interrupts
>>
>> should make things clear. But maybe it's a clever chip and it does stop
>> sending error messages if the error counter does not change any more.
>> After bus-off, the chip is quiet, of course. Should have a closer look
>> to the CAN standard.
>>
>
> The interrupt count does not increment after device reaches
> tx-error-passive (with cable disconnected).
>
> # while true; do grep -i hi3110 /proc/interrupts; sleep 10; done &
> [1] 793
> #
> 111: 0 0 gpio-mxc 12 Edge hi3110
> # candump -t d -e any,0:0,#FFFFFFF &
> [2] 798
> # cansend can0 123#
> #
> (000.000000) can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> controller-problem{tx-error-warning}
> error-counter-tx-rx{{96}{0}}
> (000.002122) can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
> controller-problem{tx-error-passive}
> error-counter-tx-rx{{128}{0}}
> 111: 10 0 gpio-mxc 12 Edge hi3110
> 111: 10 0 gpio-mxc 12 Edge hi3110
> 111: 10 0 gpio-mxc 12 Edge hi3110
OK, then there is no good reason connecting the STAT interrupt pin.
Wolfgang.