Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752875AbYKDMMU (ORCPT ); Tue, 4 Nov 2008 07:12:20 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751392AbYKDMMA (ORCPT ); Tue, 4 Nov 2008 07:12:00 -0500 Received: from moutng.kundenserver.de ([212.227.126.188]:49430 "EHLO moutng.kundenserver.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751281AbYKDML6 (ORCPT ); Tue, 4 Nov 2008 07:11:58 -0500 From: Arnd Bergmann To: linuxppc-dev@ozlabs.org, Jan-Bernd Themann Subject: Re: [PATCH RFC v2] net: add PCINet driver Date: Tue, 4 Nov 2008 13:09:25 +0100 User-Agent: KMail/1.9.9 Cc: Ira Snyder , netdev@vger.kernel.org, Stephen Hemminger , linux-kernel@vger.kernel.org References: <20081029202027.GH12879@ovro.caltech.edu> In-Reply-To: <20081029202027.GH12879@ovro.caltech.edu> X-Face: I@=L^?./?$U,EK.)V[4*>`zSqm0>65YtkOe>TFD'!aw?7OVv#~5xd\s,[~w]-J!)|%=]>=?utf-8?q?+=0A=09=7EohchhkRGW=3F=7C6=5FqTmkd=5Ft=3FLZC=23Q-=60=2E=60Y=2Ea=5E?= =?utf-8?q?3zb?=) =?utf-8?q?+U-JVN=5DWT=25cw=23=5BYo0=267C=26bL12wWGlZi=0A=09=7EJ=3B=5Cwg?= =?utf-8?q?=3B3zRnz?=,J"CT_)=\H'1/{?SR7GDu?WIopm.HaBG=QYj"NZD_[zrM\Gip^U MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200811041309.25869.arnd@arndb.de> X-Provags-ID: V01U2FsdGVkX1/DdG6iyEHJuimUamjL26UxWf8h2/c44PFkkQR nF1kyWmWYVgD/bmc5T6zg3YKD21a+a0YacAhQw1wJ1zEjAvU9c BGOfxFnRIBThWi46zmyiA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 12352 Lines: 354 On Wednesday 29 October 2008, Ira Snyder wrote: > This adds support to Linux for a virtual ethernet interface which uses the > PCI bus as its transport mechanism. It creates a simple, familiar, and fast > method of communication for two devices connected by a PCI interface. Very interesting. I have seen multiple such drivers, and still plan to have one as well for the cell/axon platform. > This is the third posting of this driver. I got some feedback, and have > corrected the problems. Stephen, thanks for the review! I also got some > off-list feedback from a potential user, so I believe this is worth getting > into mainline. Sorry for not having replied earlier, I did not see first postings. While I did not see any major problems with your code, I think we should actually do it in a different way, which I have talked about at the Linux plumbers conference, and will again at the UKUUG Linux conference this weekend. You can find my slides from the previous talk at http://userweb.kernel.org/%7Earnd/papers/plumbers08/plumbers-slides.pdf Since you have code and I only have plans, I suppose I can't make you wait for me to get a driver for your platform, but I hope we can join forces in the future and share code in the future. My fundamental criticism of your code is that it is not flexible enough. We have a number of platforms that act as a PCI, PCIe or similar kind of device and want to transfer network data. At the same time, some of these want to use the same infrastructure for non-network data like MPI (ibverbs), block device or file system (9p). Even in your limited setup, it seems that you should be able to share more code between the two implementations you have posted here. When looking at the code, I found a lot of duplication between the drivers that you should be able to avoid. My suggestion is to base this on the virtio infrastructure, and some of my colleagues at IBM already implemented a driver that uses virtio-net on a PCIe connection. Arnd <>< Now a few comments on your code: > diff --git a/arch/powerpc/boot/dts/mpc834x_mds.dts b/arch/powerpc/boot/dts/mpc834x_mds.dts > index c986c54..3bc8975 100644 > --- a/arch/powerpc/boot/dts/mpc834x_mds.dts > +++ b/arch/powerpc/boot/dts/mpc834x_mds.dts > @@ -104,6 +104,13 @@ > mode = "cpu"; > }; > > + message-unit@8030 { > + compatible = "fsl,mpc8349-mu"; > + reg = <0x8030 0xd0>; > + interrupts = <69 0x8>; > + interrupt-parent = <&ipic>; > + }; What is a message unit? Is this the mailbox you use in the driver? We are facing a similar problem on Axon regarding probing of the virtual device because you can't easily tell from the device tree whether the machine is connected to something that is trying to communicate. The message-unit does not seem to be network-specific, so it's not particularly nice to have the network unit grab that of_device. > +config PCINET_FSL > + tristate "PCINet Virtual Ethernet over PCI support (Freescale)" > + depends on MPC834x_MDS && !PCI > + select DMA_ENGINE > + select FSL_DMA > + help > + When running as a PCI Agent, this driver will create a virtual > + ethernet link running over the PCI bus, allowing simplified > + communication with the host system. The host system will need > + to use the corresponding driver. > + > + If in doubt, say N. Why 'depends on !PCI'? This means that you cannot build a kernel that is able to run both as host and endpoint for PCInet, right? > +config PCINET_HOST > + tristate "PCINet Virtual Ethernet over PCI support (Host)" > + depends on PCI > + help > + This driver will let you communicate with a PCINet client device > + using a virtual ethernet link running over the PCI bus. This > + allows simplified communication with the client system. > + > + This is inteded for use in a system that has a crate full of > + computers running Linux, all connected by a PCI backplane. > + > + If in doubt, say N. Is this meant to be portable across different hardware implementations? If the driver can only work with PCINET_FSL on the other side, you should probably mention that in the description. > +config PCINET_DISABLE_CHECKSUM > + bool "Disable packet checksumming" > + depends on PCINET_FSL || PCINET_HOST > + default n > + help > + Disable packet checksumming on packets received by the PCINet > + driver. This gives a possible speed boost. Why make this one optional? Is there ever a reason to enable checksumming? If there is, how about making it tunable with ethtool? > +struct circ_buf_desc { > + __le32 sc; > + __le32 len; > + __le32 addr; > +} __attribute__((__packed__)); It would be useful to always force aligning the desciptors to the whole 32 bit and avoid the packing here. Unaligned accesses are inefficient on many systems. > +typedef struct circ_buf_desc cbd_t; Also, don't pass structures by value if they don't fit into one or two registers. > +/* Buffer Descriptor Accessors */ > +#define CBDW_SC(_cbd, _sc) iowrite32((_sc), &(_cbd)->sc) > +#define CBDW_LEN(_cbd, _len) iowrite32((_len), &(_cbd)->len) > +#define CBDW_ADDR(_cbd, _addr) iowrite32((_addr), &(_cbd)->addr) > + > +#define CBDR_SC(_cbd) ioread32(&(_cbd)->sc) > +#define CBDR_LEN(_cbd) ioread32(&(_cbd)->len) > +#define CBDR_ADDR(_cbd) ioread32(&(_cbd)->addr) We have found that accessing remote descriptors using mmio read is rather slow, and changed the code to always do local reads and remote writes. On the host side, I would argue that you should use out_le32 rather than iowrite32, because you are not operating on a PCI device but an OF device. > +/* IMMR Accessor Helpers */ > +#define IMMR_R32(_off) ioread32(priv->immr+(_off)) > +#define IMMR_W32(_off, _val) iowrite32((_val), priv->immr+(_off)) > +#define IMMR_R32BE(_off) ioread32be(priv->immr+(_off)) > +#define IMMR_W32BE(_off, _val) iowrite32be((_val), priv->immr+(_off)) IIRC, the IMMR is a platform resource, so the system should map those registers already and provide accessor functions for the registers you need. Simply allowing to pass an offset in here does not look clean. > +static void wqtuart_rx_char(struct uart_port *port, const char ch); > +static void wqtuart_stop_tx(struct uart_port *port); You should try to avoid forward declarations for static functions. If you order the function implementation correctly, that will also give you the expected reading order in the driver. > +struct wqt_dev; > +typedef void (*wqt_irqhandler_t)(struct wqt_dev *); Much more importantly, never do forward declarations for global functions! These belong into a header that is included by both the user and the definition. > +struct wqt_dev { > + /*--------------------------------------------------------------------*/ > + /* OpenFirmware Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct of_device *op; > + struct device *dev; Why the dev? You can always get that from the of_device, right? > + int irq; > + void __iomem *immr; I don't think immr is device specific. > + struct mutex irq_mutex; > + int interrupt_count; > + > + spinlock_t irq_lock; > + struct wqt_irqhandlers handlers; > + > + /*--------------------------------------------------------------------*/ > + /* UART Device Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct uart_port port; > + bool uart_rx_enabled; > + bool uart_open; hmm, if you need a uart, that sounds like it should be a separate driver altogether. What is the relation between the network interface and the UART? > + struct workqueue_struct *wq; A per-device pointer to a workqueue_struct is unusual. What are you doing this for? How about using the system work queue? > + /*--------------------------------------------------------------------*/ > + /* Ethernet Device Infrastructure */ > + /*--------------------------------------------------------------------*/ > + struct net_device *ndev; Why make this a separate structure? If you have one of these per net_device, you should embed the net_device into your own structure. > + struct tasklet_struct tx_complete_tasklet; Using a tasklet for tx processing sounds fishy because most of the network code already runs at softirq time. You do not gain anything by another softirq context. > +/*----------------------------------------------------------------------------*/ > +/* Status Register Helper Operations */ > +/*----------------------------------------------------------------------------*/ > + > +static DEFINE_SPINLOCK(status_lock); > + > +static void wqtstatus_setbit(struct wqt_dev *priv, u32 bit) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&status_lock, flags); > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) | bit); > + spin_unlock_irqrestore(&status_lock, flags); > +} > + > +static void wqtstatus_clrbit(struct wqt_dev *priv, u32 bit) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&status_lock, flags); > + IMMR_W32(OMR1_OFFSET, IMMR_R32(OMR1_OFFSET) & ~bit); > + spin_unlock_irqrestore(&status_lock, flags); > +} This looks misplaced, as mentioned before. > +static int wqtstatus_remote_testbit(struct wqt_dev *priv, u32 bit) > +{ > + return IMMR_R32(IMR1_OFFSET) & bit; > +} Are you sure that do not need a spinlock here? The lock also prevents reordering the device access arount the code using the mmio data. > +/*----------------------------------------------------------------------------*/ > +/* Message Sending and Processing Operations */ > +/*----------------------------------------------------------------------------*/ > + > +static irqreturn_t wqt_interrupt(int irq, void *dev_id) > +{ > + struct wqt_dev *priv = dev_id; > + u32 imisr, idr; > + unsigned long flags; > + > + imisr = IMMR_R32(IMISR_OFFSET); > + idr = IMMR_R32(IDR_OFFSET); > + > + if (!(imisr & 0x8)) > + return IRQ_NONE; > + > + /* Clear all of the interrupt sources, we'll handle them next */ > + IMMR_W32(IDR_OFFSET, idr); > + > + /* Lock over all of the handlers, so they cannot get called when > + * the code doesn't expect them to be called */ > + spin_lock_irqsave(&priv->irq_lock, flags); If this is in an interrupt handler, why disable the interrupts again? The same comment applies to many of the other places where you use spin_lock_irqsave rather than spin_lock or spin_lock_irq. > +/* NOTE: All handlers are called with priv->irq_lock held */ > + > +static void empty_handler(struct wqt_dev *priv) > +{ > + /* Intentionally left empty */ > +} > + > +static void net_start_req_handler(struct wqt_dev *priv) > +{ > + schedule_work(&priv->net_start_work); > +} > + > +static void net_start_ack_handler(struct wqt_dev *priv) > +{ > + complete(&priv->net_start_completion); > +} > + > +static void net_stop_req_handler(struct wqt_dev *priv) > +{ > + schedule_work(&priv->net_stop_work); > +} > + > +static void net_stop_ack_handler(struct wqt_dev *priv) > +{ > + complete(&priv->net_stop_completion); > +} > + > +static void net_tx_complete_handler(struct wqt_dev *priv) > +{ > + tasklet_schedule(&priv->tx_complete_tasklet); > +} > + > +static void net_rx_packet_handler(struct wqt_dev *priv) > +{ > + wqtstatus_setbit(priv, PCINET_NET_RXINT_OFF); > + netif_rx_schedule(priv->ndev, &priv->napi); > +} > + > +static void uart_rx_ready_handler(struct wqt_dev *priv) > +{ > + wqtuart_rx_char(&priv->port, IMMR_R32(IMR0_OFFSET) & 0xff); > + IMMR_W32(ODR_OFFSET, UART_TX_EMPTY_DBELL); > +} > + > +static void uart_tx_empty_handler(struct wqt_dev *priv) > +{ > + priv->uart_tx_ready = true; > + wake_up(&priv->uart_tx_wait); > +} Since these all seem to be trivial functions, why go through the indirect function pointers at all? It seems that wqt_interrupt could do this just as well. > + > +static int wqt_request_irq(struct wqt_dev *priv) > +{ > + int ret = 0; no point initializing ret. > + mutex_lock(&priv->irq_mutex); What data does the irq_mutex protect? Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/