2004-01-25 15:25:38

by Marcel Sebek

Subject: [RFC/PATCH] IMQ port to 2.6

I have ported IMQ driver from 2.4 to 2.6.2-rc1.

Original version was from http://trash.net/~kaber/imq/.


diff -urN linux-2.6.orig/drivers/net/Kconfig linux-2.6.new/drivers/net/Kconfig
--- linux-2.6.orig/drivers/net/Kconfig 2004-01-21 19:33:36.000000000 +0100
+++ linux-2.6.new/drivers/net/Kconfig 2004-01-25 15:08:20.000000000 +0100
@@ -85,6 +85,20 @@
To compile this driver as a module, choose M here: the module
will be called eql. If unsure, say N.

+config IMQ
+ tristate "IMQ (intermediate queueing device) support"
+ depends on NETDEVICES && NETFILTER
+ ---help---
+ The imq device(s) is used as placeholder for QoS queueing disciplines.
+ Every packet entering/leaving the ip stack can be directed through
+ the imq device where it's enqueued/dequeued to the attached qdisc.
+ This allows you to treat network devices as classes and distribute
+ bandwidth among them. Iptables is used to specify through which imq
+ device, if any, packets travel.
+
+ To compile this driver as a module, choose M here: the module
+ will be called imq. If unsure, say N.
+
config TUN
tristate "Universal TUN/TAP device driver support"
depends on NETDEVICES
diff -urN linux-2.6.orig/drivers/net/Makefile linux-2.6.new/drivers/net/Makefile
--- linux-2.6.orig/drivers/net/Makefile 2004-01-21 19:33:36.000000000 +0100
+++ linux-2.6.new/drivers/net/Makefile 2004-01-25 15:08:20.000000000 +0100
@@ -110,6 +110,7 @@
endif

obj-$(CONFIG_DUMMY) += dummy.o
+obj-$(CONFIG_IMQ) += imq.o
obj-$(CONFIG_DE600) += de600.o
obj-$(CONFIG_DE620) += de620.o
obj-$(CONFIG_AT1500) += lance.o
diff -urN linux-2.6.orig/drivers/net/imq.c linux-2.6.new/drivers/net/imq.c
--- linux-2.6.orig/drivers/net/imq.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/drivers/net/imq.c 2004-01-25 15:08:51.000000000 +0100
@@ -0,0 +1,323 @@
+/*
+ * Pseudo-driver for the intermediate queue device.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Patrick McHardy, <[email protected]>
+ *
+ * The first version was written by Martin Devera, <[email protected]>
+ *
+ * Credits: Jan Rafaj <[email protected]>
+ * - Update patch to 2.4.21
+ * Sebastian Strollo <[email protected]>
+ * - Fix "Dead-loop on netdevice imq"-issue
+ * Marcel Sebek <[email protected]>
+ * - Update to 2.6.2-rc1
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/moduleparam.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/if_arp.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+#include <linux/netfilter_ipv6.h>
+#endif
+#include <linux/imq.h>
+#include <net/pkt_sched.h>
+
+static nf_hookfn imq_nf_hook;
+
+static struct nf_hook_ops imq_ingress_ipv4 = {
+ .hook = imq_nf_hook,
+ .owner = THIS_MODULE,
+ .pf = PF_INET,
+ .hooknum = NF_IP_PRE_ROUTING,
+ .priority = NF_IP_PRI_MANGLE + 1
+};
+
+static struct nf_hook_ops imq_egress_ipv4 = {
+ .hook = imq_nf_hook,
+ .owner = THIS_MODULE,
+ .pf = PF_INET,
+ .hooknum = NF_IP_POST_ROUTING,
+ .priority = NF_IP_PRI_LAST
+};
+
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+static struct nf_hook_ops imq_ingress_ipv6 = {
+ .hook = imq_nf_hook,
+ .owner = THIS_MODULE,
+ .pf = PF_INET6,
+ .hooknum = NF_IP6_PRE_ROUTING,
+ .priority = NF_IP6_PRI_MANGLE + 1
+};
+
+static struct nf_hook_ops imq_egress_ipv6 = {
+ .hook = imq_nf_hook,
+ .owner = THIS_MODULE,
+ .pf = PF_INET6,
+ .hooknum = NF_IP6_POST_ROUTING,
+ .priority = NF_IP6_PRI_LAST
+};
+#endif
+
+static unsigned int numdevs = 2;
+
+module_param(numdevs, uint, 0);
+
+static struct net_device *imq_devs;
+
+
+static struct net_device_stats *imq_get_stats(struct net_device *dev)
+{
+ return (struct net_device_stats *)dev->priv;
+}
+
+/* called for packets kfree'd in qdiscs at places other than enqueue */
+static void imq_skb_destructor(struct sk_buff *skb)
+{
+ struct nf_info *info = skb->nf_info;
+
+ if (info) {
+ if (info->indev)
+ dev_put(info->indev);
+ if (info->outdev)
+ dev_put(info->outdev);
+ kfree(info);
+ }
+}
+
+static int imq_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct net_device_stats *stats = (struct net_device_stats*) dev->priv;
+
+ stats->tx_bytes += skb->len;
+ stats->tx_packets++;
+
+ skb->imq_flags = 0;
+ skb->destructor = NULL;
+
+ dev->trans_start = jiffies;
+ nf_reinject(skb, skb->nf_info, NF_ACCEPT);
+ return 0;
+}
+
+static int imq_nf_queue(struct sk_buff *skb, struct nf_info *info,
+ void *data)
+{
+ struct net_device *dev;
+ struct net_device_stats *stats;
+ struct sk_buff *skb2 = NULL;
+ struct Qdisc *q;
+ unsigned int index = skb->imq_flags&IMQ_F_IFMASK;
+ int ret = -1;
+
+ if (index > numdevs)
+ return -1;
+
+ dev = imq_devs + index;
+ if (!(dev->flags & IFF_UP)) {
+ skb->imq_flags = 0;
+ nf_reinject(skb, info, NF_ACCEPT);
+ return 0;
+ }
+ dev->last_rx = jiffies;
+
+ if (skb->destructor) {
+ skb2 = skb;
+ skb = skb_clone(skb, GFP_ATOMIC);
+ if (!skb)
+ return -1;
+ }
+ skb->nf_info = info;
+
+ stats = (struct net_device_stats *)dev->priv;
+ stats->rx_bytes+= skb->len;
+ stats->rx_packets++;
+
+ spin_lock_bh(&dev->queue_lock);
+ q = dev->qdisc;
+ if (q->enqueue) {
+ q->enqueue(skb_get(skb), q);
+ if (skb_shared(skb)) {
+ skb->destructor = imq_skb_destructor;
+ kfree_skb(skb);
+ ret = 0;
+ }
+ }
+ if (spin_is_locked(&dev->xmit_lock))
+ netif_schedule(dev);
+ else
+ qdisc_run(dev);
+ spin_unlock_bh(&dev->queue_lock);
+
+ if (skb2)
+ kfree_skb(ret ? skb : skb2);
+
+ return ret;
+}
+
+static unsigned int imq_nf_hook(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *indev,
+ const struct net_device *outdev,
+ int (*okfn)(struct sk_buff *))
+{
+ if ((*pskb)->imq_flags & IMQ_F_ENQUEUE)
+ return NF_QUEUE;
+
+ return NF_ACCEPT;
+}
+
+
+static int __init imq_init_hooks(void)
+{
+ int err;
+
+ if ((err = nf_register_queue_handler(PF_INET, imq_nf_queue, NULL)))
+ goto err1;
+ if ((err = nf_register_hook(&imq_ingress_ipv4)))
+ goto err2;
+ if ((err = nf_register_hook(&imq_egress_ipv4)))
+ goto err3;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ if ((err = nf_register_queue_handler(PF_INET6, imq_nf_queue, NULL)))
+ goto err4;
+ if ((err = nf_register_hook(&imq_ingress_ipv6)))
+ goto err5;
+ if ((err = nf_register_hook(&imq_egress_ipv6)))
+ goto err6;
+#endif
+
+ return 0;
+
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+err6:
+ nf_unregister_hook(&imq_ingress_ipv6);
+err5:
+ nf_unregister_queue_handler(PF_INET6);
+err4:
+ nf_unregister_hook(&imq_egress_ipv4);
+#endif
+err3:
+ nf_unregister_hook(&imq_ingress_ipv4);
+err2:
+ nf_unregister_queue_handler(PF_INET);
+err1:
+ return err;
+}
+
+static void __exit imq_unhook(void)
+{
+ nf_unregister_hook(&imq_ingress_ipv4);
+ nf_unregister_hook(&imq_egress_ipv4);
+ nf_unregister_queue_handler(PF_INET);
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+ nf_unregister_hook(&imq_ingress_ipv6);
+ nf_unregister_hook(&imq_egress_ipv6);
+ nf_unregister_queue_handler(PF_INET6);
+#endif
+}
+
+static int __init imq_dev_init(struct net_device *dev)
+{
+ dev->hard_start_xmit = imq_dev_xmit;
+ dev->type = ARPHRD_VOID;
+ dev->mtu = 1500;
+ dev->tx_queue_len = 30;
+ dev->flags = IFF_NOARP;
+ dev->priv = kmalloc(sizeof(struct net_device_stats), GFP_KERNEL);
+ if (dev->priv == NULL)
+ return -ENOMEM;
+ memset(dev->priv, 0, sizeof(struct net_device_stats));
+ dev->get_stats = imq_get_stats;
+
+ return 0;
+}
+
+static void imq_dev_uninit(struct net_device *dev)
+{
+ kfree(dev->priv);
+}
+
+static int __init imq_init_devs(void)
+{
+ struct net_device *dev;
+ int i;
+
+ if (!numdevs || numdevs > IMQ_MAX_DEVS) {
+ printk(KERN_ERR "numdevs has to be between 1 and %u\n",
+ IMQ_MAX_DEVS);
+ return -EINVAL;
+ }
+
+ imq_devs = kmalloc(sizeof(struct net_device) * numdevs, GFP_KERNEL);
+ if (!imq_devs)
+ return -ENOMEM;
+ memset(imq_devs, 0, sizeof(struct net_device) * numdevs);
+
+ /* we start counting at zero */
+ numdevs--;
+
+ for (i = 0, dev = imq_devs; i <= numdevs; i++, dev++) {
+ SET_MODULE_OWNER(dev);
+ strcpy(dev->name, "imq%d");
+ dev->init = imq_dev_init;
+ dev->uninit = imq_dev_uninit;
+
+ if (register_netdev(dev) < 0)
+ goto err_register;
+ }
+ return 0;
+
+err_register:
+ for (; i; i--)
+ unregister_netdev(--dev);
+ kfree(imq_devs);
+ return -EIO;
+}
+
+static void imq_cleanup_devs(void)
+{
+ int i;
+ struct net_device *dev = imq_devs;
+
+ for (i = 0; i <= numdevs; i++)
+ unregister_netdev(dev++);
+
+ kfree(imq_devs);
+}
+
+static int __init imq_init_module(void)
+{
+ int err;
+
+ if ((err = imq_init_devs()))
+ return err;
+ if ((err = imq_init_hooks())) {
+ imq_cleanup_devs();
+ return err;
+ }
+
+ printk(KERN_INFO "imq driver loaded.\n");
+
+ return 0;
+}
+
+static void __exit imq_cleanup_module(void)
+{
+ imq_unhook();
+ imq_cleanup_devs();
+}
+
+module_init(imq_init_module);
+module_exit(imq_cleanup_module);
+MODULE_LICENSE("GPL");
diff -urN linux-2.6.orig/include/linux/imq.h linux-2.6.new/include/linux/imq.h
--- linux-2.6.orig/include/linux/imq.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/include/linux/imq.h 2004-01-25 15:08:20.000000000 +0100
@@ -0,0 +1,9 @@
+#ifndef _IMQ_H
+#define _IMQ_H
+
+#define IMQ_MAX_DEVS 16
+
+#define IMQ_F_IFMASK 0x7f
+#define IMQ_F_ENQUEUE 0x80
+
+#endif /* _IMQ_H */
diff -urN linux-2.6.orig/include/linux/netfilter_ipv4/ipt_IMQ.h linux-2.6.new/include/linux/netfilter_ipv4/ipt_IMQ.h
--- linux-2.6.orig/include/linux/netfilter_ipv4/ipt_IMQ.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/include/linux/netfilter_ipv4/ipt_IMQ.h 2004-01-25 15:08:20.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _IPT_IMQ_H
+#define _IPT_IMQ_H
+
+struct ipt_imq_info {
+ unsigned int todev; /* target imq device */
+};
+
+#endif /* _IPT_IMQ_H */
diff -urN linux-2.6.orig/include/linux/netfilter_ipv6/ip6t_IMQ.h linux-2.6.new/include/linux/netfilter_ipv6/ip6t_IMQ.h
--- linux-2.6.orig/include/linux/netfilter_ipv6/ip6t_IMQ.h 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/include/linux/netfilter_ipv6/ip6t_IMQ.h 2004-01-25 15:08:20.000000000 +0100
@@ -0,0 +1,8 @@
+#ifndef _IP6T_IMQ_H
+#define _IP6T_IMQ_H
+
+struct ip6t_imq_info {
+ unsigned int todev; /* target imq device */
+};
+
+#endif /* _IP6T_IMQ_H */
diff -urN linux-2.6.orig/include/linux/skbuff.h linux-2.6.new/include/linux/skbuff.h
--- linux-2.6.orig/include/linux/skbuff.h 2004-01-10 14:02:40.000000000 +0100
+++ linux-2.6.new/include/linux/skbuff.h 2004-01-25 15:08:20.000000000 +0100
@@ -98,6 +98,10 @@
struct nf_conntrack *master;
};

+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+struct nf_info;
+#endif
+
#ifdef CONFIG_BRIDGE_NETFILTER
struct nf_bridge_info {
atomic_t use;
@@ -246,6 +250,10 @@
unsigned long nfmark;
__u32 nfcache;
struct nf_ct_info *nfct;
+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+ unsigned char imq_flags;
+ struct nf_info *nf_info;
+#endif
#ifdef CONFIG_NETFILTER_DEBUG
unsigned int nf_debug;
#endif
diff -urN linux-2.6.orig/net/core/skbuff.c linux-2.6.new/net/core/skbuff.c
--- linux-2.6.orig/net/core/skbuff.c 2003-11-25 16:58:45.000000000 +0100
+++ linux-2.6.new/net/core/skbuff.c 2004-01-25 15:08:20.000000000 +0100
@@ -313,6 +313,10 @@
#ifdef CONFIG_NET_SCHED
C(tc_index);
#endif
+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+ C(imq_flags);
+ C(nf_info);
+#endif
C(truesize);
atomic_set(&n->users, 1);
C(head);
@@ -357,6 +361,10 @@
new->nfcache = old->nfcache;
new->nfct = old->nfct;
nf_conntrack_get(old->nfct);
+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+ new->imq_flags = old->imq_flags;
+ new->nf_info = old->nf_info;
+#endif
#ifdef CONFIG_NETFILTER_DEBUG
new->nf_debug = old->nf_debug;
#endif
diff -urN linux-2.6.orig/net/ipv4/netfilter/Kconfig linux-2.6.new/net/ipv4/netfilter/Kconfig
--- linux-2.6.orig/net/ipv4/netfilter/Kconfig 2004-01-21 19:34:33.000000000 +0100
+++ linux-2.6.new/net/ipv4/netfilter/Kconfig 2004-01-25 15:08:20.000000000 +0100
@@ -478,6 +478,15 @@

To compile it as a module, choose M here. If unsure, say N.

+config IP_NF_TARGET_IMQ
+ tristate "IMQ target support"
+ depends on IP_NF_MANGLE
+ help
+ This option adds a `IMQ' target which is used to specify if and
+ to which imq device packets should get enqueued/dequeued.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
config IP_NF_TARGET_LOG
tristate "LOG target support"
depends on IP_NF_IPTABLES
diff -urN linux-2.6.orig/net/ipv4/netfilter/Makefile linux-2.6.new/net/ipv4/netfilter/Makefile
--- linux-2.6.orig/net/ipv4/netfilter/Makefile 2003-09-10 16:09:48.000000000 +0200
+++ linux-2.6.new/net/ipv4/netfilter/Makefile 2004-01-25 15:08:21.000000000 +0100
@@ -72,6 +72,7 @@
obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o
obj-$(CONFIG_IP_NF_TARGET_DSCP) += ipt_DSCP.o
obj-$(CONFIG_IP_NF_TARGET_MARK) += ipt_MARK.o
+obj-$(CONFIG_IP_NF_TARGET_IMQ) += ipt_IMQ.o
obj-$(CONFIG_IP_NF_TARGET_MASQUERADE) += ipt_MASQUERADE.o
obj-$(CONFIG_IP_NF_TARGET_REDIRECT) += ipt_REDIRECT.o
obj-$(CONFIG_IP_NF_TARGET_NETMAP) += ipt_NETMAP.o
diff -urN linux-2.6.orig/net/ipv4/netfilter/ipt_IMQ.c linux-2.6.new/net/ipv4/netfilter/ipt_IMQ.c
--- linux-2.6.orig/net/ipv4/netfilter/ipt_IMQ.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/net/ipv4/netfilter/ipt_IMQ.c 2004-01-25 15:08:21.000000000 +0100
@@ -0,0 +1,78 @@
+/*
+ * This target marks packets to be enqueued to an imq device
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter_ipv4/ipt_IMQ.h>
+#include <linux/imq.h>
+
+static unsigned int imq_target(struct sk_buff **pskb,
+ const struct net_device *in,
+ const struct net_device *out,
+ unsigned int hooknum,
+ const void *targinfo,
+ void *userdata)
+{
+ struct ipt_imq_info *mr = (struct ipt_imq_info*)targinfo;
+
+ (*pskb)->imq_flags = mr->todev | IMQ_F_ENQUEUE;
+ (*pskb)->nfcache |= NFC_ALTERED;
+
+ return IPT_CONTINUE;
+}
+
+static int imq_checkentry(const char *tablename,
+ const struct ipt_entry *e,
+ void *targinfo,
+ unsigned int targinfosize,
+ unsigned int hook_mask)
+{
+ struct ipt_imq_info *mr;
+
+ if (targinfosize != IPT_ALIGN(sizeof(struct ipt_imq_info))) {
+ printk(KERN_WARNING "IMQ: invalid targinfosize\n");
+ return 0;
+ }
+ mr = (struct ipt_imq_info*)targinfo;
+
+ if (strcmp(tablename, "mangle") != 0) {
+ printk(KERN_WARNING
+ "IMQ: IMQ can only be called from \"mangle\" table, not \"%s\"\n",
+ tablename);
+ return 0;
+ }
+
+ if (mr->todev >= IMQ_MAX_DEVS) {
+ printk(KERN_WARNING
+ "IMQ: invalid device specified, highest is %u\n",
+ IMQ_MAX_DEVS - 1);
+ return 0;
+ }
+
+ return 1;
+}
+
+static struct ipt_target ipt_imq_reg = {
+ .name = "IMQ",
+ .target = imq_target,
+ .checkentry = imq_checkentry,
+ .me = THIS_MODULE
+};
+
+static int __init init(void)
+{
+ if (ipt_register_target(&ipt_imq_reg))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void __exit fini(void)
+{
+ ipt_unregister_target(&ipt_imq_reg);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
diff -urN linux-2.6.orig/net/ipv6/netfilter/Kconfig linux-2.6.new/net/ipv6/netfilter/Kconfig
--- linux-2.6.orig/net/ipv6/netfilter/Kconfig 2003-09-28 10:43:59.000000000 +0200
+++ linux-2.6.new/net/ipv6/netfilter/Kconfig 2004-01-25 15:08:21.000000000 +0100
@@ -217,6 +217,15 @@

To compile it as a module, choose M here. If unsure, say N.

+config IP6_NF_TARGET_IMQ
+ tristate "IMQ target support"
+ depends on IP6_NF_MANGLE
+ help
+ This option adds a `IMQ' target which is used to specify if and
+ to which imq device packets should get enqueued/dequeued.
+
+ To compile it as a module, choose M here. If unsure, say N.
+
#dep_tristate ' LOG target support' CONFIG_IP6_NF_TARGET_LOG $CONFIG_IP6_NF_IPTABLES
endmenu

diff -urN linux-2.6.orig/net/ipv6/netfilter/Makefile linux-2.6.new/net/ipv6/netfilter/Makefile
--- linux-2.6.orig/net/ipv6/netfilter/Makefile 2003-05-05 01:53:32.000000000 +0200
+++ linux-2.6.new/net/ipv6/netfilter/Makefile 2004-01-25 15:08:21.000000000 +0100
@@ -19,6 +19,7 @@
obj-$(CONFIG_IP6_NF_FILTER) += ip6table_filter.o
obj-$(CONFIG_IP6_NF_MANGLE) += ip6table_mangle.o
obj-$(CONFIG_IP6_NF_TARGET_MARK) += ip6t_MARK.o
+obj-$(CONFIG_IP6_NF_TARGET_IMQ) += ip6t_IMQ.o
obj-$(CONFIG_IP6_NF_QUEUE) += ip6_queue.o
obj-$(CONFIG_IP6_NF_TARGET_LOG) += ip6t_LOG.o
obj-$(CONFIG_IP6_NF_MATCH_HL) += ip6t_hl.o
diff -urN linux-2.6.orig/net/ipv6/netfilter/ip6t_IMQ.c linux-2.6.new/net/ipv6/netfilter/ip6t_IMQ.c
--- linux-2.6.orig/net/ipv6/netfilter/ip6t_IMQ.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.new/net/ipv6/netfilter/ip6t_IMQ.c 2004-01-25 15:08:21.000000000 +0100
@@ -0,0 +1,78 @@
+/*
+ * This target marks packets to be enqueued to an imq device
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <linux/netfilter_ipv6/ip6t_IMQ.h>
+#include <linux/imq.h>
+
+static unsigned int imq_target(struct sk_buff **pskb,
+ unsigned int hooknum,
+ const struct net_device *in,
+ const struct net_device *out,
+ const void *targinfo,
+ void *userdata)
+{
+ struct ip6t_imq_info *mr = (struct ip6t_imq_info*)targinfo;
+
+ (*pskb)->imq_flags = mr->todev | IMQ_F_ENQUEUE;
+ (*pskb)->nfcache |= NFC_ALTERED;
+
+ return IP6T_CONTINUE;
+}
+
+static int imq_checkentry(const char *tablename,
+ const struct ip6t_entry *e,
+ void *targinfo,
+ unsigned int targinfosize,
+ unsigned int hook_mask)
+{
+ struct ip6t_imq_info *mr;
+
+ if (targinfosize != IP6T_ALIGN(sizeof(struct ip6t_imq_info))) {
+ printk(KERN_WARNING "IMQ: invalid targinfosize\n");
+ return 0;
+ }
+ mr = (struct ip6t_imq_info*)targinfo;
+
+ if (strcmp(tablename, "mangle") != 0) {
+ printk(KERN_WARNING
+ "IMQ: IMQ can only be called from \"mangle\" table, not \"%s\"\n",
+ tablename);
+ return 0;
+ }
+
+ if (mr->todev >= IMQ_MAX_DEVS) {
+ printk(KERN_WARNING
+ "IMQ: invalid device specified, highest is %u\n",
+ IMQ_MAX_DEVS - 1);
+ return 0;
+ }
+
+ return 1;
+}
+
+static struct ip6t_target ip6t_imq_reg = {
+ .name = "IMQ",
+ .target = imq_target,
+ .checkentry = imq_checkentry,
+ .me = THIS_MODULE
+};
+
+static int __init init(void)
+{
+ if (ip6t_register_target(&ip6t_imq_reg))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void __exit fini(void)
+{
+ ip6t_unregister_target(&ip6t_imq_reg);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
diff -urN linux-2.6.orig/net/sched/sch_generic.c linux-2.6.new/net/sched/sch_generic.c
--- linux-2.6.orig/net/sched/sch_generic.c 2003-11-25 16:58:47.000000000 +0100
+++ linux-2.6.new/net/sched/sch_generic.c 2004-01-25 15:08:21.000000000 +0100
@@ -30,6 +30,9 @@
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/init.h>
+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+#include <linux/imq.h>
+#endif
#include <net/sock.h>
#include <net/pkt_sched.h>

@@ -90,7 +93,11 @@
spin_unlock(&dev->queue_lock);

if (!netif_queue_stopped(dev)) {
- if (netdev_nit)
+ if (netdev_nit
+#if defined(CONFIG_IMQ) || defined(CONFIG_IMQ_MODULE)
+ && !(skb->imq_flags & IMQ_F_ENQUEUE)
+#endif
+ )
dev_queue_xmit_nit(skb, dev);

if (dev->hard_start_xmit(skb, dev) == 0) {

--
Marcel Sebek
jabber: [email protected] ICQ: 279852819
linux user number: 307850 GPG ID: 5F88735E
GPG FP: 0F01 BAB8 3148 94DB B95D 1FCA 8B63 CA06 5F88 735E
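
Note on the patch above: the iptables target and the queue handler communicate only through the one-byte imq_flags field added to struct sk_buff. The following userspace sketch shows that encoding in isolation. The constants are copied from include/linux/imq.h in the patch; the helper names (imq_mark, imq_wants_enqueue, imq_index) are hypothetical and do not appear in the driver, which performs these operations inline.

```c
#include <assert.h>

/* Values copied from include/linux/imq.h in the patch. */
#define IMQ_MAX_DEVS  16
#define IMQ_F_IFMASK  0x7f
#define IMQ_F_ENQUEUE 0x80

/* Hypothetical helper: what the IMQ target does to skb->imq_flags.
 * The device index goes in the low 7 bits, the enqueue request in bit 7. */
static unsigned char imq_mark(unsigned int todev)
{
    assert(todev <= IMQ_F_IFMASK);  /* index must fit under the mask */
    return (unsigned char)(todev | IMQ_F_ENQUEUE);
}

/* Hypothetical helper: the test imq_nf_hook() makes before returning NF_QUEUE. */
static int imq_wants_enqueue(unsigned char flags)
{
    return (flags & IMQ_F_ENQUEUE) != 0;
}

/* Hypothetical helper: how imq_nf_queue() recovers the device index. */
static unsigned int imq_index(unsigned char flags)
{
    return flags & IMQ_F_IFMASK;
}
```

For example, a rule with --todev 3 yields imq_mark(3) == 0x83, from which imq_index() recovers 3. The mask leaves room for 128 indices, so the tighter IMQ_MAX_DEVS limit has to be enforced separately, as the checkentry functions and the numdevs test in imq_nf_queue() do.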


2004-01-25 16:44:44

by Tomas Szepe

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Jan-25 2004, Sun, 16:24 +0100
Marcel Sebek <[email protected]> wrote:

> I have ported IMQ driver from 2.4 to 2.6.2-rc1.
> Original version was from http://trash.net/~kaber/imq/.
> ...

It would definitely be nice to see IMQ merged at last.

--
Tomas Szepe <[email protected]>

2004-01-25 19:23:16

by jamal

Subject: Re: [RFC/PATCH] IMQ port to 2.6


There has been no real good reason as to why IMQ is needed to begin
with. It may be easy to use and has been highly publicized (which is
always a dangerous thing in Linux).

Maybe lets take a step back and see how people use it. How and why do
you use IMQ? Is this because you couldnt use the ingress qdisc?
Note, the abstraction to begin with is in the wrong place - it sure is
an easy and nice looking hack. So is the current ingress qdisc, but we
are laying that to rest with TC extensions.

cheers,
jamal

On Sun, 2004-01-25 at 11:44, Tomas Szepe wrote:
> On Jan-25 2004, Sun, 16:24 +0100
> Marcel Sebek <[email protected]> wrote:
>
> > I have ported IMQ driver from 2.4 to 2.6.2-rc1.
> > Original version was from http://trash.net/~kaber/imq/.
> > ...
>
> It would definitely be nice to see IMQ merged at last.

2004-01-25 19:34:40

by David Miller

Subject: Re: [RFC/PATCH] IMQ port to 2.6

From: [email protected] (Marcel Sebek)
Date: Sun, 25 Jan 2004 16:24:19 +0100

I have ported IMQ driver from 2.4 to 2.6.2-rc1.

Original version was from http://trash.net/~kaber/imq/.

Patrick, do you mind if I merge this 2.6.x port into my tree?

2004-01-25 20:23:33

by Vladimir B. Savkin

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Sun, Jan 25, 2004 at 02:22:19PM -0500, jamal wrote:
>
> There has been no real good reason as to why IMQ is needed to begin
> with. It may be easy to use and has been highly publicized (which is
> always a dangerous thing in Linux).
>
> Maybe lets take a step back and see how people use it. How and why do
> you use IMQ? Is this because you couldnt use the ingress qdisc?

Think multiple clients connected via PPP. I want to shape traffic,
so ingress is out of question. I want different clients in a same
htb class, so using qdisc on each ppp interface is out of
question. It seems to me that IMQ is the only way to achieve my goals.

> Note, the abstraction to begin with is in the wrong place - it sure is
> an easy and nice looking hack. So is the current ingress qdisc, but we
> are laying that to rest with TC extensions.
>
>
~
:wq
With best regards,
Vladimir Savkin.

2004-01-25 20:25:56

by Patrick McHardy

Subject: Re: [RFC/PATCH] IMQ port to 2.6

David S. Miller wrote:
> From: [email protected] (Marcel Sebek)
> Date: Sun, 25 Jan 2004 16:24:19 +0100
>
> I have ported IMQ driver from 2.4 to 2.6.2-rc1.
>
> Original version was from http://trash.net/~kaber/imq/.
>
> Patrick, do you mind if I merge this 2.6.x port into my tree?
>

Please don't. The imq device is buggy: it crashes when used
for ingress and egress at the same time, and it has been
unmaintained for one or two years. The lartc list is full
of bug reports. Some users that depend on the functionality
are working on a better implementation, I'd suggest to wait
until then.

Best regards,
Patrick

2004-01-25 22:05:58

by David Miller

Subject: Re: [RFC/PATCH] IMQ port to 2.6

From: Patrick McHardy <[email protected]>
Date: Sun, 25 Jan 2004 21:23:18 +0100

David S. Miller wrote:
> Patrick, do you mind if I merge this 2.6.x port into my tree?

Please don't. The imq device is buggy,
...
Some users that depend on the functionality
are working on a better implementation, I'd suggest to wait
until then.

Ok.

2004-01-25 23:45:59

by jamal

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Sun, 2004-01-25 at 15:21, Vladimir B. Savkin wrote:
> On Sun, Jan 25, 2004 at 02:22:19PM -0500, jamal wrote:

> Think multiple clients connected via PPP. I want to shape traffic,
> so ingress is out of question. I want different clients in a same

Ok,
a) why do you want to shape on ingress instead of policing?
OR
b) Why cant you achieve the same results by marking on ingress and
shaping on egress?

> htb class, so using qdisc on each ppp interface is out of
> question. It seems to me that IMQ is the only way to achieve my goals.

By multiple clients i believe you mean you want to say "-i ppp+"?
We had a long discussion on this a while back (search netdev)
and i think it is a valid point for dynamic devices like ppp.
We need to rethink how we do things. There's a lot of value in having per
device tables (scalability being one).
IMO, this alone does not justify the existence of IMQ.
We should do this (and other things) right, maybe a sync with the
netfilter folks will be the right thing to do.

cheers,
jamal

2004-01-26 00:11:44

by Vladimir B. Savkin

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Sun, Jan 25, 2004 at 06:45:16PM -0500, jamal wrote:
> On Sun, 2004-01-25 at 15:21, Vladimir B. Savkin wrote:
> > On Sun, Jan 25, 2004 at 02:22:19PM -0500, jamal wrote:
>
> > Think multiple clients connected via PPP. I want to shape traffic,
> > so ingress is out of question. I want different clients in a same
>
> Ok,
> a) why do you want to shape on ingress instead of policing?

With typical internet traffic patterns, policing will drop many packets,
and shaping will not.

> OR
> b) Why cant you achieve the same results by marking on ingress and
> shaping on egress?

Well, as I understand it, there's no "real" ingress and "real" egress.
Look at this:
Any forwarded packet
1) comes from one interface
2) receives some treatment (filtering, routing decision, maybe
delaying if we shape, mangling etc.)
and
3) goes away via some other interface

step (1) is "ingress"
step (3) is "egress"
qdiscs work at step (2), so all of them are intermediate in this sense

Well, ok, if a qdisc receives a feedback from egress interface
on when to dequeue a packet (when interface is ready to send),
we can say that it is an egress qdisc.

But in my case, PPP connections are really PPTP or PPPoE.
Internal network bandwidth is not a premium, so all internal
interfaces are always ready to send.

So, I don't shape at ingress or at egress, I shape passing-through
traffic.

> > htb class, so using qdisc on each ppp interface is out of
> > question. It seems to me that IMQ is the only way to achieve my goals.
>
> By multiple clients i believe you mean you want to say "-i ppp+"?
> We had a long discussion on this a while back (search netdev)
> and i think it is a valid point for dynamic devices like ppp.

Well, I don't really care whether those interfaces are dynamic or
static. They could be multiple vlans, and nothing would
change in marking or shaping. I use clients' IPs for marking,
and routing table cares about interfaces.

> We need to rethink how we do things. There's a lot of value in having per
> device tables (scalability being one).
> IMO, this alone does not justify the existence of IMQ.

I just can't think of a better abstraction that would handle my case.

> We should do this (and other things) right, maybe a sync with the
> netfilter folks will be the right thing to do.
>

~
:wq
With best regards,
Vladimir Savkin.
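
Note on the policing versus shaping question raised above: the difference can be made concrete with a toy token-bucket model. The sketch below is a userspace simulation, not kernel code and not anything posted in this thread. Every constant in it (rate, burst, queue length, tick count) is an arbitrary assumption chosen to keep the numbers small. A policer drops a packet that finds no token; a shaper queues it until a token arrives.

```c
#include <assert.h>

/* All constants are illustrative assumptions, not values from the thread. */
#define RATE   1    /* tokens (packets) credited per tick */
#define BURST  2    /* bucket depth in packets */
#define QLIMIT 10   /* shaper queue length in packets */
#define TICKS  8    /* length of the simulated interval */

struct result {
    int sent;       /* packets forwarded */
    int dropped;    /* packets discarded */
    int max_delay;  /* worst queueing delay, in ticks */
};

/* Credit RATE tokens for one tick, capped at BURST. */
static int refill(int tokens)
{
    tokens += RATE;
    return tokens > BURST ? BURST : tokens;
}

/* Policer: a packet that finds no token is dropped immediately. */
static struct result police(const int arrivals[TICKS])
{
    struct result r = {0, 0, 0};
    int tokens = BURST;

    for (int t = 0; t < TICKS; t++) {
        assert(arrivals[t] >= 0);
        if (t > 0)
            tokens = refill(tokens);
        for (int i = 0; i < arrivals[t]; i++) {
            if (tokens > 0) {
                tokens--;
                r.sent++;
            } else {
                r.dropped++;
            }
        }
    }
    return r;
}

/* Shaper: a packet that finds no token waits in a FIFO until one arrives.
 * Packets still queued after the last tick are counted as neither sent
 * nor dropped. */
static struct result shape(const int arrivals[TICKS])
{
    struct result r = {0, 0, 0};
    int queue[QLIMIT];          /* arrival tick of each queued packet */
    int head = 0, len = 0;
    int tokens = BURST;

    for (int t = 0; t < TICKS; t++) {
        assert(arrivals[t] >= 0);
        if (t > 0)
            tokens = refill(tokens);
        for (int i = 0; i < arrivals[t]; i++) {
            if (len < QLIMIT) {
                queue[(head + len) % QLIMIT] = t;
                len++;
            } else {
                r.dropped++;    /* only a full queue causes loss */
            }
        }
        while (len > 0 && tokens > 0) {
            int delay = t - queue[head];
            if (delay > r.max_delay)
                r.max_delay = delay;
            head = (head + 1) % QLIMIT;
            len--;
            tokens--;
            r.sent++;
        }
    }
    return r;
}
```

With a burst of six packets at tick 0 and nothing afterwards, police() forwards 2 packets and drops 4, while shape() forwards all 6 with a worst-case delay of 4 ticks. The model says nothing about how TCP reacts to either behaviour, which is the point under dispute in the thread.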

2004-01-26 03:10:36

by jamal

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Sun, 2004-01-25 at 19:11, Vladimir B. Savkin wrote:
> On Sun, Jan 25, 2004 at 06:45:16PM -0500, jamal wrote:
[..]
>
> With typical internet traffic patterns, policing will drop many packets,
> and shaping will not.

What is typical internet traffic? I guess you mean TCP (thats what 90%
of the traffic is)
In that case, the effect of dropping or delaying on throughput is
similar. Studies i have seen indicate that throughput is inversely
proportional to the square root of the drop probability
(drop is what you get when you police).
It is also influenced by the delay (which is what you introduce when you
shape). I have not seen anything in favor of shaping; i could be wrong
(so if you know of something or have experimented pass the data).
For a detailed analysis, at least for RENO, this would be a good reference:
http://citeseer.nj.nec.com/padhye98modeling.html

>
> > OR
> > b) Why cant you achieve the same results by marking on ingress and
> > shaping on egress?
>
> Well, as I understand it, there's no "real" ingress and "real" egress.

There is essentially only egress.

> Look at this:
> Any forwarded packet
> 1) comes from one interface
> 2) receives some treatment (filtering, routing decision, maybe
> delaying if we shape, mangling etc.)
> and
> 3) goes away via some other interface
>
> step (1) is "ingress"

There is no ingress per se. Separation of ingress and egress is typically
a switch fabric or even a bus. So in this case, since you already
have crossed the bus you are in ingress territory.
There is an ingress qdisc, but it is fake. The major value it adds
is to drop early when there is need to (no point in making forwarding
decision when you know you will drop the packet i.e no point in wasting
those processor cycles)- and therefore the ingress qdisc act as a
holder of filters.

> step (3) is "egress"
> qdiscs work at step (2), so all of them are intermediate in this sense
>
>
>
> Well, ok, if a qdisc receives a feedback from egress interface
> on when to dequeue a packet (when interface is ready to send),
> we can say that it is an egress qdisc.
>

Look at my explanation above.

> But in my case, PPP connections are really PPTP or PPPoE.
> Internal network bandwidth is not a premium, so all internal
> interfaces are always ready to send.
>
> So, I don't shape at ingress or at egress, I shape passing-through
> traffic.
>

The noun is not important. You crossed the bus already, you are in
processor land.
The value is being able to drop as early as possible when you need to.
If you are not dropping and desire only to delay the packets, then do it
at the proper egress device.

> > > htb class, so using qdisc on each ppp interface is out of
> > > question. It seems to me that IMQ is the only way to achieve my goals.
> >
> > By multiple clients i believe you mean you want to say "-i ppp+"?
> > We had a long discussion on this a while back (search netdev)
> > and i think it is a valid point for dynamic devices like ppp.
>
> Well, I don't really care whether those interfaces are dynamic or
> static. They could be multiple vlans, and nothing would
> change in marking or shaping. I use clients' IPs for marking,
> and routing table cares about interfaces.
>

Maybe i am misunderstanding what you are after.
couldnt you use -i ppp+ -j mark --set-mark x in the ingress/prerouting
and use the fwmark to shape on the egress?
Post your script examples.

> > We need to rethink how we do things. There's a lot of value in having per
> > device tables (scalability being one).
> > IMO, this alone does not justify the existence of IMQ.
>
> I just can't think of a better abstraction that would handle my case.

I think it is time we came with a single solution for how packets are
managed. Your needs should be met, the problem is we may be having too
many cooks creating the same meal.

cheers,
jamal
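
Note on the square-root relationship mentioned in the message above: it is usually quoted in the simplified form below, which keeps only the congestion-avoidance term of the model in the cited paper and ignores timeouts and receiver-window limits. The symbols are this note's notation: B is steady-state throughput in packets per unit time, RTT the round-trip time, p the loss probability, and b the number of packets covered by each ACK.

```latex
B(p) \;\approx\; \frac{1}{RTT}\,\sqrt{\frac{3}{2\,b\,p}}
```

Throughput therefore falls as the inverse square root of the loss probability and as the inverse of the round-trip time, so both dropping (policing) and added queueing delay (shaping) reduce it.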

2004-01-26 09:32:42

by Vladimir B. Savkin

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Sun, Jan 25, 2004 at 10:09:48PM -0500, jamal wrote:
> On Sun, 2004-01-25 at 19:11, Vladimir B. Savkin wrote:
> > On Sun, Jan 25, 2004 at 06:45:16PM -0500, jamal wrote:
> [..]
> >
> > With typical internet traffic patterns, policing will drop many packets,
> > and shaping will not.
>
> What is typical internet traffic? I guess you mean TCP (thats what 90%
> of the traffic is)
> In that case, the effect of dropping or delaying on throughput is
> similar. Studies i have seen indicate that throughput is inversely
> proportional to the square root of the drop probability
> (drop is what you get when you police).
> It is also influenced by the delay (which is what you introduce when you
> shape). I have not seen anything in favor of shaping; i could be wrong
> (so if you know of something or have experimented pass the data).

Yes, I have experimented. Shaping works much better:
far fewer packets dropped, much better download rates for clients.

> For a detailed analysis, at least for RENO, this would be a good reference:
> http://citeseer.nj.nec.com/padhye98modeling.html
>
[snip]
> Maybe i am misunderstanding what you are after.
> couldnt you use -i ppp+ -j mark --set-mark x in the ingress/prerouting
> and use the fwmark to shape on the egress?
> Post your script examples.
>

I want to shape traffic that comes from upstream to clients connected
via PPTP.

Here is a part of my scripts:

DEVICE=imq0
/sbin/tc qdisc add dev $DEVICE root handle 10: htb r2q 1 default 100
/sbin/tc class add dev $DEVICE parent 10:0 classid 10:1 est 1sec 8sec htb \
    rate 10Mbit burst 400k
/sbin/tc class add dev $DEVICE parent 10:1 classid 10:2 est 1sec 8sec htb \
    rate 180kbps ceil 180kbps burst 3000
# default class for users
/sbin/tc class add dev $DEVICE parent 10:2 classid 10:101 est 1sec 8sec htb \
    rate 20kbps burst 1k ceil 50kbps cburst 1k
/sbin/tc qdisc add dev $DEVICE parent 10:101 wrr \
    dest ip 128 1 wmode1=1 wmode2=1
/sbin/tc filter add dev $DEVICE protocol ip parent 10:0 \
    prio 100 handle 1 fw flowid 10:101
# more classes to follow ...


The limit 50kbps is artificial, so there's no bottleneck in
connection from upstream to this router. I cannot allocate all
the channel bandwidth to clients for some political reasons.
Then, I mark packets I want to go to this default user class with mark "1",
like this:

iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
    -j IMQ --todev 0 # traffic from internet to clients
iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
    -j MARK --set-mark 1 # default class
# here I can change fwmark for packets that deserve
# some special treatment

So, I shape traffic destined to clients, and I use "wrr" to
divide bandwidth fairly. I cannot attach qdisc to an egress device
because there's no single one, each client has its own ppp interface.

Well, I could move this shaping upstream, but what if upstream router was
some dumb cisco with no "wrr" qdisc?


With best regards,
Vladimir Savkin.

2004-01-26 13:39:13

by jamal

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Mon, 2004-01-26 at 04:32, Vladimir B. Savkin wrote:

> On Sun, Jan 25, 2004 at 10:09:48PM -0500, jamal wrote:
[..]
> > shape). I have not seen anything in favor of shaping; i could be wrong
> > (so if you know of something or have experimented pass the data).
>
> Yes, I have experimented. Shaping works much better:
> far fewer packets dropped, much better download rates for clients.
>

I cant say i doubt you, but your word alone is insufficient data ;->

The important point is the eventual effective throughput and fairness
amongst the flows. Whether it is induced by an increased RTT from
shaping or a single packet retransmit on some misbehaving flows because
of policing is less important. i.e it is not evil for packets to
be dropped.
When you analyse something like this you should look at the aggregate
throughput instead of a single client with better downloads (probably at
the expense of another poor client download).
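
A rough way to put numbers on the drop-versus-delay comparison is the
simplified square-root model of steady-state TCP Reno throughput, rate ~=
(MSS / RTT) * sqrt(3/2) / sqrt(p). The sketch below is an illustration only:
the model ignores timeouts and window limits, and the MSS, RTT and drop
probability values are made-up examples, not measurements from this thread.

```shell
#!/bin/sh
# Simplified square-root model for steady-state TCP Reno throughput:
#   rate ~= (MSS / RTT) * sqrt(3/2) / sqrt(p)
# Usage: tcp_rate MSS_BYTES RTT_SECONDS DROP_PROBABILITY
# Prints the estimated rate in bytes per second.
tcp_rate() {
    awk -v mss="$1" -v rtt="$2" -v p="$3" \
        'BEGIN { printf "%.0f\n", (mss / rtt) * sqrt(1.5) / sqrt(p) }'
}

# Policing case (assumed numbers): RTT unchanged, drop probability raised.
echo "policed: $(tcp_rate 1460 0.05 0.04) bytes/s"
# Shaping case (assumed numbers): queueing doubles the RTT, drops stay low.
echo "shaped:  $(tcp_rate 1460 0.10 0.01) bytes/s"
```

In this model, doubling the RTT and quadrupling the drop probability each
halve the rate, so the two cases above print the same estimate. Whether real
flows behave that way is exactly what measurements would have to show.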

> I want to shape traffic that comes from upstream to clients connected
> via PPTP.

So if i understand correctly and was to draw this:
you have clients on the left side coming in through ethx and that need
to be tunneled to some pppoe/pptp before going out ethy on the right
hand side. The right handside represents "upstream" in your terminology.
Is this correct? I hate it when people ask me for a diagram for
something that looks obvious;-> but bear with me and supply me with a
diagram if i didnt understand you.

>
> Here is a part of my scripts:
>
> DEVICE=imq0
> /sbin/tc qdisc add dev $DEVICE root handle 10: htb r2q 1 default 100
> /sbin/tc class add dev $DEVICE parent 10:0 classid 10:1 est 1sec 8sec htb \
> rate 10Mbit burst 400k
> /sbin/tc class add dev $DEVICE parent 10:1 classid 10:2 est 1sec 8sec htb \
> rate 180kbps ceil 180kbps burst 3000
> # default class for users
> /sbin/tc class add dev $DEVICE parent 10:2 classid 10:101 est 1sec 8sec htb \
> rate 20kbps burst 1k ceil 50kbps cburst 1k
> /sbin/tc qdisc add dev $DEVICE parent 10:101 wrr \
> dest ip 128 1 wmode1=1 wmode2=1
> /sbin/tc filter add dev $DEVICE protocol ip parent 10:0 \
> prio 100 handle 1 fw flowid 10:101
> # more classes to follow ...
>

So why not have the above attached to ethy? Why does it have to be done
at some other device?

>
> The limit 50kbps is artificial, so there's no bottleneck in
> connection from upstream to this router. I cannot allocate all
> the channel bandwidth to clients for some political reasons.
> Then, I mark packets I want to go to this default user class with mark "1",
> like this:
>
> iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
> -j IMQ --todev 0 # traffic from internet to clients
> iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
> -j MARK --set-mark 1 # default class


Why do you need the redirect to IMQ?
If you can selectively mark packets here (or at any other netfilter
hook) you could use the fwmark classifier to attach to different
10:x classes on the ethy interface. I feel i am missing something.

> So, I shape traffic destined to clients, and I use "wrr" to
> divide bandwidth fairly. I cannot attach qdisc to an egress device
> because there's no single one, each client has its own ppp interface.
>

I mean the ethy interface not the ppp* interfaces. Mark the packets;
use fwmark classifier.

> Well, I could move this shaping upstream, but what if upstream router was
> some dumb cisco with no "wrr" qdisc?

You dont have to.
Give me the diagram.

cheers,
jamal

2004-01-26 13:55:56

by Vladimir B. Savkin

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Mon, Jan 26, 2004 at 08:38:33AM -0500, jamal wrote:
> On Mon, 2004-01-26 at 04:32, Vladimir B. Savkin wrote:
>
> > On Sun, Jan 25, 2004 at 10:09:48PM -0500, jamal wrote:
> [..]
> > > shape). I have not seen anything in favor of shaping; i could be wrong
> > > (so if you know of something or have experimented pass the data).
> >
> > Yes, I have experimented. Shaping works much better:
> > far fewer packets dropped, much better download rates for clients.
> >
>
> I cant say i doubt you, but your word alone is insufficient data ;->

You can see for yourself. Police users' traffic to half of the normal rate
and hear them scream :) Then change policing to shaping using wrr
(or an htb class for each user), and sfq on the leaves, and users are happy.

> The important point is the eventual effective throughput and fairness
> amongst the flows. Whether it is induced by an increased RTT from
> shaping or a single packet retransmit on some misbehaving flows because
> of policing is less important. i.e it is not evil for packets to
> be dropped.
> When you analyse something like this you should look at the aggregate
> throughput instead of a single client with better downloads (probably at
> the expense of another poor client download).

Well, I use wrr + sfq exactly for fairness. No such thing can be
achieved with policing only.

>
> > I want to shape traffic that comes from upstream to clients connected
> > via PPTP.
>
> So if i understand correctly and was to draw this:
> you have clients on the left side coming in through ethx and that need
> to be tunneled to some pppoe/pptp before going out ethy on the right
> hand side. The right handside represents "upstream" in your terminology.
> Is this correct? I hate it when people ask me for a diagram for
> something that looks obvious;-> but bear with me and supply me with a
> diagram if i didnt understand you.

Here it is:

                    +---------+       +-ppp0- ... - client0
                    |         +-eth1-<+-ppp1- ... - client1
Internet ----- eth0-+ router  |        . . . . . . . .
                    |         +-eth2-<  . . . . . .
                    +---------+       +-pppN- ... - clientN


Traffic flows from internet to clients.
The ethX names are for example only, my setup is more complex actually,
but that complexity is not related to IMQ or traffic shaping.
Clients use PPTP or PPPoE to connect to router.
See, there's no single interface I can attach qdisc to, if I want
to put all clients into the same qdisc.
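
To make the constraint concrete: a qdisc is bound to one device, so without
IMQ the egress alternative is a separate tree on every ppp interface, and
those trees cannot share or borrow bandwidth among clients. A minimal dry-run
sketch follows; it only prints the commands, and the interface names, handle
and 50kbps rate are placeholder assumptions.

```shell
#!/bin/sh
# Dry run: print the tc commands a per-interface setup would need.
# Each ppp device gets its own independent HTB tree, so no class can
# borrow from, or be weighted against, a class on another device.
per_iface_shaping() {
    for dev in "$@"; do
        echo "tc qdisc add dev $dev root handle 1: htb default 1"
        echo "tc class add dev $dev parent 1: classid 1:1 htb rate 50kbps"
    done
}

per_iface_shaping ppp0 ppp1 ppp2
```

With N clients this yields N unrelated hierarchies, which is the opposite of
one wrr qdisc dividing a common limit among all of them.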

>
> >
> > Here is a part of my scripts:
> >
> > DEVICE=imq0
> > /sbin/tc qdisc add dev $DEVICE root handle 10: htb r2q 1 default 100
> > /sbin/tc class add dev $DEVICE parent 10:0 classid 10:1 est 1sec 8sec htb \
> > rate 10Mbit burst 400k
> > /sbin/tc class add dev $DEVICE parent 10:1 classid 10:2 est 1sec 8sec htb \
> > rate 180kbps ceil 180kbps burst 3000
> > # default class for users
> > /sbin/tc class add dev $DEVICE parent 10:2 classid 10:101 est 1sec 8sec htb \
> > rate 20kbps burst 1k ceil 50kbps cburst 1k
> > /sbin/tc qdisc add dev $DEVICE parent 10:101 wrr \
> > dest ip 128 1 wmode1=1 wmode2=1
> > /sbin/tc filter add dev $DEVICE protocol ip parent 10:0 \
> > prio 100 handle 1 fw flowid 10:101
> > # more classes to follow ...
> >
>
> So why not have the above attached to ethy? Why does it have to be done
> at some other device?
>
> >
> > The limit 50kbps is artificial, so there's no bottleneck in
> > connection from upstream to this router. I cannot allocate all
> > the channel bandwidth to clients for some political reasons.
> > Then, I mark packets I want to go to this default user class with mark "1",
> > like this:
> >
> > iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
> > -j IMQ --todev 0 # traffic from internet to clients
> > iptables -t mangle -A FORWARD -i $UPLINK_DEV -d $CLIENTS_NET \
> > -j MARK --set-mark 1 # default class
>
>
> Why do you need the redirect to IMQ?
> If you can selectively mark packets here (or at any other netfilter
> hook) you could use the fwmark classifier to attach to different
> 10:x classes on the ethy interface. I feel i am missing something.
>
> > So, I shape traffic destined to clients, and I use "wrr" to
> > divide bandwidth fairly. I cannot attach qdisc to an egress device
> > because there's no single one, each client has its own ppp interface.
> >
>
> I mean the ethy interface not the ppp* interfaces. Mark the packets;
> use fwmark classifier.
>
> > Well, I could move this shaping upstream, but what if upstream router was
> > some dumb cisco with no "wrr" qdisc?
>
> You dont have to.
> Give me the diagram.
>
> cheers,
> jamal
>
With best regards,
Vladimir Savkin.

2004-01-26 14:30:36

by jamal

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Mon, 2004-01-26 at 08:55, Vladimir B. Savkin wrote:
> On Mon, Jan 26, 2004 at 08:38:33AM -0500, jamal wrote:

> > I cant say i doubt you, but your word alone is insufficient data ;->
>
> You can see for yourself. Police users' traffic to half of the normal rate
> and hear them scream :) Then change policing to shaping using wrr
> (or an htb class for each user), and sfq on the leaves, and users are happy.
>

;-> Sorry I dont have time. But this could be a nice paper since
i havent seen this topic covered. If you want to write one i could
help provide you an outline.


> Well, I use wrr + sfq exactly for fairness. No such thing can be
> achieved with policing only.
>

Thats what i was assuming. Shaping alone is insufficient as well.

> Here it is:
>
>                     +---------+       +-ppp0- ... - client0
>                     |         +-eth1-<+-ppp1- ... - client1
> Internet ----- eth0-+ router  |        . . . . . . . .
>                     |         +-eth2-<  . . . . . .
>                     +---------+       +-pppN- ... - clientN
>
>
> Traffic flows from internet to clients.
> The ethX names are for example only, my setup is more complex actually,
> but that complexity is not related to IMQ or traffic shaping.
> Clients use PPTP or PPPoE to connect to router.
> See, there's no single interface I can attach qdisc to, if I want
> to put all clients into the same qdisc.
>

So why can't you attach an ingress qdisc on eth1-2 and use policing to
mark excess traffic (not drop it)? On eth0, all you do is stash packets in
a different class based on the mark, i.e. move the stuff you have on
IMQ0 to eth0.

Example on ingress:

meter1=" police index 1 rate $CIR1"
meter1a=" police index 2 rate $PIR1"

Index 2 is shared by all flows by default.
Index 1 (and others) guarantees a rate (20Kbps) for each of the flows,
etc.
Look for example at examples/Edge32-ca-u32

The most important thing to know is that policers can be shared across
devices, flows etc using the "index" operator.
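
A minimal dry-run sketch of that sharing, under stated assumptions: it only
prints commands, the rate and burst are placeholders, and it assumes that
naming the same policer index on a second device binds to the existing
policer rather than creating a new one. The egress side that classifies on
the result is left out.

```shell
#!/bin/sh
# Dry run: print ingress policing commands that reference the same
# policer index on several devices. Rate and burst are placeholders.
PIR1=180kbit
meter1a="police index 2 rate $PIR1 burst 10k"

shared_policer() {
    for dev in "$@"; do
        echo "tc qdisc add dev $dev handle ffff: ingress"
        # Traffic within the shared rate is classified to :2; excess
        # continues to the next filter (not shown) instead of being dropped.
        echo "tc filter add dev $dev parent ffff: protocol ip prio 2 u32 match ip src 0.0.0.0/0 $meter1a continue flowid :2"
    done
}

shared_policer eth1 eth2
```

Both devices name index 2, so under the assumption above their traffic is
metered against one common budget.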

I just noticed you are copying linux-kernel. Please take it off the list
in your response, this is a netdev issue. This should warn anyone
interested in the thread to join netdev.

cheers,
jamal

2004-01-26 15:24:58

by Tomas Szepe

Subject: Re: [RFC/PATCH] IMQ port to 2.6

On Jan-26 2004, Mon, 16:55 +0300
Vladimir B. Savkin wrote:

>                     +---------+       +-ppp0- ... - client0
>                     |         +-eth1-<+-ppp1- ... - client1
> Internet ----- eth0-+ router  |        . . . . . . . .
>                     |         +-eth2-<  . . . . . .
>                     +---------+       +-pppN- ... - clientN

Actually, this is very much like what we're using IMQ for:

                  +-----------+ eth1 --- \
                  |  shaper   + eth2 ---
Internet --- eth0 + in bridge + .    ---  ... WAN (10 C's of customer IPs)
                  |  setup    + .    ---
                  +-----------+ ethN --- /

We're shaping single IPs and groups of IPs, applying tariff rates
on the sum of inbound and outbound flow (this last point, I'm told,
is the primary reason for our use of IMQ). The machine also does
IP accounting (through custom userland software based on libpcap)
and has to be an ethernet bridge so that it can be replaced by
a piece of wire should it fail and there was no backup hardware left.

At this moment we're on sfq/u32/htb/IMQ/mangle. We've figured out
that unless we mess with iptable_nat, IMQ-enabled kernels will work
perfectly reliably (SNAT in particular seems deadly). We don't
insist on IMQ. In fact, we would be very grateful if somebody
could point us to an alternative mechanism to IMQ that would allow
us to effectively shape by the sum of both traffic directions of
a given IP, as we'd like to deploy "shaping firewalls" that would
also do SNAT.
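
For readers unfamiliar with the idea, shaping by the sum of both directions
with IMQ can be sketched roughly as below. This is a dry run that only prints
commands and is not the actual configuration described above: the address,
class id and rate are made-up examples, and it assumes an IMQ-patched kernel
and iptables plus an existing HTB root qdisc with handle 1: on imq0.

```shell
#!/bin/sh
# Dry run: print rules that send both directions of one customer IP
# through imq0 and into a single HTB class, so that the class rate
# caps inbound plus outbound traffic together.
# Usage: sum_shaping IP CLASSID RATE
sum_shaping() {
    ip=$1
    classid=$2
    rate=$3
    # Redirect traffic from and to the customer into the same IMQ device.
    echo "iptables -t mangle -A FORWARD -s $ip -j IMQ --todev 0"
    echo "iptables -t mangle -A FORWARD -d $ip -j IMQ --todev 0"
    # One class for the customer; both filters point at it.
    echo "tc class add dev imq0 parent 1: classid $classid htb rate $rate"
    echo "tc filter add dev imq0 parent 1: protocol ip prio 1 u32 match ip src $ip/32 flowid $classid"
    echo "tc filter add dev imq0 parent 1: protocol ip prio 1 u32 match ip dst $ip/32 flowid $classid"
}

sum_shaping 192.0.2.10 1:10 512kbit
```

Because both filters land in one class, the limit applies to the combined
flow, which is the property a replacement mechanism would need to preserve.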

--
Tomas Szepe