2010-11-12 03:00:06

by Bruno Randolf

[permalink] [raw]
Subject: [PATCH v7 0/3] Generic exponentially weighted moving average (EWMA)

This series adds a generic exponentially weighted moving average (EWMA) library
function and uses it from ath5k and mac80211. There are more places within
mac80211 and other wireless drivers which could benefit from this function,
which I'm going to update later and I hope this function can be useful in many
places all over the kernel.

As it has been discussed on LKML, the preferred way of merging this is thru
wireless-testing, since this is where the first callers will be
(http://marc.info/?l=linux-kernel&m=128949956117559&w=2).

Thanks,
Bruno

---

Bruno Randolf (3):
Add generic exponentially weighted moving average (EWMA) function
ath5k: Use generic EWMA library
nl80211/mac80211: Report signal average


drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/ani.c | 4 +-
drivers/net/wireless/ath/ath5k/ath5k.h | 26 +--------------
drivers/net/wireless/ath/ath5k/base.c | 4 +-
drivers/net/wireless/ath/ath5k/debug.c | 2 +
include/linux/average.h | 32 ++++++++++++++++++
include/linux/nl80211.h | 2 +
include/net/cfg80211.h | 4 ++
lib/Kconfig | 3 ++
lib/Makefile | 2 +
lib/average.c | 57 ++++++++++++++++++++++++++++++++
net/mac80211/Kconfig | 1 +
net/mac80211/cfg.c | 3 +-
net/mac80211/rx.c | 1 +
net/mac80211/sta_info.c | 2 +
net/mac80211/sta_info.h | 3 ++
net/wireless/nl80211.c | 3 ++
17 files changed, 120 insertions(+), 30 deletions(-)
create mode 100644 include/linux/average.h
create mode 100644 lib/average.c

--
Signature


2010-11-12 03:00:13

by Bruno Randolf

[permalink] [raw]
Subject: [PATCH v7 1/3] Add generic exponentially weighted moving average (EWMA) function

This adds generic functions for calculating Exponentially Weighted Moving
Averages (EWMA). This implementation makes use of a structure which keeps the
EWMA parameters and a scaled up internal representation to reduce rounding
errors.

The original idea for this implementation came from the rt2x00 driver
(rt2x00link.c). I would like to use it in several places in the mac80211 and
ath5k code and I hope it can be useful in many other places in the kernel code.

Signed-off-by: Bruno Randolf <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>

--

v7: Include bug.h. ewma_init() returns void. Use CONFIG_AVERAGE.

v6: Fixed ULONG_MAX in comment. Add reviewed by KOSAKI Motohiro.

v5: Use unsigned long instead of insigned int.

v4: Initialize internal variable to 0. Remove unneeded const qualifiers.

v3: Addressing Andrew Mortons comments: Implement in lib/average.c and make
access and initalization functions. Use unsigned int for values. Rename
functions to ewma_* since there might be other moving average
implementations which are not exponentially weighted.

v2: Renamed 'samples' to 'weight'. Added more documentation. Use avg_val
pointer. Add a WARN_ON_ONCE for invalid values of 'weight'. Divide
and round up/down.
---
include/linux/average.h | 32 ++++++++++++++++++++++++++
lib/Kconfig | 3 ++
lib/Makefile | 2 ++
lib/average.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 94 insertions(+), 0 deletions(-)
create mode 100644 include/linux/average.h
create mode 100644 lib/average.c

diff --git a/include/linux/average.h b/include/linux/average.h
new file mode 100644
index 0000000..ee3da51
--- /dev/null
+++ b/include/linux/average.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_AVERAGE_H
+#define _LINUX_AVERAGE_H
+
+#include <linux/kernel.h>
+
+/* Exponentially weighted moving average (EWMA) */
+
+/* For more documentation see lib/average.c */
+
+struct ewma {
+ unsigned long internal;
+ unsigned long factor;
+ unsigned long weight;
+};
+
+extern void ewma_init(struct ewma *avg, unsigned long factor,
+ unsigned long weight);
+
+extern struct ewma *ewma_add(struct ewma *avg, unsigned long val);
+
+/**
+ * ewma_get() - Get average value
+ * @avg: Average structure
+ *
+ * Returns the average value held in @avg.
+ */
+static inline unsigned long ewma_get(const struct ewma *avg)
+{
+ return DIV_ROUND_CLOSEST(avg->internal, avg->factor);
+}
+
+#endif /* _LINUX_AVERAGE_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index fa9bf2c..3116aa6 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -210,4 +210,7 @@ config GENERIC_ATOMIC64
config LRU_CACHE
tristate

+config AVERAGE
+ bool
+
endmenu
diff --git a/lib/Makefile b/lib/Makefile
index e6a3763..76d3b85 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,8 @@ obj-$(CONFIG_GENERIC_ATOMIC64) += atomic64.o

obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o

+obj-$(CONFIG_AVERAGE) += average.o
+
hostprogs-y := gen_crc32table
clean-files := crc32table.h

diff --git a/lib/average.c b/lib/average.c
new file mode 100644
index 0000000..f1d1b46
--- /dev/null
+++ b/lib/average.c
@@ -0,0 +1,57 @@
+/*
+ * lib/average.c
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/average.h>
+#include <linux/bug.h>
+
+/**
+ * DOC: Exponentially Weighted Moving Average (EWMA)
+ *
+ * These are generic functions for calculating Exponentially Weighted Moving
+ * Averages (EWMA). We keep a structure with the EWMA parameters and a scaled
+ * up internal representation of the average value to prevent rounding errors.
+ * The factor for scaling up and the exponential weight (or decay rate) have to
+ * be specified thru the init fuction. The structure should not be accessed
+ * directly but only thru the helper functions.
+ */
+
+/**
+ * ewma_init() - Initialize EWMA parameters
+ * @avg: Average structure
+ * @factor: Factor to use for the scaled up internal value. The maximum value
+ * of averages can be ULONG_MAX/(factor*weight).
+ * @weight: Exponential weight, or decay rate. This defines how fast the
+ * influence of older values decreases. Has to be bigger than 1.
+ *
+ * Initialize the EWMA parameters for a given struct ewma @avg.
+ */
+void ewma_init(struct ewma *avg, unsigned long factor, unsigned long weight)
+{
+ WARN_ON(weight <= 1 || factor == 0);
+ avg->internal = 0;
+ avg->weight = weight;
+ avg->factor = factor;
+}
+EXPORT_SYMBOL(ewma_init);
+
+/**
+ * ewma_add() - Exponentially weighted moving average (EWMA)
+ * @avg: Average structure
+ * @val: Current value
+ *
+ * Add a sample to the average.
+ */
+struct ewma *ewma_add(struct ewma *avg, unsigned long val)
+{
+ avg->internal = avg->internal ?
+ (((avg->internal * (avg->weight - 1)) +
+ (val * avg->factor)) / avg->weight) :
+ (val * avg->factor);
+ return avg;
+}
+EXPORT_SYMBOL(ewma_add);

2010-11-12 03:00:21

by Bruno Randolf

[permalink] [raw]
Subject: [PATCH v7 2/3] ath5k: Use generic EWMA library

Remove ath5k's private moving average implementation in favour of the generic
library version.

Signed-off-by: Bruno Randolf <[email protected]>
---
drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/ani.c | 4 ++--
drivers/net/wireless/ath/ath5k/ath5k.h | 26 ++------------------------
drivers/net/wireless/ath/ath5k/base.c | 4 ++--
drivers/net/wireless/ath/ath5k/debug.c | 2 +-
5 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..4784457 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -4,6 +4,7 @@ config ATH5K
select MAC80211_LEDS
select LEDS_CLASS
select NEW_LEDS
+ select AVERAGE
---help---
This module adds support for wireless adapters based on
Atheros 5xxx chipset.
diff --git a/drivers/net/wireless/ath/ath5k/ani.c b/drivers/net/wireless/ath/ath5k/ani.c
index f141919..545e489 100644
--- a/drivers/net/wireless/ath/ath5k/ani.c
+++ b/drivers/net/wireless/ath/ath5k/ani.c
@@ -216,7 +216,7 @@ static void
ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
bool ofdm_trigger)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);

ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "raise immunity (%s)",
ofdm_trigger ? "ODFM" : "CCK");
@@ -301,7 +301,7 @@ ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
static void
ath5k_ani_lower_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);

ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "lower immunity");

diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 308b79e..2718136 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -25,6 +25,7 @@

#include <linux/io.h>
#include <linux/types.h>
+#include <linux/average.h>
#include <net/mac80211.h>

/* RX/TX descriptor hw structs
@@ -1102,7 +1103,7 @@ struct ath5k_hw {
struct ath5k_nfcal_hist ah_nfcal_hist;

/* average beacon RSSI in our BSS (used by ANI) */
- struct ath5k_avg_val ah_beacon_rssi_avg;
+ struct ewma ah_beacon_rssi_avg;

/* noise floor from last periodic calibration */
s32 ah_noise_floor;
@@ -1315,27 +1316,4 @@ static inline u32 ath5k_hw_bitswap(u32 val, unsigned int bits)
return retval;
}

-#define AVG_SAMPLES 8
-#define AVG_FACTOR 1000
-
-/**
- * ath5k_moving_average - Exponentially weighted moving average
- * @avg: average structure
- * @val: current value
- *
- * This implementation make use of a struct ath5k_avg_val to prevent rounding
- * errors.
- */
-static inline struct ath5k_avg_val
-ath5k_moving_average(const struct ath5k_avg_val avg, const int val)
-{
- struct ath5k_avg_val new;
- new.avg_weight = avg.avg_weight ?
- (((avg.avg_weight * ((AVG_SAMPLES) - 1)) +
- (val * (AVG_FACTOR))) / (AVG_SAMPLES)) :
- (val * (AVG_FACTOR));
- new.avg = new.avg_weight / (AVG_FACTOR);
- return new;
-}
-
#endif
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 4d21202..ae069b7 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -1307,8 +1307,7 @@ ath5k_update_beacon_rssi(struct ath5k_softc *sc, struct sk_buff *skb, int rssi)
memcmp(mgmt->bssid, common->curbssid, ETH_ALEN) != 0)
return;

- ah->ah_beacon_rssi_avg = ath5k_moving_average(ah->ah_beacon_rssi_avg,
- rssi);
+ ewma_add(&ah->ah_beacon_rssi_avg, rssi);

/* in IBSS mode we should keep RSSI statistics per neighbour */
/* le16_to_cpu(mgmt->u.beacon.capab_info) & WLAN_CAPABILITY_IBSS */
@@ -2562,6 +2561,7 @@ ath5k_reset(struct ath5k_softc *sc, struct ieee80211_channel *chan)
ah->ah_cal_next_full = jiffies;
ah->ah_cal_next_ani = jiffies;
ah->ah_cal_next_nf = jiffies;
+ ewma_init(&ah->ah_beacon_rssi_avg, 1000, 8);

/*
* Change channels and update the h/w rate map if we're switching;
diff --git a/drivers/net/wireless/ath/ath5k/debug.c b/drivers/net/wireless/ath/ath5k/debug.c
index 54dcf77..043cb15 100644
--- a/drivers/net/wireless/ath/ath5k/debug.c
+++ b/drivers/net/wireless/ath/ath5k/debug.c
@@ -719,7 +719,7 @@ static ssize_t read_file_ani(struct file *file, char __user *user_buf,
st->mib_intr);
len += snprintf(buf+len, sizeof(buf)-len,
"beacon RSSI average:\t%d\n",
- sc->ah->ah_beacon_rssi_avg.avg);
+ (int)ewma_get(&sc->ah->ah_beacon_rssi_avg));

#define CC_PRINT(_struct, _field) \
_struct._field, \

2010-11-12 03:00:31

by Bruno Randolf

[permalink] [raw]
Subject: [PATCH v7 3/3] nl80211/mac80211: Report signal average

Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

Signed-off-by: Bruno Randolf <[email protected]>
---
include/linux/nl80211.h | 2 ++
include/net/cfg80211.h | 4 ++++
net/mac80211/Kconfig | 1 +
net/mac80211/cfg.c | 3 ++-
net/mac80211/rx.c | 1 +
net/mac80211/sta_info.c | 2 ++
net/mac80211/sta_info.h | 3 +++
net/wireless/nl80211.c | 3 +++
8 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index fb877b5..0ceb552 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -1132,6 +1132,7 @@ enum nl80211_rate_info {
* @__NL80211_STA_INFO_AFTER_LAST: internal
* @NL80211_STA_INFO_MAX: highest possible station info attribute
* @NL80211_STA_INFO_SIGNAL: signal strength of last received PPDU (u8, dBm)
+ * @NL80211_STA_INFO_SIGNAL_AVG: signal strength average (u8, dBm)
* @NL80211_STA_INFO_TX_BITRATE: current unicast tx rate, nested attribute
* containing info as possible, see &enum nl80211_sta_info_txrate.
* @NL80211_STA_INFO_RX_PACKETS: total received packet (u32, from this station)
@@ -1149,6 +1150,7 @@ enum nl80211_sta_info {
NL80211_STA_INFO_PLID,
NL80211_STA_INFO_PLINK_STATE,
NL80211_STA_INFO_SIGNAL,
+ NL80211_STA_INFO_SIGNAL_AVG,
NL80211_STA_INFO_TX_BITRATE,
NL80211_STA_INFO_RX_PACKETS,
NL80211_STA_INFO_TX_PACKETS,
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e5702f5..f21f701 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -424,6 +424,7 @@ struct station_parameters {
* @STATION_INFO_TX_RETRIES: @tx_retries filled
* @STATION_INFO_TX_FAILED: @tx_failed filled
* @STATION_INFO_RX_DROP_MISC: @rx_dropped_misc filled
+ * @STATION_INFO_SIGNAL_AVG: @signal_avg filled
*/
enum station_info_flags {
STATION_INFO_INACTIVE_TIME = 1<<0,
@@ -439,6 +440,7 @@ enum station_info_flags {
STATION_INFO_TX_RETRIES = 1<<10,
STATION_INFO_TX_FAILED = 1<<11,
STATION_INFO_RX_DROP_MISC = 1<<12,
+ STATION_INFO_SIGNAL_AVG = 1<<13,
};

/**
@@ -485,6 +487,7 @@ struct rate_info {
* @plid: mesh peer link id
* @plink_state: mesh peer link state
* @signal: signal strength of last received packet in dBm
+ * @signal_avg: signal strength average in dBm
* @txrate: current unicast bitrate to this station
* @rx_packets: packets received from this station
* @tx_packets: packets transmitted to this station
@@ -505,6 +508,7 @@ struct station_info {
u16 plid;
u8 plink_state;
s8 signal;
+ s8 signal_avg;
struct rate_info txrate;
u32 rx_packets;
u32 tx_packets;
diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index 4d6f865..798d9b9 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -6,6 +6,7 @@ config MAC80211
select CRYPTO_ARC4
select CRYPTO_AES
select CRC32
+ select AVERAGE
---help---
This option enables the hardware independent IEEE 802.11
networking stack.
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 18bd0e5..d327918 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -343,8 +343,9 @@ static void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)

if ((sta->local->hw.flags & IEEE80211_HW_SIGNAL_DBM) ||
(sta->local->hw.flags & IEEE80211_HW_SIGNAL_UNSPEC)) {
- sinfo->filled |= STATION_INFO_SIGNAL;
+ sinfo->filled |= STATION_INFO_SIGNAL | STATION_INFO_SIGNAL_AVG;
sinfo->signal = (s8)sta->last_signal;
+ sinfo->signal_avg = (s8) -ewma_get(&sta->avg_signal);
}

sinfo->txrate.flags = 0;
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 902b03e..34bcc84 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1158,6 +1158,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
sta->rx_fragments++;
sta->rx_bytes += rx->skb->len;
sta->last_signal = status->signal;
+ ewma_add(&sta->avg_signal, -status->signal);

/*
* Change STA power saving mode only at the end of a frame
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 6d8f897..bed23e1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -241,6 +241,8 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
sta->local = local;
sta->sdata = sdata;

+ ewma_init(&sta->avg_signal, 1000, 8);
+
if (sta_prepare_rate_control(local, sta, gfp)) {
kfree(sta);
return NULL;
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 9265aca..84062e2 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -13,6 +13,7 @@
#include <linux/types.h>
#include <linux/if_ether.h>
#include <linux/workqueue.h>
+#include <linux/average.h>
#include "key.h"

/**
@@ -224,6 +225,7 @@ enum plink_state {
* @rx_fragments: number of received MPDUs
* @rx_dropped: number of dropped MPDUs from this STA
* @last_signal: signal of last received frame from this STA
+ * @avg_signal: moving average of signal of received frames from this STA
* @last_seq_ctrl: last received seq/frag number from this STA (per RX queue)
* @tx_filtered_count: number of frames the hardware filtered for this STA
* @tx_retry_failed: number of frames that failed retry
@@ -291,6 +293,7 @@ struct sta_info {
unsigned long rx_fragments;
unsigned long rx_dropped;
int last_signal;
+ struct ewma avg_signal;
__le16 last_seq_ctrl[NUM_RX_DATA_QUEUES];

/* Updated from TX status path only, no locking requirements */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 4e78e3f..660987d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1841,6 +1841,9 @@ static int nl80211_send_station(struct sk_buff *msg, u32 pid, u32 seq,
if (sinfo->filled & STATION_INFO_SIGNAL)
NLA_PUT_U8(msg, NL80211_STA_INFO_SIGNAL,
sinfo->signal);
+ if (sinfo->filled & STATION_INFO_SIGNAL_AVG)
+ NLA_PUT_U8(msg, NL80211_STA_INFO_SIGNAL_AVG,
+ sinfo->signal_avg);
if (sinfo->filled & STATION_INFO_TX_BITRATE) {
txrate = nla_nest_start(msg, NL80211_STA_INFO_TX_BITRATE);
if (!txrate)

2010-11-16 10:24:53

by Jouni Malinen

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, Nov 12, 2010 at 12:00:35PM +0900, Bruno Randolf wrote:
> Extend nl80211 to report an exponential weighted moving average (EWMA) of the
> signal value. Since the signal value usually fluctuates between different
> packets, an average can be more useful than the value of the last packet.
>
> This uses the recently added generic EWMA library function.

Isn't that generic EWMA library function making this much more CPU
intensive than necessary? I would assume the compiler is not able to
optimize the unsigned long division in ewma_add() when the weight value
is "hidden" in this way through ewma_init() call. While generic library
functions can be nice, it could be useful to have an option for using an
inline version of ewma_add that takes in avg->weight as one of the
arguments to allow cases like weight=8 here to be implemented using
shift right instead of unsigned long division especially when done on
hot path..

> @@ -1158,6 +1158,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
> sta->last_signal = status->signal;
> + ewma_add(&sta->avg_signal, -status->signal);

This one here would be done for every frame received and by using a
function call that ends up doing a potentially expensive division.

> @@ -241,6 +241,8 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
> + ewma_init(&sta->avg_signal, 1000, 8);

The divisor (weight) is set to 8 here which should allow for quite a bit
more efficient ewma_add implementation if it were to be done in a way
that makes it easier for the compiler to see what value is being used
(e.g., passing this 8 to ewma_add_inline()) to allow shift right to be
used..


Or am I missing something here and we have a compiler that is actually
able to take care of this kind of optimizations by itself in the current
ewma library design? Or is the unsigned long division with the expected
weight values going to be fast enough to not justify caring about it
even on any of the embedded boards anyway?

--
Jouni Malinen PGP id EFC895FA

2010-11-17 08:27:51

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Tue November 16 2010 18:37:43 Jouni Malinen wrote:
> On Fri, Nov 12, 2010 at 12:00:35PM +0900, Bruno Randolf wrote:
> > Extend nl80211 to report an exponential weighted moving average (EWMA) of
> > the signal value. Since the signal value usually fluctuates between
> > different packets, an average can be more useful than the value of the
> > last packet.
> >
> > This uses the recently added generic EWMA library function.
>
> Isn't that generic EWMA library function making this much more CPU
> intensive than necessary? I would assume the compiler is not able to
> optimize the unsigned long division in ewma_add() when the weight value
> is "hidden" in this way through ewma_init() call. While generic library
> functions can be nice, it could be useful to have an option for using an
> inline version of ewma_add that takes in avg->weight as one of the
> arguments to allow cases like weight=8 here to be implemented using
> shift right instead of unsigned long division especially when done on
> hot path..
>
> > @@ -1158,6 +1158,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data
> > *rx)
> >
> > sta->last_signal = status->signal;
> >
> > + ewma_add(&sta->avg_signal, -status->signal);
>
> This one here would be done for every frame received and by using a
> function call that ends up doing a potentially expensive division.
>
> > @@ -241,6 +241,8 @@ struct sta_info *sta_info_alloc(struct
> > ieee80211_sub_if_data *sdata, + ewma_init(&sta->avg_signal, 1000, 8);
>
> The divisor (weight) is set to 8 here which should allow for quite a bit
> more efficient ewma_add implementation if it were to be done in a way
> that makes it easier for the compiler to see what value is being used
> (e.g., passing this 8 to ewma_add_inline()) to allow shift right to be
> used..
>
>
> Or am I missing something here and we have a compiler that is actually
> able to take care of this kind of optimizations by itself in the current
> ewma library design? Or is the unsigned long division with the expected
> weight values going to be fast enough to not justify caring about it
> even on any of the embedded boards anyway?

I understand that this could be more efficient, but if it matters or not -
honestly, I don't know. Can a more knowledgeable person than me comment on
this?

bruno

2010-11-17 16:15:26

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Wed, 2010-11-17 at 17:28 +0900, Bruno Randolf wrote:

> > Or am I missing something here and we have a compiler that is actually
> > able to take care of this kind of optimizations by itself in the current
> > ewma library design? Or is the unsigned long division with the expected
> > weight values going to be fast enough to not justify caring about it
> > even on any of the embedded boards anyway?
>
> I understand that this could be more efficient, but if it matters or not -
> honestly, I don't know. Can a more knowledgeable person than me comment on
> this?

Yes, it does matter -- think of an embedded MIPS board running at 500MHz
and trying to push 11n speeds.

johannes

2010-11-17 23:11:20

by Bob Copeland

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Wed, Nov 17, 2010 at 11:16 AM, Johannes Berg
<[email protected]> wrote:
> On Wed, 2010-11-17 at 17:28 +0900, Bruno Randolf wrote:
>> I understand that this could be more efficient, but if it matters or not -
>> honestly, I don't know. Can a more knowledgeable person than me comment on
>> this?
>
> Yes, it does matter -- think of an embedded MIPS board running at 500MHz
> and trying to push 11n speeds.

I assume the number of samples (weight) is the more
important tunable. One option is you can require factor
to be a power of two that is much larger than weight,
then at least you can store factor/weight precomputed
and multiply by it instead of doing a divide in ewma_add.
Then ewma_get can also just be a shift as well.

--
Bob Copeland %% http://www.bobcopeland.com

2010-11-19 08:49:15

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Thu November 18 2010 08:11:18 Bob Copeland wrote:
> On Wed, Nov 17, 2010 at 11:16 AM, Johannes Berg
> <[email protected]> wrote:
> > On Wed, 2010-11-17 at 17:28 +0900, Bruno Randolf wrote:
> >> I understand that this could be more efficient, but if it matters or not
> >> - honestly, I don't know. Can a more knowledgeable person than me
> >> comment on this?
> >
> > Yes, it does matter -- think of an embedded MIPS board running at 500MHz
> > and trying to push 11n speeds.

OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It
can handle maximum around 18Mbps thuput in both cases, with and without the
average. And that is using the ewma function two times (ath5k beacons and
signal average). Here are the results of iperf -u -c IP -b 30M -t 120 (2min of
30Mbps UDP):

With average:

0.0-120.4 sec 269 MBytes 18.7 Mbits/sec 0.370 ms 114323/306094 (37%)
0.0-120.4 sec 263 MBytes 18.3 Mbits/sec 0.336 ms 118668/306110 (39%)
0.0-120.4 sec 265 MBytes 18.5 Mbits/sec 0.229 ms 117036/306120 (38%)

Without:

0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.385 ms 115457/306123 (38%)
0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.374 ms 115868/306063 (38%)
0.0-120.4 sec 261 MBytes 18.2 Mbits/sec 0.566 ms 119603/306112 (39%)

So I conclude that compared to all the other things we do when we receive a
packet having one division more or less does not have a huge performance
impact.

> I assume the number of samples (weight) is the more
> important tunable. One option is you can require factor
> to be a power of two that is much larger than weight,
> then at least you can store factor/weight precomputed
> and multiply by it instead of doing a divide in ewma_add.
> Then ewma_get can also just be a shift as well.

That's a good idea! I will try to improve the code.

bruno

2010-11-19 09:06:50

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Thu November 18 2010 08:11:18 Bob Copeland wrote:
> On Wed, Nov 17, 2010 at 11:16 AM, Johannes Berg
>
> <[email protected]> wrote:
> > On Wed, 2010-11-17 at 17:28 +0900, Bruno Randolf wrote:
> >> I understand that this could be more efficient, but if it matters or not
> >> - honestly, I don't know. Can a more knowledgeable person than me
> >> comment on this?
> >
> > Yes, it does matter -- think of an embedded MIPS board running at 500MHz
> > and trying to push 11n speeds.
>
> I assume the number of samples (weight) is the more
> important tunable. One option is you can require factor
> to be a power of two that is much larger than weight,
> then at least you can store factor/weight precomputed
> and multiply by it instead of doing a divide in ewma_add.
> Then ewma_get can also just be a shift as well.

Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
formula:

(((internal * (weight - 1)) + (val * factor)) / weight

bruno

2010-11-19 12:17:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, 2010-11-19 at 18:07 +0900, Bruno Randolf wrote:

> Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
> formula:
>
> (((internal * (weight - 1)) + (val * factor)) / weight

If you assume weight == 2^n, you can write that as:

(((internal << n) - internal) + (val * factor)) >> n

2010-11-19 14:06:01

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Nov 19 Bruno Randolf wrote:
> Here are the results of iperf -u -c IP -b 30M -t 120

You only showed that CPU utilization is not so high that an effect on
network throughput can be seen. You did not measure what the increase
in CPU utilization actually was. :-)
--
Stefan Richter
-=====-==-=- =-== =--==
http://arcgraph.de/sr/

2010-11-19 17:51:40

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, 2010-11-19 at 17:49 +0900, Bruno Randolf wrote:

> OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It
> can handle maximum around 18Mbps thuput in both cases, with and without the

That sounds like you're limited by 11g, not by the CPU ...

johannes

2010-11-19 18:57:17

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Tue, 2010-11-16 at 11:37 +0200, Jouni Malinen wrote:
> On Fri, Nov 12, 2010 at 12:00:35PM +0900, Bruno Randolf wrote:
> > Extend nl80211 to report an exponential weighted moving average (EWMA) of the
> > signal value. Since the signal value usually fluctuates between different
> > packets, an average can be more useful than the value of the last packet.
> >
> > This uses the recently added generic EWMA library function.
>
> Isn't that generic EWMA library function making this much more CPU
> intensive than necessary?

John, is there any reason you completely ignored Jouni's feedback here?
This wasn't addressed in v8.

johannes

2010-11-19 23:02:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, 2010-11-19 at 17:28 -0500, Bob Copeland wrote:
> On Fri, Nov 19, 2010 at 06:07:05PM +0900, Bruno Randolf wrote:
> > Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
> > formula:
> >
> > (((internal * (weight - 1)) + (val * factor)) / weight
>
> I was thinking something along the lines of:
>
> round = (1 << n) - 1;
> (((internal * (weight - 1) + round) >> n) + val) * ((1 << n) / weight)
>
> where (1 << n) is the factor and ((1 << n) / weight) can be precomputed.
> If you think about it, this is just reciprocal multiplication in fixed-
> point math with n bits of decimal resolution.
>
> The problem is the shift of the older terms introduces roundoff error, but
> there are some tricks you can do to maintain bounded error, e.g. shifting
> by some smaller factor of n and scaling other terms -- in the limit you
> reinvent floating point and then it's slower than division :)

Sure, x/y := x/z * z/y, and by picking z := 2^n, we can pre-compute z/y
and write x/z using a shift. The problem however is always range vs
granularity, you chose to first /z and then *z/y, this avoids some
overflow issues but truncates the lower n bits of x.

If you first *z/y and then /z you keep your low bits but risk loosing
the top bits to an overflow.

I guess the question is do we really need weights outside of 2^n? If
not, you can use the weight := 2^n version. If you do, you get to pick
either of the previously mentioned options.

Sadly gcc doesn't sanely support a u128 type, which would be very useful
to avoid some of these overflow issues (like we used to use u64 mults
for u32 fixed points mults).

2010-11-19 23:15:04

by Bob Copeland

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, Nov 19, 2010 at 06:07:05PM +0900, Bruno Randolf wrote:
> Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the
> formula:
>
> (((internal * (weight - 1)) + (val * factor)) / weight

I was thinking something along the lines of:

round = (1 << n) - 1;
(((internal * (weight - 1) + round) >> n) + val) * ((1 << n) / weight)

where (1 << n) is the factor and ((1 << n) / weight) can be precomputed.
If you think about it, this is just reciprocal multiplication in fixed-
point math with n bits of decimal resolution.

The problem is the shift of the older terms introduces roundoff error, but
there are some tricks you can do to maintain bounded error, e.g. shifting
by some smaller factor of n and scaling other terms -- in the limit you
reinvent floating point and then it's slower than division :)

--
Bob Copeland %% http://www.bobcopeland.com

2010-11-22 02:36:51

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Sat November 20 2010 02:52:31 Johannes Berg wrote:
> On Fri, 2010-11-19 at 17:49 +0900, Bruno Randolf wrote:
> > OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru
> > it. It can handle maximum around 18Mbps thuput in both cases, with and
> > without the
>
> That sounds like you're limited by 11g, not by the CPU ...

Nope. I forgot to mention that I was testing in A mode and I expect 30Mbps
thruput here (I get that consistently on other boards). Also I see my CPU is
at 100% with top.

bruno

2010-11-22 02:41:21

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri November 19 2010 23:04:51 Stefan Richter wrote:
> On Nov 19 Bruno Randolf wrote:
> > Here are the results of iperf -u -c IP -b 30M -t 120
>
> You only showed that CPU utilization is not so high that an effect on
> network throughput can be seen. You did not measure what the increase
> in CPU utilization actually was. :-)

True. That's exactly the point: "CPU utilization is not so high that an effect
on network throughput can be seen", which was what people were worrying about.
So IMHO: No problem! :) But of course it makes sense to optimize and I'll try
to do that in subsequent patches.

bruno


2010-11-22 07:28:20

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Nov 22 Bruno Randolf wrote:
> On Fri November 19 2010 23:04:51 Stefan Richter wrote:
> > You only showed that CPU utilization is not so high that an effect on
> > network throughput can be seen. You did not measure what the increase
> > in CPU utilization actually was. :-)
>
> True. That's exactly the point: "CPU utilization is not so high that an effect
> on network throughput can be seen", which was what people were worrying about.

No, people worried about CPU utilization that can be reduced.

> So IMHO: No problem! :) But of course it makes sense to optimize and I'll try
> to do that in subsequent patches.

Good.
--
Stefan Richter
-=====-==-=- =-== =-==-
http://arcgraph.de/sr/

2010-11-22 19:00:38

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Fri, Nov 19, 2010 at 10:58:36AM -0800, Johannes Berg wrote:
> On Tue, 2010-11-16 at 11:37 +0200, Jouni Malinen wrote:
> > On Fri, Nov 12, 2010 at 12:00:35PM +0900, Bruno Randolf wrote:
> > > Extend nl80211 to report an exponential weighted moving average (EWMA) of the
> > > signal value. Since the signal value usually fluctuates between different
> > > packets, an average can be more useful than the value of the last packet.
> > >
> > > This uses the recently added generic EWMA library function.
> >
> > Isn't that generic EWMA library function making this much more CPU
> > intensive than necessary?
>
> John, is there any reason you completely ignored Jouni's feedback here?
> This wasn't addressed in v8.

Well, I confess that I didn't fully appreciate Jouni's point.

Is there any reason to presume that Bruno cannot (or will not) improve
the performance from here? Do you have any measurement of the actual
performance impact?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2010-12-02 08:13:29

by Bruno Randolf

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

On Sat November 20 2010 08:01:55 Peter Zijlstra wrote:
> On Fri, 2010-11-19 at 17:28 -0500, Bob Copeland wrote:
> > On Fri, Nov 19, 2010 at 06:07:05PM +0900, Bruno Randolf wrote:
> > > Hmm, maybe I suck in mathemathics, but I don't see a way to do that
> > > given the formula:
> > >
> > > (((internal * (weight - 1)) + (val * factor)) / weight
> >
> > I was thinking something along the lines of:
> >
> > round = (1 << n) - 1;
> > (((internal * (weight - 1) + round) >> n) + val) * ((1 << n) / weight)
> >
> > where (1 << n) is the factor and ((1 << n) / weight) can be precomputed.
> > If you think about it, this is just reciprocal multiplication in fixed-
> > point math with n bits of decimal resolution.
> >
> > The problem is the shift of the older terms introduces roundoff error,
> > but there are some tricks you can do to maintain bounded error, e.g.
> > shifting by some smaller factor of n and scaling other terms -- in the
> > limit you reinvent floating point and then it's slower than division :)
>
> Sure, x/y := x/z * z/y, and by picking z := 2^n, we can pre-compute z/y
> and write x/z using a shift. The problem however is always range vs
> granularity, you chose to first /z and then *z/y, this avoids some
> overflow issues but truncates the lower n bits of x.
>
> If you first *z/y and then /z you keep your low bits but risk loosing
> the top bits to an overflow.
>
> I guess the question is do we really need weights outside of 2^n? If
> not, you can use the weight := 2^n version. If you do, you get to pick
> either of the previously mentioned options.
>
> Sadly gcc doesn't sanely support a u128 type, which would be very useful
> to avoid some of these overflow issues (like we used to use u64 mults
> for u32 fixed points mults).

Thank you all for your help and sorry for following up so late!

I think we don't really need weights outside of 2^n and i'm going to post a
patch based on Peter Zijlstra's formula. Thanks again! Would it make sense to
have the factor 2^n too, so we can bitshift there too?

bruno