This series adds a generic exponentially weighted moving average (EWMA) library function and uses it from ath5k and mac80211. There are more places within mac80211 and other wireless drivers which could benefit from this function, which I'm going to update later and I hope this function can be useful in many places all over the kernel. As it has been discussed on LKML, the preferred way of merging this is thru wireless-testing, since this is where the first callers will be (http://marc.info/?l=linux-kernel&m=128949956117559&w=2). Thanks, Bruno --- Bruno Randolf (3): Add generic exponentially weighted moving average (EWMA) function ath5k: Use generic EWMA library nl80211/mac80211: Report signal average drivers/net/wireless/ath/ath5k/Kconfig | 1 + drivers/net/wireless/ath/ath5k/ani.c | 4 +- drivers/net/wireless/ath/ath5k/ath5k.h | 26 +-------------- drivers/net/wireless/ath/ath5k/base.c | 4 +- drivers/net/wireless/ath/ath5k/debug.c | 2 + include/linux/average.h | 32 ++++++++++++++++++ include/linux/nl80211.h | 2 + include/net/cfg80211.h | 4 ++ lib/Kconfig | 3 ++ lib/Makefile | 2 + lib/average.c | 57 ++++++++++++++++++++++++++++++++ net/mac80211/Kconfig | 1 + net/mac80211/cfg.c | 3 +- net/mac80211/rx.c | 1 + net/mac80211/sta_info.c | 2 + net/mac80211/sta_info.h | 3 ++ net/wireless/nl80211.c | 3 ++ 17 files changed, 120 insertions(+), 30 deletions(-) create mode 100644 include/linux/average.h create mode 100644 lib/average.c -- Signature --
This adds generic functions for calculating Exponentially Weighted Moving
Averages (EWMA). This implementation makes use of a structure which keeps the
EWMA parameters and a scaled up internal representation to reduce rounding
errors.
The original idea for this implementation came from the rt2x00 driver
(rt2x00link.c). I would like to use it in several places in the mac80211 and
ath5k code and I hope it can be useful in many other places in the kernel code.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
--
v7: Include bug.h. ewma_init() returns void. Use CONFIG_AVERAGE.
v6: Fixed ULONG_MAX in comment. Add reviewed by KOSAKI Motohiro.
v5: Use unsigned long instead of insigned int.
v4: Initialize internal variable to 0. Remove unneeded const qualifiers.
v3: Addressing Andrew Mortons comments: Implement in lib/average.c and make
access and initalization functions. Use unsigned int for values. Rename
functions to ewma_* since there might be other moving average
implementations which are not exponentially weighted.
v2: Renamed 'samples' to 'weight'. Added more documentation. Use avg_val
pointer. Add a WARN_ON_ONCE for invalid values of 'weight'. Divide
and round up/down.
---
include/linux/average.h | 32 ++++++++++++++++++++++++++
lib/Kconfig | 3 ++
lib/Makefile | 2 ++
lib/average.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 94 insertions(+), 0 deletions(-)
create mode 100644 include/linux/average.h
create mode 100644 lib/average.c
diff --git a/include/linux/average.h b/include/linux/average.h
new file mode 100644
index 0000000..ee3da51
--- /dev/null
+++ b/include/linux/average.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_AVERAGE_H
+#define _LINUX_AVERAGE_H
+
+#include <linux/kernel.h>
+
+/* Exponentially weighted moving average (EWMA) */
+
+/* For more documentation see lib/average.c */
+
+struct ewma ...Remove ath5k's private moving average implementation in favour of the generic
library version.
Signed-off-by: Bruno Randolf <br1@einfach.org>
---
drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/ani.c | 4 ++--
drivers/net/wireless/ath/ath5k/ath5k.h | 26 ++------------------------
drivers/net/wireless/ath/ath5k/base.c | 4 ++--
drivers/net/wireless/ath/ath5k/debug.c | 2 +-
5 files changed, 8 insertions(+), 29 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..4784457 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -4,6 +4,7 @@ config ATH5K
select MAC80211_LEDS
select LEDS_CLASS
select NEW_LEDS
+ select AVERAGE
---help---
This module adds support for wireless adapters based on
Atheros 5xxx chipset.
diff --git a/drivers/net/wireless/ath/ath5k/ani.c b/drivers/net/wireless/ath/ath5k/ani.c
index f141919..545e489 100644
--- a/drivers/net/wireless/ath/ath5k/ani.c
+++ b/drivers/net/wireless/ath/ath5k/ani.c
@@ -216,7 +216,7 @@ static void
ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
bool ofdm_trigger)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "raise immunity (%s)",
ofdm_trigger ? "ODFM" : "CCK");
@@ -301,7 +301,7 @@ ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
static void
ath5k_ani_lower_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as)
{
- int rssi = ah->ah_beacon_rssi_avg.avg;
+ int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "lower immunity");
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 308b79e..2718136 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ ...Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.
This uses the recently added generic EWMA library function.
Signed-off-by: Bruno Randolf <br1@einfach.org>
---
include/linux/nl80211.h | 2 ++
include/net/cfg80211.h | 4 ++++
net/mac80211/Kconfig | 1 +
net/mac80211/cfg.c | 3 ++-
net/mac80211/rx.c | 1 +
net/mac80211/sta_info.c | 2 ++
net/mac80211/sta_info.h | 3 +++
net/wireless/nl80211.c | 3 +++
8 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index fb877b5..0ceb552 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -1132,6 +1132,7 @@ enum nl80211_rate_info {
* @__NL80211_STA_INFO_AFTER_LAST: internal
* @NL80211_STA_INFO_MAX: highest possible station info attribute
* @NL80211_STA_INFO_SIGNAL: signal strength of last received PPDU (u8, dBm)
+ * @NL80211_STA_INFO_SIGNAL_AVG: signal strength average (u8, dBm)
* @NL80211_STA_INFO_TX_BITRATE: current unicast tx rate, nested attribute
* containing info as possible, see &enum nl80211_sta_info_txrate.
* @NL80211_STA_INFO_RX_PACKETS: total received packet (u32, from this station)
@@ -1149,6 +1150,7 @@ enum nl80211_sta_info {
NL80211_STA_INFO_PLID,
NL80211_STA_INFO_PLINK_STATE,
NL80211_STA_INFO_SIGNAL,
+ NL80211_STA_INFO_SIGNAL_AVG,
NL80211_STA_INFO_TX_BITRATE,
NL80211_STA_INFO_RX_PACKETS,
NL80211_STA_INFO_TX_PACKETS,
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e5702f5..f21f701 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -424,6 +424,7 @@ struct station_parameters {
* @STATION_INFO_TX_RETRIES: @tx_retries filled
* @STATION_INFO_TX_FAILED: @tx_failed filled
* @STATION_INFO_RX_DROP_MISC: @rx_dropped_misc filled
+ * @STATION_INFO_SIGNAL_AVG: ...Isn't that generic EWMA library function making this much more CPU intensive than necessary? I would assume the compiler is not able to optimize the unsigned long division in ewma_add() when the weight value is "hidden" in this way through ewma_init() call. While generic library functions can be nice, it could be useful to have an option for using an inline version of ewma_add that takes in avg->weight as one of the arguments to allow cases like weight=8 here to be implemented using shift right instead of unsigned long division especially when done on This one here would be done for every frame received and by using a The divisor (weight) is set to 8 here which should allow for quite a bit more efficient ewma_add implementation if it were to be done in a way that makes it easier for the compiler to see what value is being used (e.g., passing this 8 to ewma_add_inline()) to allow shift right to be used.. Or am I missing something here and we have a compiler that is actually able to take care of this kind of optimizations by itself in the current ewma library design? Or is the unsigned long division with the expected weight values going to be fast enough to not justify caring about it even on any of the embedded boards anyway? -- Jouni Malinen PGP id EFC895FA --
I understand that this could be more efficient, but if it matters or not - honestly, I don't know. Can a more knowledgeable person than me comment on this? bruno --
Yes, it does matter -- think of an embedded MIPS board running at 500MHz and trying to push 11n speeds. johannes --
On Wed, Nov 17, 2010 at 11:16 AM, Johannes Berg I assume the number of samples (weight) is the more important tunable. One option is you can require factor to be a power of two that is much larger than weight, then at least you can store factor/weight precomputed and multiply by it instead of doing a divide in ewma_add. Then ewma_get can also just be a shift as well. -- Bob Copeland %% www.bobcopeland.com --
OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It can handle maximum around 18Mbps thuput in both cases, with and without the average. And that is using the ewma function two times (ath5k beacons and signal average). Here are the results of iperf -u -c IP -b 30M -t 120 (2min of 30Mbps UDP): With average: 0.0-120.4 sec 269 MBytes 18.7 Mbits/sec 0.370 ms 114323/306094 (37%) 0.0-120.4 sec 263 MBytes 18.3 Mbits/sec 0.336 ms 118668/306110 (39%) 0.0-120.4 sec 265 MBytes 18.5 Mbits/sec 0.229 ms 117036/306120 (38%) Without: 0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.385 ms 115457/306123 (38%) 0.0-120.4 sec 267 MBytes 18.6 Mbits/sec 0.374 ms 115868/306063 (38%) 0.0-120.4 sec 261 MBytes 18.2 Mbits/sec 0.566 ms 119603/306112 (39%) So I conclude that compared to all the other things we do when we receive a packet having one division more or less does not have a huge performance That's a good idea! I will try to improve the code. bruno --
You only showed that CPU utilization is not so high that an effect on network throughput can be seen. You did not measure what the increase in CPU utilization actually was. :-) -- Stefan Richter -=====-==-=- =-== =--== http://arcgraph.de/sr/ --
True. That's exactly the point: "CPU utilization is not so high that an effect on network throughput can be seen", which was what people were worrying about. So IMHO: No problem! :) But of course it makes sense to optimize and I'll try to do that in subsequent patches. bruno --
Good. -- Stefan Richter -=====-==-=- =-== =-==- http://arcgraph.de/sr/ --
That sounds like you're limited by 11g, not by the CPU ... johannes --
Nope. I forgot to mention that I was testing in A mode and I expect 30Mbps thruput here (I get that consistently on other boards). Also I see my CPU is at 100% with top. bruno --
Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the formula: (((internal * (weight - 1)) + (val * factor)) / weight bruno --
If you assume weight == 2^n, you can write that as: (((internal << n) - internal) + (val * factor)) >> n --
Sure, x/y := x/z * z/y, and by picking z := 2^n, we can pre-compute z/y and write x/z using a shift. The problem however is always range vs granularity, you chose to first /z and then *z/y, this avoids some overflow issues but truncates the lower n bits of x. If you first *z/y and then /z you keep your low bits but risk loosing the top bits to an overflow. I guess the question is do we really need weights outside of 2^n? If not, you can use the weight := 2^n version. If you do, you get to pick either of the previously mentioned options. Sadly gcc doesn't sanely support a u128 type, which would be very useful to avoid some of these overflow issues (like we used to use u64 mults for u32 fixed points mults). --
Thank you all for your help and sorry for following up so late! I think we don't really need weights outside of 2^n and i'm going to post a patch based on Peter Zijlstra's formula. Thanks again! Would it make sense to have the factor 2^n too, so we can bitshift there too? bruno --
I was thinking something along the lines of: round = (1 << n) - 1; (((internal * (weight - 1) + round) >> n) + val) * ((1 << n) / weight) where (1 << n) is the factor and ((1 << n) / weight) can be precomputed. If you think about it, this is just reciprocal multiplication in fixed- point math with n bits of decimal resolution. The problem is the shift of the older terms introduces roundoff error, but there are some tricks you can do to maintain bounded error, e.g. shifting by some smaller factor of n and scaling other terms -- in the limit you reinvent floating point and then it's slower than division :) -- Bob Copeland %% www.bobcopeland.com --
John, is there any reason you completely ignored Jouni's feedback here? This wasn't addressed in v8. johannes --
Well, I confess that I didn't fully appreciate Jouni's point. Is there any reason to presume that Bruno cannot (or will not) improve the performance from here? Do you have any measurement of the actual performance impact? John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. --
| Jesse Barnes | Re: [stable] [BUG][PATCH] cpqphp: fix kernel NULL pointer dereference |
| Greg KH | [003/136] p54usb: add Zcomax XG-705A usbid |
| Magnus Damm | [PATCH 03/07] ARM: Use shared GIC entry macros on Realview |
| Oliver Neukum | Re: [Bug #13682] The webcam stopped working when upgrading from 2.6.29 to 2.6.30 |
| Martin Schwidefsky | Re: [PATCH] optimized ktime_get[_ts] for GENERIC_TIME=y |
git: | |
| Junio C Hamano | Re: Some advanced index playing |
| Jeff King | Re: confusion over the new branch and merge config |
| Robin Rosenberg | Re: cvs2svn conversion directly to git ready for experimentation |
| Linus Torvalds | git binary size... |
| Ævar Arnfjörð Bjarmason | Re: Challenge with Git-Bash |
| Linux Kernel Mailing List | md: move allocation of ->queue from mddev_find to md_probe< |
