Re: [PATCH v7 3/3] nl80211/mac80211: Report signal average

Previous thread: [PATCH v2] EG20T: Update PCH_UART driver to 2.6.36 by Tomoya MORINAGA on Thursday, November 11, 2010 - 7:55 pm. (3 messages)

Next thread: [tg_shares_up rewrite v3 07/11] sched: add sysctl_sched_shares_window by Paul Turner on Thursday, November 11, 2010 - 8:24 pm. (1 message)
From: Bruno Randolf
Date: Thursday, November 11, 2010 - 8:00 pm

This series adds a generic exponentially weighted moving average (EWMA) library
function and uses it from ath5k and mac80211. There are more places within
mac80211 and other wireless drivers which could benefit from this function,
which I'm going to update later and I hope this function can be useful in many
places all over the kernel. 

As it has been discussed on LKML, the preferred way of merging this is thru
wireless-testing, since this is where the first callers will be
(http://marc.info/?l=linux-kernel&m=128949956117559&w=2).

Thanks,
Bruno

---

Bruno Randolf (3):
      Add generic exponentially weighted moving average (EWMA) function
      ath5k: Use generic EWMA library
      nl80211/mac80211: Report signal average


 drivers/net/wireless/ath/ath5k/Kconfig |    1 +
 drivers/net/wireless/ath/ath5k/ani.c   |    4 +-
 drivers/net/wireless/ath/ath5k/ath5k.h |   26 +--------------
 drivers/net/wireless/ath/ath5k/base.c  |    4 +-
 drivers/net/wireless/ath/ath5k/debug.c |    2 +
 include/linux/average.h                |   32 ++++++++++++++++++
 include/linux/nl80211.h                |    2 +
 include/net/cfg80211.h                 |    4 ++
 lib/Kconfig                            |    3 ++
 lib/Makefile                           |    2 +
 lib/average.c                          |   57 ++++++++++++++++++++++++++++++++
 net/mac80211/Kconfig                   |    1 +
 net/mac80211/cfg.c                     |    3 +-
 net/mac80211/rx.c                      |    1 +
 net/mac80211/sta_info.c                |    2 +
 net/mac80211/sta_info.h                |    3 ++
 net/wireless/nl80211.c                 |    3 ++
 17 files changed, 120 insertions(+), 30 deletions(-)
 create mode 100644 include/linux/average.h
 create mode 100644 lib/average.c

-- 
Signature
--

From: Bruno Randolf
Date: Thursday, November 11, 2010 - 8:00 pm

This adds generic functions for calculating Exponentially Weighted Moving
Averages (EWMA). This implementation makes use of a structure which keeps the
EWMA parameters and a scaled up internal representation to reduce rounding
errors.

The original idea for this implementation came from the rt2x00 driver
(rt2x00link.c). I would like to use it in several places in the mac80211 and
ath5k code and I hope it can be useful in many other places in the kernel code.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

--

v7: Include bug.h. ewma_init() returns void. Use CONFIG_AVERAGE.

v6: Fixed ULONG_MAX in comment. Add reviewed by KOSAKI Motohiro.

v5: Use unsigned long instead of insigned int.

v4: Initialize internal variable to 0. Remove unneeded const qualifiers.

v3: Addressing Andrew Mortons comments: Implement in lib/average.c and make
    access and initalization functions. Use unsigned int for values. Rename
    functions to ewma_* since there might be other moving average
    implementations which are not exponentially weighted.

v2: Renamed 'samples' to 'weight'. Added more documentation. Use avg_val
    pointer. Add a WARN_ON_ONCE for invalid values of 'weight'. Divide
    and round up/down.
---
 include/linux/average.h |   32 ++++++++++++++++++++++++++
 lib/Kconfig             |    3 ++
 lib/Makefile            |    2 ++
 lib/average.c           |   57 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/average.h
 create mode 100644 lib/average.c

diff --git a/include/linux/average.h b/include/linux/average.h
new file mode 100644
index 0000000..ee3da51
--- /dev/null
+++ b/include/linux/average.h
@@ -0,0 +1,32 @@
+#ifndef _LINUX_AVERAGE_H
+#define _LINUX_AVERAGE_H
+
+#include <linux/kernel.h>
+
+/* Exponentially weighted moving average (EWMA) */
+
+/* For more documentation see lib/average.c */
+
+struct ewma ...
From: Bruno Randolf
Date: Thursday, November 11, 2010 - 8:00 pm

Remove ath5k's private moving average implementation in favour of the generic
library version.

Signed-off-by: Bruno Randolf <br1@einfach.org>
---
 drivers/net/wireless/ath/ath5k/Kconfig |    1 +
 drivers/net/wireless/ath/ath5k/ani.c   |    4 ++--
 drivers/net/wireless/ath/ath5k/ath5k.h |   26 ++------------------------
 drivers/net/wireless/ath/ath5k/base.c  |    4 ++--
 drivers/net/wireless/ath/ath5k/debug.c |    2 +-
 5 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/Kconfig b/drivers/net/wireless/ath/ath5k/Kconfig
index eb83b7b..4784457 100644
--- a/drivers/net/wireless/ath/ath5k/Kconfig
+++ b/drivers/net/wireless/ath/ath5k/Kconfig
@@ -4,6 +4,7 @@ config ATH5K
 	select MAC80211_LEDS
 	select LEDS_CLASS
 	select NEW_LEDS
+	select AVERAGE
 	---help---
 	  This module adds support for wireless adapters based on
 	  Atheros 5xxx chipset.
diff --git a/drivers/net/wireless/ath/ath5k/ani.c b/drivers/net/wireless/ath/ath5k/ani.c
index f141919..545e489 100644
--- a/drivers/net/wireless/ath/ath5k/ani.c
+++ b/drivers/net/wireless/ath/ath5k/ani.c
@@ -216,7 +216,7 @@ static void
 ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
 			 bool ofdm_trigger)
 {
-	int rssi = ah->ah_beacon_rssi_avg.avg;
+	int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
 
 	ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "raise immunity (%s)",
 		ofdm_trigger ? "ODFM" : "CCK");
@@ -301,7 +301,7 @@ ath5k_ani_raise_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as,
 static void
 ath5k_ani_lower_immunity(struct ath5k_hw *ah, struct ath5k_ani_state *as)
 {
-	int rssi = ah->ah_beacon_rssi_avg.avg;
+	int rssi = ewma_get(&ah->ah_beacon_rssi_avg);
 
 	ATH5K_DBG_UNLIMIT(ah->ah_sc, ATH5K_DEBUG_ANI, "lower immunity");
 
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 308b79e..2718136 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ ...
From: Bruno Randolf
Date: Thursday, November 11, 2010 - 8:00 pm

Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

Signed-off-by: Bruno Randolf <br1@einfach.org>
---
 include/linux/nl80211.h |    2 ++
 include/net/cfg80211.h  |    4 ++++
 net/mac80211/Kconfig    |    1 +
 net/mac80211/cfg.c      |    3 ++-
 net/mac80211/rx.c       |    1 +
 net/mac80211/sta_info.c |    2 ++
 net/mac80211/sta_info.h |    3 +++
 net/wireless/nl80211.c  |    3 +++
 8 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/include/linux/nl80211.h b/include/linux/nl80211.h
index fb877b5..0ceb552 100644
--- a/include/linux/nl80211.h
+++ b/include/linux/nl80211.h
@@ -1132,6 +1132,7 @@ enum nl80211_rate_info {
  * @__NL80211_STA_INFO_AFTER_LAST: internal
  * @NL80211_STA_INFO_MAX: highest possible station info attribute
  * @NL80211_STA_INFO_SIGNAL: signal strength of last received PPDU (u8, dBm)
+ * @NL80211_STA_INFO_SIGNAL_AVG: signal strength average (u8, dBm)
  * @NL80211_STA_INFO_TX_BITRATE: current unicast tx rate, nested attribute
  * 	containing info as possible, see &enum nl80211_sta_info_txrate.
  * @NL80211_STA_INFO_RX_PACKETS: total received packet (u32, from this station)
@@ -1149,6 +1150,7 @@ enum nl80211_sta_info {
 	NL80211_STA_INFO_PLID,
 	NL80211_STA_INFO_PLINK_STATE,
 	NL80211_STA_INFO_SIGNAL,
+	NL80211_STA_INFO_SIGNAL_AVG,
 	NL80211_STA_INFO_TX_BITRATE,
 	NL80211_STA_INFO_RX_PACKETS,
 	NL80211_STA_INFO_TX_PACKETS,
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index e5702f5..f21f701 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -424,6 +424,7 @@ struct station_parameters {
  * @STATION_INFO_TX_RETRIES: @tx_retries filled
  * @STATION_INFO_TX_FAILED: @tx_failed filled
  * @STATION_INFO_RX_DROP_MISC: @rx_dropped_misc filled
+ * @STATION_INFO_SIGNAL_AVG: ...
From: Jouni Malinen
Date: Tuesday, November 16, 2010 - 2:37 am

Isn't that generic EWMA library function making this much more CPU
intensive than necessary? I would assume the compiler is not able to
optimize the unsigned long division in ewma_add() when the weight value
is "hidden" in this way through ewma_init() call. While generic library
functions can be nice, it could be useful to have an option for using an
inline version of ewma_add that takes in avg->weight as one of the
arguments to allow cases like weight=8 here to be implemented using
shift right instead of unsigned long division especially when done on

This one here would be done for every frame received and by using a

The divisor (weight) is set to 8 here which should allow for quite a bit
more efficient ewma_add implementation if it were to be done in a way
that makes it easier for the compiler to see what value is being used
(e.g., passing this 8 to ewma_add_inline()) to allow shift right to be
used..


Or am I missing something here and we have a compiler that is actually
able to take care of this kind of optimizations by itself in the current
ewma library design? Or is the unsigned long division with the expected
weight values going to be fast enough to not justify caring about it
even on any of the embedded boards anyway?

-- 
Jouni Malinen                                            PGP id EFC895FA
--

From: Bruno Randolf
Date: Wednesday, November 17, 2010 - 1:28 am

I understand that this could be more efficient, but if it matters or not - 
honestly, I don't know. Can a more knowledgeable person than me comment on 
this?

bruno
--

From: Johannes Berg
Date: Wednesday, November 17, 2010 - 9:16 am

Yes, it does matter -- think of an embedded MIPS board running at 500MHz
and trying to push 11n speeds.

johannes

--

From: Bob Copeland
Date: Wednesday, November 17, 2010 - 4:11 pm

On Wed, Nov 17, 2010 at 11:16 AM, Johannes Berg

I assume the number of samples (weight) is the more
important tunable.  One option is you can require factor
to be a power of two that is much larger than weight,
then at least you can store factor/weight precomputed
and multiply by it instead of doing a divide in ewma_add.
Then ewma_get can also just be a shift as well.

-- 
Bob Copeland %% www.bobcopeland.com
--

From: Bruno Randolf
Date: Friday, November 19, 2010 - 1:49 am

OK, I have done tests with a 266MHz x86 board, pushing 30Mbps UDP thru it. It 
can handle maximum around 18Mbps thuput in both cases, with and without the 
average. And that is using the ewma function two times (ath5k beacons and 
signal average). Here are the results of iperf -u -c IP -b 30M -t 120 (2min of 
30Mbps UDP):

With average:

0.0-120.4 sec    269 MBytes  18.7 Mbits/sec  0.370 ms 114323/306094 (37%)
0.0-120.4 sec    263 MBytes  18.3 Mbits/sec  0.336 ms 118668/306110 (39%)
0.0-120.4 sec    265 MBytes  18.5 Mbits/sec  0.229 ms 117036/306120 (38%)

Without:

0.0-120.4 sec    267 MBytes  18.6 Mbits/sec  0.385 ms 115457/306123 (38%)
0.0-120.4 sec    267 MBytes  18.6 Mbits/sec  0.374 ms 115868/306063 (38%)
0.0-120.4 sec    261 MBytes  18.2 Mbits/sec  0.566 ms 119603/306112 (39%)

So I conclude that compared to all the other things we do when we receive a 
packet having one division more or less does not have a huge performance 

That's a good idea! I will try to improve the code.

bruno
--

From: Stefan Richter
Date: Friday, November 19, 2010 - 7:04 am

You only showed that CPU utilization is not so high that an effect on
network throughput can be seen.  You did not measure what the increase
in CPU utilization actually was. :-)
-- 
Stefan Richter
-=====-==-=- =-== =--==
http://arcgraph.de/sr/
--

From: Bruno Randolf
Date: Sunday, November 21, 2010 - 7:41 pm

True. That's exactly the point: "CPU utilization is not so high that an effect 
on network throughput can be seen", which was what people were worrying about. 
So IMHO: No problem! :) But of course it makes sense to optimize and I'll try 
to do that in subsequent patches.

bruno



--

From: Stefan Richter
Date: Monday, November 22, 2010 - 12:26 am

Good.
-- 
Stefan Richter
-=====-==-=- =-== =-==-
http://arcgraph.de/sr/
--

From: Johannes Berg
Date: Friday, November 19, 2010 - 10:52 am

That sounds like you're limited by 11g, not by the CPU ...

johannes

--

From: Bruno Randolf
Date: Sunday, November 21, 2010 - 7:36 pm

Nope. I forgot to mention that I was testing in A mode and I expect 30Mbps 
thruput here (I get that consistently on other boards). Also I see my CPU is 
at 100% with top.

bruno
--

From: Bruno Randolf
Date: Friday, November 19, 2010 - 2:07 am

Hmm, maybe I suck in mathemathics, but I don't see a way to do that given the 
formula:

(((internal * (weight - 1)) + (val * factor)) / weight

bruno
--

From: Peter Zijlstra
Date: Friday, November 19, 2010 - 5:16 am

If you assume weight == 2^n, you can write that as:

(((internal << n) - internal) + (val * factor)) >> n


--

From: Peter Zijlstra
Date: Friday, November 19, 2010 - 4:01 pm

Sure, x/y := x/z * z/y, and by picking z := 2^n, we can pre-compute z/y
and write x/z using a shift. The problem however is always range vs
granularity, you chose to first /z and then *z/y, this avoids some
overflow issues but truncates the lower n bits of x.

If you first *z/y and then /z you keep your low bits but risk loosing
the top bits to an overflow.

I guess the question is do we really need weights outside of 2^n? If
not, you can use the weight := 2^n version. If you do, you get to pick
either of the previously mentioned options.

Sadly gcc doesn't sanely support a u128 type, which would be very useful
to avoid some of these overflow issues (like we used to use u64 mults
for u32 fixed points mults).
--

From: Bruno Randolf
Date: Thursday, December 2, 2010 - 1:12 am

Thank you all for your help and sorry for following up so late!

I think we don't really need weights outside of 2^n and i'm going to post a 
patch based on Peter Zijlstra's formula. Thanks again! Would it make sense to 
have the factor 2^n too, so we can bitshift there too?

bruno
--

From: Bob Copeland
Date: Friday, November 19, 2010 - 3:28 pm

I was thinking something along the lines of:

round = (1 << n) - 1;
(((internal * (weight - 1) + round) >> n) + val) * ((1 << n) / weight)

where (1 << n) is the factor and ((1 << n) / weight) can be precomputed.
If you think about it, this is just reciprocal multiplication in fixed-
point math with n bits of decimal resolution. 

The problem is the shift of the older terms introduces roundoff error, but
there are some tricks you can do to maintain bounded error, e.g. shifting
by some smaller factor of n and scaling other terms -- in the limit you
reinvent floating point and then it's slower than division :)

-- 
Bob Copeland %% www.bobcopeland.com

--

From: Johannes Berg
Date: Friday, November 19, 2010 - 11:58 am

John, is there any reason you completely ignored Jouni's feedback here?
This wasn't addressed in v8.

johannes

--

From: John W. Linville
Date: Monday, November 22, 2010 - 11:46 am

Well, I confess that I didn't fully appreciate Jouni's point.

Is there any reason to presume that Bruno cannot (or will not) improve
the performance from here?  Do you have any measurement of the actual
performance impact?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
--

Previous thread: [PATCH v2] EG20T: Update PCH_UART driver to 2.6.36 by Tomoya MORINAGA on Thursday, November 11, 2010 - 7:55 pm. (3 messages)

Next thread: [tg_shares_up rewrite v3 07/11] sched: add sysctl_sched_shares_window by Paul Turner on Thursday, November 11, 2010 - 8:24 pm. (1 message)