net_sched: SFB flow scheduler

This is the Stochastic Fair Blue scheduler, based on work from :

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

General SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].p_mark += p_increment;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
    ratelimit();
else
    mark/drop with probabilty p_min;

I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $DEV parent 1:11 handle 11:  \
        est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d4bb6f5..5afee2b 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -522,4 +522,43 @@
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+/* SFB */
+
+enum {
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+/*
+ * Note: increment, decrement are Q0.16 fixed-point values.
+ */
+struct tc_sfb_qopt {
+	__u32 rehash_interval;	/* delay between hash move, in ms */
+	__u32 warmup_time;	/* double buffering warmup time in ms (warmup_time < rehash_interval) */
+	__u32 max;		/* max len of qlen_min */
+	__u32 bin_size;		/* maximum queue length per bin */
+	__u32 increment;	/* probability increment, (d1 in Blue) */
+	__u32 decrement;	/* probability decrement, (d2 in Blue) */
+	__u32 limit;		/* max SFB queue length */
+	__u32 penalty_rate;	/* inelastic flows are rate limited to 'rate' pps */
+	__u32 penalty_burst;
+};
+
+struct tc_sfb_xstats {
+	__u32 earlydrop;
+	__u32 penaltydrop;
+	__u32 bucketdrop;
+	__u32 queuedrop;
+	__u32 childdrop; /* drops in child qdisc */
+	__u32 marked;
+	__u32 maxqlen;
+	__u32 maxprob;
+	__u32 avgprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
+
 #endif