[[Категория:QoS]]
[[Категория:Linux]]

=HFSC=
* http://ace-host.stuart.id.au/russell/files/tc/doc/sch_hfsc.txt
* http://linux-ip.net/articles/hfsc.en/
<PRE>
Tutorial
========
HFSC stands for Hierarchical Fair Service Curve.  Unlike its cousins
CBQ and HTB it has a rigorous mathematical foundation that delivers a
guaranteed outcome.  In practice that means it does a better job than
either CBQ or HTB, both of which are essentially best effort guesses
at solving what is a surprisingly complex problem.  Both of them get
it wrong in subtle ways which shows up as them not meeting their
bandwidth and latency guarantees.
</PRE>
HFSC is the "Hierarchical Fair Service Curve" (whatever that is supposed to mean). Unlike its cousins CBQ and HTB, HFSC has a rigorous mathematical foundation that delivers a guaranteed result. In practice this means HFSC does the job better than CBQ or HTB, which are essentially best-effort guesses at solving a surprisingly complex problem. Both of them (CBQ & HTB) get it wrong in subtle ways, which shows up as classes not receiving their promised bandwidth or their guaranteed latency.
<PRE>
The Service Curve
-----------------
Like CBQ and HTB, HFSC assumes the user has defined classes of
traffic.  One class might be "telnet traffic from customer A" which
will be assured low latency, assuming it uses a low total bandwidth.
HFSC expects you to use one of the filters tc provides to decide
which class each packet belongs to.
</PRE>
Like CBQ & HTB, HFSC assumes the user has defined traffic classes. One class might be "telnet traffic for customer A", which is assured low latency as long as that class uses little total bandwidth. HFSC expects you to use the filters tc provides to decide which class each packet belongs to.
<PRE>
HFSC provides three types of service to each class, or to put it
another way HFSC provides three knobs you can use to define the
behaviour of each class:
</PRE>
HFSC provides 3 types of service to each class; to put it another way, HFSC provides three knobs you can use to define the behaviour of each class:
<PRE>
rt  Real time.  This gives a class hard guarantees about its real
    time speed.
</PRE>
Real time. This knob gives a class hard guarantees about its real time speed (translator's note: a guaranteed rate?).
<PRE>
ls  Link share.  If the link has data queued up to be sent (ie it is
    backlogged), its share of the link's capacity is in proportion to
    this speed.  Every class must have a real time or link share
    definition.
</PRE>
Link share. If the link has data queued up to be sent, the link's capacity is shared in proportion to this speed. Every class must have an rt or ls definition (translator's note: an example wouldn't hurt! - see the sketch below).
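A minimal sketch of such a definition, assuming an illustrative device, classid and rates (none of these values come from the original text):
<PRE>
# A class with both a real time guarantee and a link share weight.
# "eth0", parent 1:0, classid 1:3 and the rates are assumed values.
tc class add dev eth0 parent 1:0 classid 1:3 hfsc \
    rt m2 100kbit \
    ls m2 200kbit
</PRE>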
<PRE>
ul  Upper limit.  The maximum speed link share can send at.  It can
    only be used if the class has a link share defined.
</PRE>
Upper limit. The maximum speed at which an ls class can send data (translator's note: like HTB's ceil?).
<PRE>
The two letter codes above, "rt", "ls", and "ul", are how you
identify these three types of service to the tc command.  There is a
fourth one: "sc".  "sc" is a shorthand for setting both "rt" and "ls"
to the same thing.
</PRE>
The two-letter codes above, "rt", "ls" and "ul", are how you identify these three types of service to the tc command. There is also a fourth code: "sc", a shorthand that sets both "rt" and "ls" to the same curve, as sketched below.
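For illustration, the following two commands should be equivalent (device, classid and rate are assumed values):
<PRE>
# "sc" sets rt and ls to the same service curve, so this...
tc class add dev eth0 parent 1:0 classid 1:4 hfsc sc m2 300kbit
# ...is shorthand for:
tc class add dev eth0 parent 1:0 classid 1:4 hfsc \
    rt m2 300kbit ls m2 300kbit
</PRE>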
<PRE>
The speed allocated to each service is specified as a normal speed
and an optional burst speed:

    SERVICE [m1 BPS d SEC] m2 BPS
</PRE>
The speed allocated to each service is specified as a normal speed and, optionally, a burst speed:
<PRE>
Where:

    SERVICE      "ls", "rt", "ul" - ie whether this specifies the
                 link sharing, real time or upper limit service.

    m2 BPS       This is the steady state speed, eg "m2 10kbps"
                 means the steady state allowance for this service
                 class is 10 kilo bytes/sec.

    m1 BPS d SEC The burst speed.  "m1 20kbps d 10ms" means this
                 class is allowed a burst speed of 20kbps for 10ms,
                 then it must run at its steady state speed.
</PRE>
Where:
* SERVICE "ls" "rt" "ul": selects the type of service
* m2: the steady state speed; eg "m2 10kbps" means the steady state allowance for this class is 10 kilobytes/sec (note that in tc units "kbps" means kilobytes per second)
* m1: the burst speed; "m1 20kbps d 10ms" means the class is allowed a burst speed of 20kbps for 10ms, after which it must run at its steady state speed (a combined sketch follows)
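Putting the two together, a hedged sketch (device, classid and numbers are assumptions):
<PRE>
# Link share: burst at 20 kilobytes/sec for 10ms, then settle to a
# steady state of 10 kilobytes/sec.  All values are illustrative.
tc class add dev eth0 parent 1:0 classid 1:5 hfsc \
    ls m1 20kbps d 10ms m2 10kbps
</PRE>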
<PRE>
In the specification and tc call this is called a "service curve".
It is a curve in the same sense that any mathematical function can be
plotted as a curve.  In the paper this curve is used as the basis for
mathematical proofs about the performance of the scheduler.  Unless
you also want to do mathematical proofs, avoid thinking of the
specification as a curve.  It is a speed with an optional latency
specification - nothing more.
</PRE>
In the specification and in tc calls this is called a "service curve". It is a curve in the same sense that any mathematical function can be plotted as a curve. In the paper the curve is used as the basis for mathematical proofs about the scheduler's performance. Unless you are also going to do mathematical proofs, don't think of the specification as a curve: it is a speed with an optional latency specification, and nothing more.
<PRE>
(There is another syntax supported which is documented in the
reference section below - "umax BYTES dmax SEC rate BPS".  "rate BPS"
is identical in meaning to "m2 BPS".  The definition of "umax BYTES
dmax SEC" does *not* always specify a burst rate and time.  I have
not found any other documentation of what it can yield, probably
because it is so complex to explain.  Such complexity is best
avoided - don't use "umax ... dmax ...".  The original terms came
from the paper, but tc didn't follow the paper's definitions of them
to the letter.)
</PRE>
(
There is another syntax, documented in the reference section below: "umax BYTES dmax SEC rate BPS".
* "rate BPS" corresponds to "m2 BPS"
* "umax BYTES dmax SEC" does not always express a burst rate and time. I have not found any other documentation of what it can yield - probably because it is so complex to explain. Such complexity is best avoided: don't use "umax ... dmax ...". The original terms came from the paper, but tc didn't follow the paper's definitions to the letter.
)
<PRE>
Burst and Latency
-----------------

The term "burst" hides more than it reveals.  Yes, it specifies
bursts, but it is not used to control bursts of data.  It is used to
control latency.  This is not some minor quibble about naming
conventions.  Wrapping your head around this is fundamental to
understanding HFSC.
</PRE>
The term "burst" hides more than it reveals. Yes, it specifies bursts, but it is not used to control bursts of data - it is used to control latency. This is not a play on words: it is fundamental to understanding HFSC.
<PRE>
But before going on, there is an even more important corollary to it
being about latency.  Most traffic doesn't care about latency.  Web
traffic doesn't, email doesn't.  The two classes of traffic it might
matter to are real time (like VOIP and NTP), and interactive traffic
(like telnet, ssh, VNC, RDP and gaming).  If you don't care about
these do yourself a favour: don't use burst and skip this section
entirely.
</PRE>
Before going on, there is an even more important corollary of it being about latency: most traffic is not latency sensitive. Web and email, for example, are not. Latency matters to real time traffic (VOIP, NTP) and to interactive traffic (telnet, ssh, VNC, RDP, gaming) - if you don't care about these, don't use burst and skip this section entirely.
<PRE>
And a caveat.  Any effort to control latency will fail abysmally (as
in be a complete and utter waste of your time and your CPU's cycles)
unless the class doing the controlling is the slowest link in the
chain.  The moment any hop between source and destination starts
buffering packets, HFSC's efforts to constrain latency will be
overwhelmed by delays caused by the buffering.  In case it isn't
obvious, this means you can never control ingress latency.
</PRE>
And conversely, any attempt to control latency will fail - a complete waste of your time and CPU cycles - unless the class doing the controlling sits on the slowest link in the chain. The moment some hop (router) between source and destination starts buffering packets, HFSC stops controlling latency. Implicitly this means you can never control ingress latency directly (this section was written by Captain Obvious); the usual workaround is sketched below.
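A hedged sketch of that workaround: ingress traffic is commonly redirected to an ifb pseudo-device and shaped there to a rate just below the real downstream speed, so your box becomes the slowest hop. The device names are assumptions:
<PRE>
# Assumed setup: mirror ingress from eth0 into ifb0 and shape there.
modprobe ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 \
    match u32 0 0 \
    action mirred egress redirect dev ifb0
# Attach HFSC to ifb0 with an upper limit below the real downstream
# rate, so the queue builds here rather than upstream.
</PRE>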
<PRE>
To see how the burst speed affects latency, imagine we have two
classes, say ssh and web, which we are happy to devote 50% of the
link to over the long term.

When the link is idle the HFSC scheduler is given a 1499 byte web
packet and a 1500 byte ssh packet to send.  HFSC must decide which
one to send first, and after it does the other will be backlogged,
awaiting its turn to be sent.

It is only if the link becomes backlogged that the burst speed comes
into play, and even then it is only for a short while - the length of
time being specified by the SEC in "m1 BPS d SEC".  In effect all it
does is determine which of these packets gets sent first.  And that
is how we get to latency - because whichever is sent first, ssh or
web, will be perceived to have the lower latency.
</PRE>
To see how the burst speed affects latency, imagine we have 2 classes, say ssh and web, to each of which we are happy to devote 50% of the link over the long term.
<BR>When the link is idle and the HFSC scheduler is given a 1499-byte packet from web and a 1500-byte packet from ssh to send, HFSC must decide which packet to send first; after it does, the other packet is backlogged, awaiting its turn to be sent.

Only if there is backlogged data does the burst speed come into play, and even then only for a short time - the time given by SEC in "m1 BPS d SEC". In effect all it does is determine which of these packets is sent first. And that is how we get to latency: whichever packet is sent first, ssh or web, will be perceived to have the lower latency.
<PRE>
To figure out which packet to send HFSC calculates how long each will
take to send, and sends the one that will finish sending first.  The
twist is it doesn't use the link's raw speed to calculate this time,
it uses the speed for the service.  All three services (link share,
real time and upper limit) do their calculations independently.
</PRE>
To determine which packet to send, HFSC calculates how long each will take to send and sends the one whose transmission will finish first. The twist is that the link's raw speed is not used for this calculation; the speed defined for the service is. All three services (link share, real time and upper limit) do their calculations independently.
<PRE>
Some examples will hopefully make this clear.  Let's allocate web to
classid 1:1, and ssh to classid 1:2, and say we have a 1mbps link.
</PRE>
A few examples will hopefully make all this clearer. Let's allocate web to classid 1:1 and ssh to classid 1:2, and say we have a 1mbps link (translator's note: this needs explaining!!!! - a sketch of the assumed setup follows).
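The examples below only show the class definitions; a hedged sketch of the setup they appear to assume (the device, handle and default class are assumptions, and note that in tc units "1mbps" is 1 megabyte/sec):
<PRE>
# Assumed setup for Examples 1-3: an HFSC root qdisc on eth0, with
# the web (1:1) and ssh (1:2) classes attached directly to 1:0.
tc qdisc add dev eth0 root handle 1:0 hfsc default 1
</PRE>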
<PRE>
Example 1.

    tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
      ls m2 500kbps                                           # web
    tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
      ls m2 500kbps                                           # ssh
</PRE>
Note:
<PRE>
We haven't set a burst speed, so the time it takes to send the
packets is:

    web       1499 / 500000 = 0.002998 sec
    ssh       1500 / 500000 = 0.003000 sec

So in this example web takes the shortest time to send, so it will be
sent first and have the lower latency.
</PRE>
<PRE>
Example 2.

  Many would prefer ssh traffic to have lower latency than web.
  This will achieve it:

    tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
      ls m1 100kbps d 1.5ms m2 500kbps                        # web
    tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
      ls m1 900kbps d 1.5ms m2 500kbps                        # ssh

  Since this is the start of a backlog period it also marks the start
  of a burst period, so the burst speed is in effect.  Thus we do the
  same calculation as in Example 1, but with burst speeds:

    web     1499 / 100000 = 0.014990 sec
    ssh     1500 / 900000 = 0.001667 sec

  In this scenario ssh will be sent first and thus appear to have the
  lower latency, which is what we wanted.  In fact, because of the
  ratios chosen (100kbps for web versus 900kbps for ssh), an ssh
  packet would have to be more than 9 times larger than a web packet
  before it got sent second.
</PRE>
<PRE>
The burst must be long enough to send the entire packet.  That is
what determined the 1.5ms above.  Our link runs at 1mbps, so it takes
(1500 / 1000000 = 1.5ms) to send a Maximum Transmit Unit (MTU) sized
packet, assuming the MTU is 1500 bytes.
</PRE>
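The same rule applied to a slower link, as a worked example (the 100 kilobyte/sec figure is an assumption for illustration):
<PRE>
Link speed 100kbps (tc units, ie 100,000 bytes/sec), MTU 1500 bytes:

    d >= 1500 / 100000 = 0.015 sec = 15ms

So on such a link any burst meant to cover one full-sized packet
needs "d 15ms" or longer.
</PRE>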
<PRE>
Also notice the burst speed for web doesn't look like a "burst" at
all.  It is far slower than the steady state speed web normally
receives.  This is because it really *is* being used to control
latency, not to specify a burst.
</PRE>
<PRE>
Example 3.

  That worked, but not as well as you hoped.  You recall TCP can send
  multiple packets before waiting for a reply and you would like them
  all sent before web gets to send its first packet.  You also
  estimate you have 10 concurrent users all doing the same thing.
  From a packet capture you decide sending 4 packets before waiting
  for a reply is typical.

  10 users times 4 packets each means 40 MTU sized packets.  Thus you
  must adjust the burst speed, so ssh gets 40 times the speed of web
  and you must allow for 40 MTU sized packets in the burst time:

    tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
      ls m1  24kbps d 60ms m2 500kbps                         # web
    tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
      ls m1 975kbps d 60ms m2 500kbps                         # ssh

  Note that 975:24 is roughly 40:1, and (40 * 1500 / 1mbps = 60ms).

If a class's burst speed is truly a burst (ie it is allowed to send
faster than its normal steady state speed), it will only be allowed
to do so if it has been sending at less than its steady state speed
for long enough to accrue the headroom required for the burst.  To
put it another way, a class will not be allowed a burst if over the
long term it would mean it is running over its steady state speed.
</PRE>
<PRE>
On the other hand if a class's "burst" is really a "go slow" (like
web's) it can have its bandwidth pinched at any time by a class (like
ssh) that is allowed a burst, thus giving it truly bad latency.  In
this sense ssh's good latency doesn't come for free.  It gains it at
the expense of web having bad latency.  This will always be true, and
is conceptually no different to the trade off you make for bandwidth.
If you guarantee web good bandwidth over the long term, then some
traffic (eg email) must be getting bad bandwidth when web is using
it.  Similarly web's latency will be bad when ssh is making use of
its latency advantage.  However, because HFSC guarantees that ssh
will not exceed its steady state bandwidth over the long term, web
will still average at least the bandwidth you allocated it.  In the
HFSC paper, they say the latency and bandwidth specifications are
decoupled, meaning over the long term one has no effect on the other.
</PRE>
<PRE>
Finally, since HFSC favours packets that can be sent quickly it
favours sending small packets before large ones.  By happy
coincidence this is generally what you want.  VOIP, DNS, NTP, TCP
startup and ACKs are all small.  Since the advantage you get from
giving a class a high burst is measured in milliseconds, not seconds,
specifying a burst for link sharing often isn't worth the effort.
</PRE>
<PRE>
The meaning of "real time"
--------------------------

tc-hfsc's web page follows the terminology in the paper and says
"real time" gives guaranteed bandwidth.  Web pages talking about HFSC
take this to imply link share does not give guaranteed bandwidth.
This is true, but misleading.  The "guarantees" given by the
alternatives (like CBQ and HTB) are no better.
</PRE>
<PRE>
Real time gives you something nothing else does: a hard deadline.
Link share's latency will on occasion be out by 100's of
milliseconds, very occasionally seconds.  But notice we are talking
seconds and milliseconds here.  If you do not have traffic that cares
about millisecond delays then you don't care about real time.  There
are very few traffic classes that do care - only VOIP and maybe NTP
spring to mind.  In particular if you were a satisfied user of HTB or
CBQ, you don't care about real time.  If you don't care about real
time do yourself a favour: don't use it, and stop reading this
section.
</PRE>
<PRE>
Real time's guarantees do not come for free, nor are they what they
seem.
</PRE>
<PRE>
The first issue is in order to give hard real time guarantees the
real time service must take into account all data a class has sent
since the link was brought up.  If the class has exceeded its real
time steady state speed in the past (probably by using bandwidth when
no one else wanted it) then it can be denied *any* bandwidth by the
real time service until it gets under its long term steady state
speed.  In practice this probably won't be an issue, but being
penalised for using bandwidth no one else wanted is usually thought
of as "unfair" (yes, this is a well defined term when dealing with
quality of service), and thus real time is an unfair allocator of
bandwidth.
</PRE>
<PRE>
The second issue is in order to provide its guarantees, real time may
have to keep the link idle in case a high priority packet needs to be
sent in order to maintain latency guarantees.  Ergo there may be
times when a packet is waiting to be sent, but the link is idle, thus
the link will be artificially constrained to below its rated
capacity.  A service that always takes advantage of the link's rated
capacity is called work conserving.  Link share is work conserving.
Real time isn't.  There is a corollary to this.  Because real time
delays sending packets, its accuracy can depend on the resolution of
the kernel's timers.  For example, if your hardware + kernel only
supports a 100Hz timer, the best latency resolution you can get is 10
milliseconds.  Common modern hardware provides very accurate timers,
but it's worth your while checking if you are taking the road less
travelled.
</PRE>
<PRE>
The third issue is real time provides a hard latency guarantee in the
sense that latency is bounded to an absolute maximum.  However that
absolute maximum isn't necessarily what you specified when you said
"m1 12500 d 2ms".  You might expect that guarantees a 250 byte packet
will be sent within 2ms.  It doesn't, unless you neither use link
share nor put any other qdisc between HFSC and the device.  Recall
that real time will deliberately hold the line idle in order to give
its guarantee.  The issue is link share will sneak in and send a
packet over that idle link while real time isn't watching.  In
practice this means real time's latency guarantees can be out by the
time it takes to send one MTU sized packet.  So the guarantee isn't
2ms.  It's (2ms + time to send an MTU sized packet).  You can of
course compensate for this by specifying the time as (2ms - time to
send an MTU sized packet), provided your link is fast enough to send
an MTU sized packet + 250 bytes within 2ms.
</PRE>
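As a worked example of that compensation, using the 1mbps link from the earlier examples (an assumption here; tc's "mbps" is megabytes/sec) and a 1500-byte MTU:
<PRE>
Time to send one MTU packet:   1500 / 1000000       = 1.5ms
Compensated deadline:          2ms - 1.5ms          = 0.5ms
Required burst speed:          250 / 0.0005         = 500kbps
Sanity check: 1500 + 250 = 1750 bytes at 1mbps      = 1.75ms < 2ms

    tc class add dev eth0 parent 1:0 classid 1:6 hfsc \
      rt m1 500kbps d 0.5ms m2 100kbps    # assumed classid and m2
</PRE>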
<PRE>
The fourth issue is real time says nothing about what to do with the
link's spare capacity.  If you don't use link share as well, the
guaranteed minimum also becomes a maximum.
</PRE>
<PRE>
The fifth issue is unlike link share, the bandwidths you give to real
time must be accurate, and their sum must be below the link's
capacity.  This means the sum of the burst speeds must be below the
link's capacity, and the sum of the steady state speeds must also be
below the link's capacity.  Ensuring this if you don't use the same
value for "d" or "dmax" in every class requires solving multiple
linear equations.
</PRE>
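A quick worked check for the simple case where every class uses the same "d" (the classes and rates are assumptions; the link is the earlier 1mbps, ie 1000kbps in tc units):
<PRE>
class voip:  rt m1 500kbps d 10ms m2 100kbps
class game:  rt m1 400kbps d 10ms m2 200kbps

Burst sum:         500 + 400 = 900kbps < 1000kbps   OK
Steady state sum:  100 + 200 = 300kbps < 1000kbps   OK

With differing "d" values the burst period of one class overlaps the
steady state period of another, and checking that the total stays
under the link speed at every instant is where the multiple linear
equations come in.
</PRE>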
<PRE>
The sixth issue is if you use up all your bandwidth with real time
guarantees, then when the link is backlogged link share will get no
share, as in it will never send a packet, and all that goes along
with that - like TCP connections timing out and dying.
</PRE>
<PRE>
In summary, real time is good for one thing: guaranteeing hard
latencies.  So specifying a real time service without a burst time is
a waste of time.  The converse is also true: real time is absolutely
useless for anything but guaranteeing hard latencies.  Everything
else is better done using link share.
</PRE>
<PRE>
Using link share and upper limit
--------------------------------

Under link share the speed a backlogged class can send at depends on
the ratio of its speed to that of the other backlogged classes.  (If
a class isn't backlogged then by definition it's sending packets as
fast as it wants, so link share doesn't need to arbitrate on its
behalf.)  Link share does one thing: when the underlying device says
it's ready to send a packet, it looks at which class is furthest away
from the proportion of the link it should have, and sends its packet.
</PRE>
<PRE>
An example.  Let's say we have classes A, B and C, which have link
share specifications of:

    class A:    hfsc ls rate 10kbps
    class B:    hfsc ls rate 20kbps
    class C:    hfsc ls rate 30kbps

A and B are sending packets as fast as they can, so they will be
backlogged.  C is idle.  The ratio of the speeds of the backlogged
classes, A and B, is 10 to 20, ie 1 to 2.  So link share ensures for
every byte A sends, B sends 2.

It's so simple that it's important to note what this calculation
doesn't depend on:
</PRE>
<PRE>
A.  It doesn't depend on the speed of the underlying link.  Link
    share doesn't need to know what that speed is, and the numbers
    you give it need have no relationship to that speed.  Only the
    ratios matter.
</PRE>
<PRE>
B.  This implies link share is immune to link speed changing during
    the day, as real internet connections often do.
</PRE>
<PRE>
C.  Link share doesn't care who sends the packets - it or real time.
    They are all counted the same way.
</PRE>
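To illustrate point A, these two hypothetical configurations should behave identically under pure link share, because only the 1:2:3 ratio matters (device and classids are assumed):
<PRE>
# Alternative 1: ratios 10:20:30.
tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 10kbps
tc class add dev eth0 parent 1:0 classid 1:2 hfsc ls m2 20kbps
tc class add dev eth0 parent 1:0 classid 1:3 hfsc ls m2 30kbps

# Alternative 2: the absolute numbers are 10x smaller, but the
# ratios are still 1:2:3, so link share behaves the same way.
tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 1kbps
tc class add dev eth0 parent 1:0 classid 1:2 hfsc ls m2 2kbps
tc class add dev eth0 parent 1:0 classid 1:3 hfsc ls m2 3kbps
</PRE>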
<PRE>
There are some complications as well:

A.  Link share will not give a class more than the upper limit.
    Real time does not respect upper limit, but since link share
    counts what real time sends, going over the upper limit using
    real time will be penalised later by link share.

B.  Link share only gets what's left over after real time sends its
    packets.  To link share the bandwidth used by real time looks
    like the link speed varying a bit faster than it would otherwise.

C.  A class is allowed to exceed its steady state speed by using its
    burst speed when it first becomes backlogged.  Once the burst
    allowance is exhausted it cannot exceed the steady state speed
    until the backlog has been completely cleared.
</PRE>
<PRE>
And there is one negative.  Link share will send packets as fast as
the underlying device will let it.  If the underlying device isn't
the slowest thing between it and the destination, link share won't be
the thing controlling what packet gets sent next - it will be
whatever scheduler is running on the slowest device.  Typically on a
Linux box the device HFSC is controlling is an Ethernet NIC that runs
at gigabits per second.  The slowest thing is probably the thing it
is connected to - the modem, so unless you do something the modem
will be managing traffic, not HFSC.  If your Linux box isn't
controlling an Ethernet NIC it's most likely attempting to schedule
traffic on a virtual device.  Under Linux virtual network devices
like tunnels, ifb (used for ingress scheduling), and tun/tap devices
process their packets as soon as they receive them.  To a scheduler
they appear to be infinitely fast.  This means on a typical Linux
setup not only will link share have absolutely no effect on the
outgoing traffic, its propensity to flood the link will destroy any
chance real time had at controlling latency.
</PRE>
<PRE>
To fix this, you must limit the rate link share can send at using
upper limit.  Remember, for link share to be effective the limit must
be lower than the slowest link between the source and destination.
Upper limit has one other use - constraining a customer to the
bandwidth they paid for.
</PRE>
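A minimal sketch of that fix, assuming (purely for illustration) a NIC feeding a modem with roughly 2mbit of uplink; the script in "The Class Hierarchy" below does the same job with its root class:
<PRE>
# Cap everything slightly below the modem's real uplink speed so
# queues build in HFSC rather than in the modem.  "eth0" and
# "1800kbit" are assumed values.
tc qdisc add dev eth0 root handle 1:0 hfsc default 1
tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
    ls m2 1800kbit ul m2 1800kbit
</PRE>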
<PRE>
Finally, link share can't give hard latency guarantees.  So its burst
numbers don't mean "send an ssh packet within 10 milliseconds".
Instead they mean "if ssh has been quiet for a while, briefly
increase the ratio at which ssh can send packets, so it gets better
latency than competing classes".  That said, if you follow these
rules link share will get reasonably close to the latency you ask
for:

    a.  For classes that use real time, use exactly the same link
        share specification.  Recall "sc" does this.

    b.  The total bandwidth you allocate for link share doesn't
        exceed the link's capacity or upper limit.  This must be
        true for both burst and steady state speeds.
</PRE>
<PRE>
The Class Hierarchy
-------------------

Like HTB and CBQ, HFSC allows you to create a hierarchy of classes.
This is an example:

                  +1:0---------+
                  |   qdisc    |
                  |  1500kbit  |
                  +------------+
                 /              \
    +-1:10------+                +-1:20------+
    | User A    |                | User B    |
    | 50% share |                | 50% share |
    | Unlimited |                | 1000kbit  |
    +-----------+                +-----------+
   /             \
+-1:11-------+    +-1:12---------+
| Web        |    | VOIP         |
| 100% share |    | 20ms Latency |
| Unlimited  |    | 100kbit      |
+------------+    +--------------+

Intuitively this does the following:

  - User A can use all of the link's bandwidth.
  - All User A's spare bandwidth goes towards web browsing.
  - User A has VOIP which must have low latency.
  - User B can use a maximum of 1000kbit.
  - User B gets a 50/50 share of a congested link.
</PRE>
<PRE>
Up till now we have discussed how leaf nodes sharing a common parent
(inner node) interact.  This is how the hierarchy affects each type
of HFSC service:

  ls  A parent enforces its link sharing allocation on its children.
      In other words the siblings get to fight over the proportion
      of the shared bandwidth allocated to their parent, and the
      parent and its siblings get to fight over whatever their
      ancestors' link sharing allocations are.

  rt  The hierarchy does not affect the bandwidth allocated by the
      real time service in any way.  rt specifications for inner
      nodes are ignored.

  ul  A parent enforces its upper limit allocation on its children.
      In other words the sum of the bandwidth used by the children's
      link sharing service will never exceed any of their ancestors'
      upper limits.

In practice, for most devices, you will need one class enforcing an
upper limit so link share doesn't flood the link and thus destroy
HFSC's ability to schedule.  That class will be the root, of course.
</PRE>
<PRE>
With those definitions in hand, this is how the above hierarchy is
implemented:

  #
  # Save typing when creating classes.
  #
  ca() {
    par=$1; class=$2; shift 2
    tc class add dev eth0 parent 1:$par classid 1:$class hfsc "$@"
  }
  #
  # Create the qdisc.  Send unclassified traffic to 1:11.
  # If "default" isn't given unclassified traffic is dropped.
  #
  tc qdisc add dev eth0 root handle 1:0 hfsc default 11
  ca 0 1   ls m2 1500kbit ul m2 1500kbit   # Stop link flooding
  #
  # Create classes representing each user.
  #
  ca 1 10  ls m2 750kbit                   # User A
  ca 1 20  ls m2 750kbit ul m2 1000kbit    # User B
  #
  # User A's web traffic.
  #
  ca 10 11 ls m2 1500kbit                  # User A web
  #
  # VOIP.  We said VOIP uses 100kbit, and requires 20ms latency.
  # We also know that HFSC can break its rt latency guarantees
  # by one MTU packet.  Assuming the MTU is 1500 bytes, on our
  # 1500kbit link it takes:
  #
  #   1500 [byte] * 8 [bit/byte] / 1500000 [bit/sec] = 8 msec
  #
  # to send an MTU sized packet.  That means we must send the
  # VOIP packet within 12ms (= 20ms - 8ms) to meet the latency
  # requirements.
  #
  # The requirements didn't say how big a VOIP packet is, but given
  # it has to send a new packet every 20ms the biggest it could be
  # is:
  #   100000 [bit/sec] / 8 [bit/byte] * 0.02 [sec/pkt] = 250 byte/pkt
  # So, we have 12ms to send 250 bytes, which works out to be:
  #   250 [byte] * 8 [bit/byte] / 0.012 [sec] = 167kbit
  #
  ca 10 12 rt m1 167kbit d 12ms m2 100kbit # User A VOIP
</PRE>
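The script above creates the classes but no filters; as the reference below notes, traffic is assigned to HFSC classes using a filter. A hedged sketch of how User A's traffic might be steered into 1:11 and 1:12 (the address and the matching rules are invented for illustration):
<PRE>
# Hypothetical filters: assume User A is 192.168.1.2.  Send their
# UDP traffic (the VOIP stream) to 1:12, everything else to 1:11.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
    match ip src 192.168.1.2/32 match ip protocol 17 0xff \
    flowid 1:12
tc filter add dev eth0 parent 1:0 protocol ip prio 2 u32 \
    match ip src 192.168.1.2/32 \
    flowid 1:11
</PRE>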
<PRE>
Reference
=========

Qdisc Parameters
----------------

default <CLASSID>
    Send unclassified packets to <CLASSID>.  If not supplied
    unclassified packets are dropped.
</PRE>
<PRE>
Class Parameters
----------------

ls <SERVICE-CURVE>
    Apportions how much of a backlogged link's spare capacity should
    be allocated to this class and its children if a packet isn't
    being sent under the rt <SERVICE-CURVE>.  The bandwidth allocated
    by the <SERVICE-CURVE> of all sibling classes wanting to share
    the excess capacity is summed; the ratio of the link's capacity
    remaining after real time to the total bandwidth consumed by this
    class and all its offspring is then equal to the ratio of this
    class's <SERVICE-CURVE> bandwidth to that sum.  When calculating
    <SERVICE-CURVE>, <BURST-SEC> is measured from the last time this
    class was backlogged, ie had packets waiting to be transmitted.
    Every leaf class must have an ls or rt (or both) <SERVICE-CURVE>
    specified.  Every inner class must be given an ls
    <SERVICE-CURVE>.

rt <SERVICE-CURVE>
    <SERVICE-CURVE> sets the amount of bandwidth a leaf class is
    guaranteed.  rt is ignored for inner classes.  This amount of
    bandwidth is always available to the class regardless of the
    other classes' use of the link or the maximum bandwidth allowed
    to this class or its parents.  When calculating <SERVICE-CURVE>,
    the start time of the burst is complex to determine.  Roughly,
    it is the last time the class was backlogged and was under the
    guaranteed rate.  Every leaf class must have an ls or rt
    <SERVICE-CURVE> specified.

ul <SERVICE-CURVE>
    For classes with an ls curve, sets the maximum bandwidth this
    class and its children are permitted to use.  This only affects
    sending packets under the ls <SERVICE-CURVE>, but packets sent
    under the rt curve are included in the bandwidth calculation.
    When calculating <SERVICE-CURVE>, <BURST-SEC> is measured from
    the last time this class was backlogged, ie had packets waiting
    to be transmitted.
</PRE>
<PRE>
<SERVICE-CURVE>
---------------

A service curve specifies how much bandwidth a class can use.  It has
an optional initial burst rate, followed by a steady state.  The
start time of the burst depends on the service type (rt, ul or ls).
There are two syntaxes available for describing a service curve.  In
both you use the usual tc units to designate speed, data size and
time.

[m1 <BURST-SPEED> d <BURST-SEC>] m2 <STEADY-SPEED>
    <BURST-SPEED> is the speed of the burst, and <BURST-SEC> is its
    length.  Thereafter the class sends at <STEADY-SPEED>.

[umax <BURST-BYTES> dmax <BURST-DSEC>] rate <STEADY-SPEED>
    "rate" has an identical meaning to "m2" above, so it specifies
    the long term speed the class is entitled to.  To determine what
    "umax ... dmax ..." means, an implied burst speed is calculated
    as "<BURST-BYTES> / <BURST-DSEC>".  If this burst speed is larger
    than <STEADY-SPEED> then this syntax is equivalent to "m1
    (<BURST-BYTES> / <BURST-DSEC>) d <BURST-DSEC> m2 <STEADY-SPEED>"
    where the stuff in (...) is calculated.  If the implied burst
    speed is not greater than <STEADY-SPEED> then during the burst
    period the effective specification becomes "m1 0 d (<BURST-DSEC>
    - <BURST-BYTES> / <STEADY-SPEED>) m2 <STEADY-SPEED>".  This means
    the class is not allowed to send any data at all during the burst
    period.  The burst period is the time left over after
    <BURST-BYTES> would have been sent at <STEADY-SPEED>.
</PRE>
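Two worked conversions under those rules (the numbers are chosen for illustration; tc's "kbps" is kilobytes/sec):
<PRE>
umax 3000 dmax 30ms rate 50kbps
    implied burst = 3000 / 0.030 = 100000 bytes/sec = 100kbps
    100kbps > 50kbps, so this is equivalent to:
        m1 100kbps d 30ms m2 50kbps

umax 1000 dmax 30ms rate 100kbps
    implied burst = 1000 / 0.030 = 33333 bytes/sec ~ 33kbps
    33kbps <= 100kbps, so the effective specification is:
        m1 0 d (30ms - 1000 / 100000 sec = 20ms) m2 100kbps
    ie the class may send nothing at all for the first 20ms.
</PRE>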
<PRE>
Classes.  Traffic is assigned to HFSC classes using a filter.

Scheduling.  The HFSC scheduler sends the packets required to meet
the guaranteed bandwidth specified by rt, but only if those
requirements would be broken by not sending them.  In that case it
sends the packet that will complete sending closest to its deadline.
If not forced to send packets required by rt, it sends a packet
according to the ls criteria, choosing the packet that will complete
sending closest to its deadline.  For both rt and ls, time is
measured with a clock running at one tick per bit per second of
bandwidth allocated by the <SERVICE-CURVE>.  Thus specifications with
higher bandwidth appear to have their deadlines arrive sooner, and
their packets appear to have been waiting longer and thus are sent
sooner in actual time.

Policing.  HFSC discards packets that aren't classified unless the
default classid is specified to the qdisc.

Rate Limiting.  HFSC restricts classes to the rate limit imposed by
the ul specification, and to the rate limits imposed by their
ancestor classes on their children.  Classes that have exceeded their
real time <SERVICE-CURVE> in the past may be rate limited if they
don't also have a link share and the link is backlogged.

Classifier.  The HFSC queuing discipline does not classify packets.
</PRE>

Текущая версия на 08:40, 30 октября 2023

HFSC

Tutorial
========
HFSC stands for Hierarchical Fair Service Curve.  Unlike its  cousins CBQ and HTB it has a rigorous mathematical foundation that delivers 
a guaranteed outcome.  In practice that means it does a better job than either CBQ or HTB, both of which are essentially best effort guesses at solving 

what is a surprisingly complex problem.  Both of them get it wrong in subtle ways which shows up as them not meeting their bandwidth and latency guarantees.

HFTC - это Иерархическая Честная Обслуживающая Кривая (что бы это ни значило). В отличии от своих двоюродных братьев CBQ и HTB у HFSC есть строгое математическое обоснование которое дает гарантированный результат. На практике это означает что HFSC делает работу лучше чем CBQ или HTB, которые по сути тратят услилия на решение неожиданно сложных проблем. Оба они (CBQ & HTB) делают это неправильно хитрыми способами что приводит к недополучению полосы пропускания или гарантированных задержек.


The Service Curve
-----------------
Like CBQ and HTB, HFSC assumes the user has defined classes of traffic.  One class might be "telnet traffic from customer A" which will be assured low latency, assuming it uses a low total bandwidth.  HFSC expects you to use one of filters tc provides to decide which class each packet belongs to.

Как и CBQ & HTB, HFSC предпологает что пользователь определил классы трафика. Один класс может быть "траффик телнета для пользователя А" который гарантирует низкую задержку пока этот класс использует малую полосу пропускания. HFTC ожидает что вы будетет использовать фильты которые предоставляет tc для того что бы определить к какому классу отнести пакет


HFSC provides three types of service to each class, or to put it another way HFSC provides three knobs you can use to define the behaviour of each class:

HFSC предоставляет 3 типа обслуживания каждому классу, или если сказать это по-другому HFSC предоставляет рукоятки которые можно использовать для определния поведения каждого из классов:

  
    rt  Real time.  This gives a class a hard guarantees about its real time speed.

Real time. Этот флаг (ручка? =) ) дает классу жесткие гарантии о его real time speed. (примечание: гарантированная скорость?)

    ls  Link share.  If the link has data queued up to be sent (ie it is backlogged), its share of the links capacity is in proportion to this speed.  Every class must have a real time or link share definition.

Link share. Если у линка есть данные поставленные в очередь на отправку, емкость линка распределяется пропорционально этой скорости. Каждый класс должен быть определен как rt или ls (примечание: не помешал бы пример!)


   
    ul  Upper limit.  The maximum speed link share can send at.  It can only used if the class has a link share defined.

Upper limit. Максимальная скорость на котоой ls класс может посылать данные (примечание: ceil?)

        
  The two letter codes above, "rt", "ls", and "ul" is how you  identify these three types of service to the tc command.  There is
  a fourth one: "sc".  "sc" is a shorthand for setting both "rt" and "ls" to the same thing.

Двухбуквенные коды выше "rt", "ls" и "ul" это описание как определить эти три уровня обслуживания для команды tc. Существует так же четвертый код: "sc". "sc" это сокращение от "setting both" - установка обоих "rt" и "ls" для одного (класса?)

  The speed allocated to each service is specified as a normal speed  and an optional burst speed:

    SERVICE [m1 BPS d SEC] m2 BPS

Скорость выделенная для каждого сервиса определяется как нормальная скорость и опционально - burst speed

  Where:

    SERVICE     "ls", "rt", "ul" - ie whether this specifies the link sharing, real time or upper limit service.

    m2 BPS       This is the steady state speed, eg "m2 10kbps" means the steady state allowance for this service class is 10 kilo bytes/sec.

    m1 BPS d SEC The burst speed.  "m1 20kbps d 10ms" means this class is allowed a burst speed of 20kbps for 10ms, then it must run at its steady state speed.

Где

  • SERVICE "ls" "rt" "ul": - определяет тип сервиса
  • m2: это "скорость устойчивого состояния", например "m2 10kbps" означает разрешение устойчивого состояния этого класса в 10 кбит в сек. Измеряется в bps
  • m1 это burst speed "m1 20kbps d 10ms" означает что классу разрешено burst speed 20kbps на 10ms а затем должен использовать "скорость устойчивого состояни"
  In the specification and tc call this is called a "service curve".It is a curve in the same sense that any mathematical function can be plotted as a curve.  In the paper this curve is used as the  basis for mathematical proofs about the performance of the  scheduler.  Unless you also want to do mathematical proofs avoid thinking avoid thinking of specification as a curve.  It is a speed with an optional latency specification - nothing more.

В спецификации и вызовах tc это называют "сервисной кривой". Это кривая имеет тот же смысл что и математическая функция которая может быть изображена как кривая. Эта кривая используется как математическое доказательство производительности планировщика. Если вы не собираетесь делать математические доказательства не думайте о спецификации как о кривой - это скорость с дополнительной спецификацией задержки и ничего более.

  (There is another syntax supported which is documented in the  reference section below - "umax BYTES dmax SEC rate BPS".  "rate BPS" is identical in meaning to "m2 BPS".  The definition of  "m2 BPS" does *not* always express specify a burst rate and time.  I have not found any other documentation of what  it can yield, probably because it is so complex to explain.  Such complexity is best avoided - don't use "umax ... dmax ...".  The  original terms came from the paper, but tc didn't follow the  paper's definitions of them to the letter.)

( Существует другой синтаксис который описан в секции ниже: "umax BYTES dmax SEC rate BPS".

  • "rate BPS" соответвует "m2 BPS"
  • "m2 BPS" не всегда выражает burst rate и время. Я не нашел другой документации что это означает - вероятно это сложно пояснить. Таких сложностей лучше всего избежать - не используйте "umax ... dmax ...". Оригинальные термины пришли с бумаги но tc не следовал этим определениям.

)

  Burst and Latency
  -----------------

  The term "burst" hides more than it reveals.  Yes, it specifies  bursts, but it is not used to control bursts of data.  It is used  to control latency.  This is not some minor quibble about naming  conventions.  Wrapping your head around this is fundamental to  understanding HFSC.

Термин burst скрывает больше чем поясняет. Да он определяет burst но он не используется для burst данных а используется для контроля latency. Это не каламбур - это фундамент для понимания HFSC

  But before going on, there is an even more important corollary to  it being about latency.  Most traffic doesn't care about latency.  Web traffic doesn't, email doesn't.  The two classes of traffic it  might matter to are real time (like VOIP and NTP), and interactive  traffic (like telnet, ssh, VNC, RDP and gaming).  If you don't care  about these do yourself a favour: don't use burst and skip this
  section entirely.

Прежде чем продолжить есть еше более важное замечание касающееся latency. Большая часть трафика не чувствительна к задержкам. Например web и email - не чувствительны. Real time имеет значение для VOIP NTP и интерактивного трафика - если вы не беспокоитесь об этом не используйте burst.

  And a caveat.  Any effort to control latency will fail abysmally  (as in be a complete and utter waste of your time and your CPU's  cycles) unless class doing the controlling is the slowest link in  the chain.  The moment any hop between source and destination  starts buffering packets HFSC's efforts to constraint latency  will be overwhelmed by delays caused by the buffering.  In case it  isn't obvious, this means you can never control ingress latency.

И напротив любые попытки контролировать задержки не пройдут. Это будет пустой тратой времени и циклов процесора если класс не контролирует самый едленный линк в цеочке. В момент когда хоп (роутрер) начинает буферизировать пакеты HFSC перестает контролировать задержку. Это неявно означает что вы никогда не сможете контролировать входную задержку (эту секцию написал Капитан очевидность)

  To see how the burst speed effects latency, imagine we have two
  classes, say ssh and web, which we are happy to devote 50% of the
  link to over the long term.  

When the link is idle the HFSC  scheduler is given a 1499 byte web packet and a 1500 byte ssh
  packet to send.  HFSC must decide which one to send one first, and
  after it does the other will be a backlog, awaiting it's turn to
  be sent.  
It is only if the link will become backlogged that the
  burst speed comes into play, and even then it is only for a
  short while - the length of time being specified by the SEC in
  "m1 BPS d SEC".  In effect all it does is determine which of these
  packets get sent first.  And that is how we get to latency -
  because whichever is sent first, ssh or web, will be perceived to
  have the lower latency.

Что бы увидеть как burst speed отражается на задержке представьте у нас есть 2 класса, скажем ssh и web которым мы рады выделить 50% полосы пропускания на длительный срок.
Когда линк простаивает и планировщик HFSC получает на отправку 1499 байтовый пакет от web и 1500 байтовый пакет от класса ssh и HFSC должен решить который из пакетов отправить первым и после отправки другой пакет будет отложен ожидая своей очереди на отправку.

Только если есть отложенные данные burst speed вступакет в игру и даже тогда только на кроткое время - на время которое было определено в SEC в "m1 BPS d SEC". Эфект который жто дает - это определение какой иэ этих пакетов был отправлен первым. И это как мы получаем задержку потому что (пакет) отправленный первым будь то ssh или web будет иметь меньшую задержку.

  To figure out which packet to send HFSC calculates how long each  will take to send, and sends the one that will finish sending  first.  The twist is it doesn't use link's raw speed to calculate  this time, it uses the speed for the service.  All three services  (link share, real time and upper limit) do their calculations  independently.

Что бы определить какой пакет отправлять HFSC вычисляет как долго займет отправка каждого и отравляет тот, отправка которого завершитья раньше. Обман в том что скорость линка не используется для вычисоения этого времени, а используется скорость определнная для сервиса. Все три сервиса (link share, real time и upper limit) делают вычисления независимо

  Some examples will hopefully make this clear.  Lets allocate web to  classid 1:1, and ssh to classid 1:2, and say we have a 1mbps link.

Несколько примеров которые надеюсь сделают это все более понятным. Давайте создадим класс web (1:1) и ssh (1:2) и скажем что у нас 1мбитный линк (тут пояснить!!!!)

  Example 1.

      tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
        ls m2 500kbps                                           # web
      tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
        ls m2 500kbps                                           # ssh

Примечание:


    We haven't set a burst speed, so the time it takes to send the
    packets is:

      web       1499 / 500000 = 0.002998 sec
      ssh       1500 / 500000 = 0.003000 sec

    So in this example web takes the shortest time to send, so it
    will it be send first and have the lower latency.


  Example 2.

    Many would prefer ssh traffic to have lower latency than web.
    This will achieve it:

      tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
        ls m1 100kbps d 1.5ms m2 500kbps                        # web
      tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
        ls m1 900kbps d 1.5ms m2 500kbps                        # ssh

    Since this is the start of a backlog period it also marks the
    start of a bust period, so the bust speed in in effect.  Thus
    we do that same calculation as in Example 1, but with burst
    speeds:

      web     1499 / 100000 = 0.014990 sec
      ssh     1500 / 900000 = 0.001667 sec

    In this scenario ssh will be sent first and thus appear to have
    the lower latency, which is what we wanted.  In fact it because
    of the ratios chosen (100kbps for web versus 900kbps for ssh) a
    ssh packet would have to be 10 times larger than a web packet
    before it got sent second.
    The burst must be long enough to send the entire packet.  That is
    what determined the 1.5ms above.  Our link runs at 1mbps, so it
    takes (1500 / 1000000 = 1.5ms) to send a Maximum Transmit Unit
    (MTU) sized packet, assuming the MTU is 1500 bytes.


    Also notice the burst speed for web doesn't look like a "burst"
    at all. It is far slower than the steady state speed web normally
    receives.  This is because it really *is* being used to control
    latency, not to specify a burst.
  Example 3.

    That worked, but not as well as you hoped.  You recall TCP can
    send multiple packets before waiting for a reply and you would
    like them all sent before web gets to send it's first packet.
    You also estimate you have 10 concurrent users all doing the same
    thing.  From a packet capture you decide 4 sending packets before
    waiting for a reply is typical.

    10 users by 4 packets each means 40 MTU sized packets.  Thus you
    must adjust the burst speed, so ssh gets 40 times the speed of
    web and you must allow for 40 MTU sized packets in the burst
    time:

      tc class add dev eth0 parent 1:0 classid 1:1 hfsc \
        ls m1  24kbps d 60ms m2 500kbps                         # web
      tc class add dev eth0 parent 1:0 classid 1:2 hfsc \
        ls m1 975kbps d 60ms m2 500kbps                         # ssh

    Note that 975:24 is 40:1, and (40 * 1500 / 1mbps = 60).

  If a class's burst speed is truly a burst (ie allowed it to send
  faster than it's normal steady state speed), it will only be
  allowed to do it if it has been sending at less than it's steady
  state speed for long enough to accrue the headroom required for the
  burst.  To put it another way, a class will not be allowed a burst
  if over the long term it would mean it is running over it's steady
  state speed.
  On the other hand if a classes "burst" is really a "go slow" than
  (like web's) it can be have its bandwidth pinched at any time by a
  class (like ssh) that is allowed a burst, thus giving it truly bad
  latency.  In this sense ssh's good latency doesn't come for free.
  It gains it at the expense of web having bad latency.  This will
  always be true, and is conceptually no different to the trade off
  you make for bandwidth.  If you guarantee web good bandwidth over
  the long term, then some traffic (eg email) must be getting bad
  bandwidth when web is using it.  Similarly web's latency will be
  bad when ssh is making use of its latency advantage.  However,
  because HFSC guarantees that ssh will not exceed it's steady state
  bandwidth over the long term web will still average at least the
  bandwidth you allocated it.  In the HFSC paper, they say the
  latency and bandwidth specifications are decoupled, meaning over
  the long term one has no effect on the other.
  Finally, since HFSC favours packets that can be sent quickly it
  favours sending small packets before large ones.  By happy
  coincidence this is generally what you want.  VOIP, DNS, NTP, TCP
  startup and ACKs are all small.  Since the advantage you get from
  giving a class a high burst is measured in milliseconds, not
  seconds, specifying a burst for link sharing often isn't worth the
  effort.


  The meaning of "real time"
  --------------------------

  tc-hfsc's web page follows the terminology in the paper and says
  "real time" gives guaranteed bandwidth.  Web pages talking about
  HFSC take this to imply link share does not give guaranteed
  bandwidth.  This is true, but misleading.  The "guarantees" given
  by the alternatives (like CBQ and HTB) are no better.


  The real time gives you something nothing else does: a hard
  deadline.  Link share's will latency will on occasion be out by a
  100's of milliseconds, very occasionally seconds.  But notice we
  are talking seconds and milliseconds here.  If you do not have
  traffic that cares about millisecond delays then you don't care
  about real time.  There are very few traffic classes that do care -
  only VOIP and maybe NTP spring to mind.  In particular if you were
  a satisfied user of HTB or CBQ, you don't care about real time.  If
  you don't care about real time do yourself a favour: don't use it,
  and stop reading this section.
  Real time's guarantees do not come for free, nor are they what they
  seem.
  
  The first issue is in order to give hard real time guarantees the
  real time service must take into account all data a class has sent
  since the link was brought up.  If the class has has exceeded its
  real time steady state speed in the past (probably by using
  bandwidth when no one else wanted it) then it can be denied *any*
  bandwidth by the real time class until it gets under its long term
  steady state speed.  In practice this probably won't be an issue,
  but being penalised for using bandwidth no one else wanted it
  usually thought of as "unfair" (yes this is a well defined term
  when dealing with quality of service), and thus the real time is an
  unfair allocator of bandwidth.
  The second issue is in order to provide it's guarantees, real time
  may have keep the link idle in case a high priority packet needs to
  be sent in order to maintain latency guarantees.  Ego there may be
  times when a packet is waiting to be sent, but the link is idle,
  thus the link will be artificially constrained to below its rated
  capacity.  A service that always takes advantage of the links rated
  capacity is called work preserving.  Link share is work preserving.
  Real time isn't.  There is a corollary to this.  Because real time
  delays sending packets it's accuracy can depend on the resultion of
  the kernels timers.  For example, if your hardware + kernel only
  supprts a 100hz timer, the best latency resolution you can get is
  10 milliseconds.  Common modern hardware provides very accurate
  timers, but it's worth your while checking if you are taking the
  road less travelled.
  The third issue is real time provides a hard latency guarantee in
  the sense that latency is bounded to an absolute maximum.  However
  that absolute maximum isn't necessarily what you specified when you
  said "m1 12500 d 2ms".  You might expect that guarantees a 250 
  byte packet will be sent within 2ms.  It doesn't, unless you do not
  use link share or put any other qdisc between HFSC and the device.
  Recall that real time will deliberately hold the line idle in order
  to give it's guarantee.  The issue is link share will sneak in and
  send a packet over that idle link while real time isn't watching.
  In practice this means real time's latency guarantees can be out by
  the time it takes to send one MTU sized packet.  So the guarantee
  isn't 2ms.  It's (2ms + time to send MTU sized packet).  You can of
  course compensate for this by specifying the time as (2ms - time to
  send MTU sized packet), provided your link is fast enough to send
  MTU sized packet + 250 bytes within 2ms.
 The fourth issue is real time says nothing about what to do with
 the links spare capacity.  If you don't use link share as well,
 the guaranteed minimum also becomes a maximum.
  The fifth issue is unlike link share, the bandwidths you give
  to real time must be accurate, and their sum must be below the
  links capacity.  This means the sum of the burst speeds must be
  below the links capacity, and the sum of the steady state speeds
  must also be below the links capacity.  Ensuring this if don't you
  use the same value for "d" or "dmax" in every class requires
  solving multiple linear equations.
  The sixth issue is if you use up all your bandwidth with real time
  guarantees, then when the link is backlogged link share will get
  no share, as in will never send a packet and all that goes along
  with that - like TCP connections timing out and dieing.
  In summary, real time good for one thing: guaranteeing hard
  latencies.  So specifying a real time service without a burst time
  is a waste of time.  The converse is also true: real time is
  absolutely useless for anything but guaranteeing hard latencies.
  Everything else is better done using link share.


  Using link share and upper limit
  --------------------------------

  Under link share the speed a backlogged class can send depends on
  ratio of its speed to other backlogged classes.  (If class isn't
  backlogged then by definition it's sending packets as fast as it
  wants, so link share doesn't need to arbitrate on it's behalf.)
  Link share does one thing: when the underlying device says its
  ready to send a packet, it looks at what classes that is furtherest
  away from the proportion of the link it should have, and sends its
  packet.
  An example. Lets say we have classes A, B and C, who have link
  share specifications of:
  
    class A:    hfsc ls rate 10kbps
    class B:    hfsc ls rate 20kbps
    class C:    hfsc ls rate 30kbps

  A and B are sending packets as fast as they can, so they will be
  backlogged.  C is idle.  The ratio of the speeds ratio of the
  backlogged classes, A and B, is 10 to 20, ie 1 to 2.  So link share
  ensures for every byte A sends, B sends 2.

  The calculation is so simple that it's important to note what it
  doesn't depend on:
  
  A.  It doesn't depend on the speed of the underlying link.  Link
      share doesn't need to know what that speed is, and the numbers
      you give it need have no relationship to that speed.   Only the
      ratios matter.
  B.  This implies link share is immune to link speed changing during
      the day, as real internet connections often do.
  C.  Link share doesn't care who sends the packets - it or real
      time.  They are all counted the same way.


 There are some complications as well:
 A.  Link share will not give a class more than the upper limit.
     Real time does not respect upper limit, but since link share
     counts what real time sends, a class that goes over its upper
     limit using real time will be penalised later by link share.
 B.  Link share only gets what's left over after real time sends
     its packets.  To link share the bandwidth used by real time
     looks like the link speed varying a bit faster than it would
     otherwise.
 C.  A class is allowed to exceed its steady state speed by using
     its burst speed when it first becomes backlogged.  Once the
     burst allowance is exhausted it cannot exceed the steady state
     speed until the backlog has been completely cleared.
 And there is one negative.  Link share will send packets as fast
 as the underlying device will let it.  If the underlying device
 isn't the slowest thing between it and the destination, link share
 won't be the thing controlling what packet gets sent next - it
 will be whatever scheduler is running on the slowest device.
 Typically on a Linux box the device HFSC is controlling is an
 Ethernet NIC that runs at gigabits per second.  The slowest thing
 is probably whatever it is connected to - the modem - so unless
 you do something the modem will be managing traffic, not HFSC.
 If your Linux box isn't controlling an Ethernet NIC it's most
 likely attempting to schedule traffic on a virtual device.  Under
 Linux virtual network devices like tunnels, ifb (used for ingress
 scheduling), and tun/tap devices process their packets as soon as
 they receive them.  To a scheduler they appear to be infinitely
 fast.  This means on a typical Linux setup not only will link
 share have absolutely no effect on the outgoing traffic, its
 propensity to flood the link will destroy any chance real time
 had at controlling latency.
 To fix this, you must limit the rate link share can send using
 upper limit.  Remember, for link share to be effective the limit
 must be lower than the slowest link between the source and
 destination.  Upper limit has one other use - constraining a
 customer to the bandwidth they paid for.
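 A minimal sketch of that fix (the 512kbit modem speed, device and
 classids are assumptions for illustration):

    # Upload modem runs at nominally 512kbit; cap link share a
    # little below that so queueing happens here, not in the modem.
    tc qdisc add dev eth0 root handle 1: hfsc default 10
    tc class add dev eth0 parent 1: classid 1:1 hfsc \
        ls m2 500kbit ul m2 500kbit
    tc class add dev eth0 parent 1:1 classid 1:10 hfsc ls m2 500kbit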
 Finally, link share can't give hard latency guarantees.  So its
 burst numbers don't mean "send an ssh packet within 10
 milliseconds".  Instead they mean "if ssh has been quiet for a
 while, briefly increase the ratio at which ssh can send packets,
 so it gets better latency than competing classes".  That said,
 if you follow these rules link share will get reasonably close
 to the latency you ask for:
      a.  For classes that use real time, you use exactly the same
          link share specification.  Recall "sc" does this (see the
          sketch after this list).
      b.  The total bandwidth you allocate for link share doesn't
          exceed the link's capacity or upper limit.  This must be
          true for both burst and steady state speeds.
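  Here is what rule (a) looks like in practice - a sketch, with the
  device, classids and curve numbers assumed.  "sc" is shorthand
  that gives the class identical rt and ls curves in one go:

    tc class add dev eth0 parent 1:1 classid 1:60 hfsc \
        sc m1 400kbit d 10ms m2 100kbit
    # equivalent to:
    #   ... hfsc rt m1 400kbit d 10ms m2 100kbit \
    #            ls m1 400kbit d 10ms m2 100kbit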


 The Class Hierarchy
 -------------------
 Like HTB and CBQ, HFSC allows you to create a hierarchy of classes.
 This is an example:
                        +1:0---------+
                        |    qdisc   |
                        |  1500kbit  |
                        +------------+
                       /              \
                 +-1:10------+  +-1:20------+
                 |  User A   |  |  User B   |
                 | 50% share |  | 50% share |
                 | Unlimited |  | 1000kbit  |
                 +-----------+  +-----------+
                /             \
        +-1:11-------+  +-1:12---------+
        |     Web    |  |     VOIP     |
        | 100% share |  | 20ms Latency |
        | Unlimited  |  |   100kbit    |
        +------------+  +--------------+
 Intuitively this does the following:
    - User A can use all of the link's bandwidth.
   - All User A's spare bandwidth goes towards web browsing.
   - User A has VOIP which must have low latency.
   - User B can use a maximum of 1000kbit.
   - User B gets 50/50 share of a congested link.
 Up till now we have discussed how leaf nodes sharing a common
 parent (inner node) interact.  This is how the hierarchy affects
 each type of HFSC service:
   ls  A parent enforces its link sharing allocation on its
       children.  In other words the siblings get to fight over
       the proportion of the shared bandwidth allocated to their
       parent, and the parent and its siblings get to fight over
       whatever their ancestors' link sharing allocations are.
   rt  The hierarchy does not affect the bandwidth allocated by the
       real time service in any way.  rt specifications for inner
       nodes are ignored.
   ul  A parent enforces its upper limit allocation on its children.
       In other words the sum of the bandwidth used by the
       children's link sharing service will never exceed any of
       their ancestors' upper limits.
 In practice, for most devices, you will need one class enforcing an
 upper limit so link share doesn't flood the link and thus destroy
 HFSC's ability to schedule.  That class will be the root, of
 course.
 With those definitions in hand, this is how the above hierarchy
 is implemented:
   #
   # Save typing when creating classes.
   #
   ca() {
     par=$1; class=$2; shift 2
     tc class add dev eth0 parent 1:$par classid 1:$class hfsc "$@"
   }
   #
   # Create the qdisc.  Send unclassified traffic to 1:11.
   # If "default" isn't given unclassified traffic is dropped.
   #
   tc qdisc add dev eth0 root handle 1:0 hfsc default 11
   ca 0 1 ls m2 1500kbit ul m2 1500kbit        # Stop link flooding
   #
   # Create classes representing each user.
   #
   ca 1 10 ls m2 750kbit                       # User A
   ca 1 20 ls m2 750kbit ul m2 1000kbit        # User B
   #
   # User A's web traffic.
   #
   ca 10 11 ls m2 1500kbit                     # User A web
   #
   # VOIP.  We said VOIP uses 100kbit, and required 20ms latency.
   # We also know that HFSC can break its rt latency guarantees
    # by one MTU packet.  Assuming the MTU is 1500 bytes, on our
    # 1500kbit link it takes:
    #
    #   1500 [byte] * 8 [bit/byte] / 1500000 [bit/sec] = 8 msec
    #
    # to send an MTU sized packet.  That means we must send the
    # VOIP packet within 12ms (= 20ms - 8ms) to meet the latency
    # requirements.
    #
    # The requirements didn't say how big a VOIP packet is, but given
    # it has to send a new packet every 20ms the biggest it could be
    # is:
    #   100000 [bit/sec] / 8 [bit/byte] * 0.02 [sec/pkt] = 250 byte/pkt
    # So, we have 12ms to send 250 bytes, which works out to be:
    #   250 [byte] * 8 [bit/byte] / 0.012 [sec] = 166.7kbit, say 167kbit
    #
    ca 10 12 rt m1 167kbit d 12ms m2 100kbit    # User A VOIP


Reference
=========

 Qdisc Parameters
 ----------------
   default <CLASSID>
     Send unclassified packets to <CLASSID>.  If not supplied,
     unclassified packets are dropped.
 Class Parameters
 ----------------
   
   ls <SERVICE-CURVE>
     Apportions how much of a backlogged link's spare capacity
     should be allocated to this class and its children when a
     packet isn't being sent under the rt <SERVICE-CURVE>.  The
     <SERVICE-CURVE> bandwidths of all sibling classes wanting to
     share the excess capacity are summed; this class and all its
     offspring are then given the same fraction of the link's
     capacity left over after real time as this class's
     <SERVICE-CURVE> bandwidth is of that sum.  When calculating
     <SERVICE-CURVE>, <BURST-SEC> is measured from the last time
     this class became backlogged, ie had packets waiting to be
     transmitted.  Every leaf class must have an ls or rt (or both)
     <SERVICE-CURVE> specified.  Every inner class must be given an
     ls <SERVICE-CURVE>.
     
   rt <SERVICE-CURVE>
     <SERVICE-CURVE> sets the amount of bandwidth a leaf class is
     guaranteed.  rt is ignored for inner classes.  This amount of
     bandwidth is always available to the class regardless of the
     other classes' use of the link or the maximum bandwidth allowed
     to this class or its parents.  When calculating
     <SERVICE-CURVE>, the calculation of the start time of
     <BURST-SPEED> is complex.  Roughly, it is the last time the
     class was backlogged and was under the guaranteed rate.  Every
     leaf class must have an ls or rt <SERVICE-CURVE> specified.
   ul <SERVICE-CURVE>
     For classes with an ls curve, sets the maximum bandwidth this
     class and its children are permitted to use.  This only limits
     packets sent under the ls <SERVICE-CURVE>, but packets sent
     under the rt curve are included in the bandwidth calculation.
     When calculating <SERVICE-CURVE>, <BURST-SEC> is measured from
     the last time this class became backlogged, ie had packets
     waiting to be transmitted.
 <SERVICE-CURVE>
 ---------------
 A service curve specifies how much bandwidth a class can use.  It
 has an optional initial burst rate, followed by a steady state.
 The start time of the burst depends on the service type (rt, ul or
  ls).  There are two syntaxes available for describing a service
  curve.  In both you use the usual tc units to designate speed,
  data size and time.
 [m1 <BURST-SPEED> d <BURST-SEC>] m2 <STEADY-SPEED>
   <BURST-SPEED> is the speed of the burst, and <BURST-SEC> is the
   length.  Thereafter the class sends at <STEADY-SPEED>.
 [umax <BURST-BYTES> dmax <BURST-DSEC>] rate <STEADY-SPEED>
   "rate" has an identical meaning to "m2" above, so it specifies
   the long term speed the class is entitled to.  To determine what
   "umax ... dmax ..." means, an implied burst speed is calculated
   using "<BURST-BYTES> / <BURST-DSEC>".  If this burst speed larger
   than <STEADY-SPEED> then this syntax is equivalent to "m1
   (<BURST-BYTES> / <BURST-DESC>) d <BURST-DSEC> m2 <STEADY-SPEED>"
   where the stuff in (...) is calculated.  If the implied burst
   speed is not greater than <STEADY-SPEED> then during the burst
   period effective specification becomes "m1 0 d (<BURST-DSEC> -
   <BURST-BYTES> / <STEADY-SPEED>) m2 <STEADY-SPEED>".  This means
   the class is not allowed to send any data at all during the
   burst period.  The burst period is the time left over after
   if <BURST-BYTES> were sent at <STEADY-SPEED>.
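  A worked illustration of both cases (numbers invented for this
  example): "umax 1500b dmax 10ms rate 400kbit" implies a burst
  speed of 1500 [byte] * 8 [bit/byte] / 0.010 [sec] = 1200kbit.
  That is larger than the 400kbit steady speed, so it is equivalent
  to "m1 1200kbit d 10ms m2 400kbit".  By contrast "umax 250b dmax
  10ms rate 400kbit" implies 200kbit, which is not larger, so the
  effective specification is "m1 0 d (10ms - 2000 [bit] / 400000
  [bit/sec]) m2 400kbit", ie "m1 0 d 5ms m2 400kbit".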


Classes. Traffic is assigned to HFSC classes using a filter.
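
For example, a u32 filter that sends packets with destination port 22 (ssh - an illustrative choice, as are the device and classid) to class 1:12 might look like this sketch:

    tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
        match ip dport 22 0xffff flowid 1:12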

Scheduling. The HFSC scheduler sends packets required to meet guaranteed bandwidth specified by rt, but only if those requirements would be broken by not sending them. In that case it sends the packet that will complete sending closest to its deadline. If not forced to send packets required by rt it will send a packet according to the ls criteria, choosing the packet that will complete sending closest to its deadline. For both rt and ls, time is measured with a clock running at one tick per bit per second of bandwidth allocated by the <SERVICE-CURVE>. Thus specifications with higher bandwidth appear to have their deadlines arrive sooner, and packets appear to be waiting longer and thus are sent sooner in actual time.
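
As a worked illustration of that clock (rates invented for the example): a 1500 byte packet queued to a class whose curve allocates 100kbit completes 12000 [bit] / 100000 [bit/sec] = 120ms of such virtual time later, while the same packet in a 1000kbit class completes 12ms later, so the 1000kbit class's packet has the earlier deadline and is sent first.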

Policing. HFSC discards packets that aren't classified, unless a default classid is given to the qdisc.

Rate Limiting. HFSC restricts classes to the rate limit imposed by the ul specification, and to the rate limits imposed by their ancestor classes on their children. Classes that have exceeded their real time <SERVICE-CURVE> in the past may be rate limited if they don't also have a link share curve and the link is backlogged.

Classifier. The HFSC queuing discipline does not classify packets.