I was inspired to write this article after the RIPE71 conference in Bucharest. There were a lot of awesome talks about routing, security and IPv6 but, actually, nobody seemed to care about traffic engineering. I didn’t find any companies offering products in this area and I didn’t see a single presentation about it.
On the other hand, I think this field is very important for the future work of the IETF, RIPE and the network community as a whole. I’m going to try to describe what traffic engineering means, why it’s important and why we should kill old-style routing completely.
Traffic engineering is the process of managing your traffic. What does the word “management” mean here? I’d like to suggest several examples of traffic engineering:
1) Management for achieving higher bandwidth between two points
2) Management for achieving lower latency between two points
3) Management for achieving better reliability (without packet loss) between two points
We can solve these tasks inside our own network or outside it. Inside your own network you are the king: you can change the topology, buy new hardware and fix anything you want, so internal traffic engineering is basically a matter of network architecture. External traffic engineering is more complicated. We have a lot of protocols and mechanisms for internal failover, load balancing and automatic traffic engineering (OSPF, RIP, iBGP, LACP, ECMP). But for external traffic engineering we generally have only one tool, and that is BGP.
I want to describe external, or global, traffic engineering in detail. There are two very different types of global traffic engineering: incoming and outgoing.
Outgoing traffic engineering
Let’s start with outgoing traffic engineering because it’s quite simple. I want to describe a case where we have three upstream connections to the outside world and no connections to traffic exchange points (they make our task much more complicated). What about their names? I like isp#1, isp#2 and isp#3.
With only two upstream connections, traffic engineering is not so interesting and can be implemented in a very simple way (master/slave, round-robin flow balancing).
Ideally, your isp#1, isp#2 and isp#3 will have equal bandwidth, equal link quality and equal reliability. Hah! A very nice promise, but actually it’s impossible.
In the real world your upstreams will have really different quality characteristics. And things get much harder if your upstreams have different link capacities.
Indeed, in all cases known to me, network engineers take a full BGP table from each upstream and leave the traffic engineering task to the BGP algorithm. If you are lucky, you get a good enough distribution of bandwidth. But what can you do if you are not lucky and almost all of the traffic goes to the smallest, most expensive ISP with very bad latency? It’s possible, really! Due to geography or network architecture, the worst path in some direction can become the best one in BGP terms.
BGP has many options for traffic engineering, but for global traffic engineering we can use only these:
1) The more specific (longer) prefix is preferred over the less specific one (if you have a route for a /20 over isp#1 and a route for a /24 inside it over isp#2, all traffic to that /24 will go to isp#2)
2) The shortest AS path wins. But with MPLS networks or with some Internet Exchanges (in most cases distributed IXPs), a network with an AS path length of 3 can be thousands of kilometers away.
3) Local preference. We can use it to override BGP’s best path calculation with our own choice.
As you can see, the best path selection algorithm is quite straightforward and not flexible. If you have many routes with equal AS path length, it becomes a rather difficult problem and you have to be smart enough to solve it.
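To make these three rules more concrete, here is a minimal Python sketch of how they interact. It is a simplification and not a real BGP implementation: real routers apply more tie-breakers, and the Route class, the best_route function, the prefixes and the AS numbers are all hypothetical names and values invented for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str            # announced prefix (hypothetical values below)
    upstream: str          # upstream the route was learned from
    as_path: tuple         # AS numbers on the way to the origin
    local_pref: int = 100  # higher value is preferred


def best_route(routes, destination):
    """Pick a route for `destination` using only the three rules from the text."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    # 1) longest prefix, 2) highest local preference, 3) shortest AS path
    return max(
        candidates,
        key=lambda r: (
            ipaddress.ip_network(r.prefix).prefixlen,
            r.local_pref,
            -len(r.as_path),
        ),
    )


routes = [
    Route("10.20.0.0/20", "isp#1", (64501, 64510)),
    Route("10.20.4.0/24", "isp#2", (64502, 64520, 64530, 64510)),
    Route("10.20.0.0/20", "isp#3", (64503, 64504, 64510), local_pref=200),
]

# The /24 wins even though its AS path is the longest one.
print(best_route(routes, "10.20.4.7").upstream)  # isp#2
# Outside the /24, local preference beats the shorter AS path of isp#1.
print(best_route(routes, "10.20.1.1").upstream)  # isp#3
```

Notice that nothing in this decision looks at bandwidth, latency or packet loss, which is exactly the inflexibility described above.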
Manual for outgoing traffic engineering implementation
There are two different ways to perform outgoing traffic engineering:
1) Balancing based on the internal customer address. This approach is based on routing (select the upstream according to the customer’s address in your network). For example, all customers from 18.104.22.168/24 go to isp#1 and all customers from 22.214.171.124/24 go to isp#2. It sounds great! But it helps only if these customers generate similar load.
2) Balancing based on the target address. You can find the most popular prefixes for your outgoing traffic and spread them over your uplinks manually.
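The second approach can be sketched as a simple greedy assignment: take the heaviest destination prefixes first and put each one on the uplink that ends up least utilized. This is only one possible heuristic, not a recommendation. The function name spread_prefixes, the prefixes, the traffic figures and the capacities are assumptions made up for the illustration.

```python
def spread_prefixes(traffic_mbps, capacity_mbps):
    """Greedily assign destination prefixes to uplinks by relative utilization.

    traffic_mbps:  {prefix: outgoing traffic in Mbit/s}
    capacity_mbps: {uplink: link capacity in Mbit/s}
    """
    load = {name: 0.0 for name in capacity_mbps}
    assignment = {}
    # Heaviest prefixes first, so the small ones can fill the remaining gaps.
    for prefix, mbps in sorted(traffic_mbps.items(), key=lambda kv: kv[1], reverse=True):
        target = min(load, key=lambda name: (load[name] + mbps) / capacity_mbps[name])
        assignment[prefix] = target
        load[target] += mbps
    return assignment, load


# Hypothetical numbers: two 1 Gbit/s uplinks and one 500 Mbit/s uplink.
capacity = {"isp#1": 1000, "isp#2": 1000, "isp#3": 500}
traffic = {
    "198.51.100.0/24": 400,
    "203.0.113.0/24": 300,
    "192.0.2.0/24": 200,
    "10.99.0.0/16": 100,
}

assignment, load = spread_prefixes(traffic, capacity)
print(load)  # {'isp#1': 400.0, 'isp#2': 400.0, 'isp#3': 200.0}
```

In this example every uplink ends up at 40% utilization. In practice you would turn the resulting assignment into static routes or local preference rules per prefix.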
So we can achieve load balancing for outgoing traffic this way. But latency optimization and fault mitigation are a little more complicated and cannot be solved with these simple rules.
So you have to perform all these steps manually. For example, your service is sending packets through isp#2 to prefix 126.96.36.199/24. Let’s assume that isp#2 is working really well in general but has some congestion towards this prefix. What can BGP do in this situation? Nothing! Because BGP thinks “it’s working because my peer is alive”.
So you have to check multiple traceroutes, pings and looking glasses manually, and you will find the problem in a few minutes… or a few hours, and fix it. It sounds bad, but it’s possible.
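The manual check boils down to the same question every time: how many probes were lost and how slow were the ones that came back? Here is a rough sketch of that judgement in Python. It does not send any packets; it only evaluates round-trip times you have already collected (for example from ping output). The function names and the thresholds (2% loss, 150 ms) are arbitrary assumptions, not recommended values.

```python
from statistics import median


def summarize_probes(rtts_ms):
    """Summarize probe results: RTT in milliseconds, or None for a lost probe."""
    if not rtts_ms:
        raise ValueError("no probes given")
    answered = [r for r in rtts_ms if r is not None]
    return {
        "loss": 1 - len(answered) / len(rtts_ms),
        "median_rtt_ms": median(answered) if answered else None,
    }


def is_degraded(summary, max_loss=0.02, max_rtt_ms=150.0):
    """Flag a path as degraded using assumed loss and latency thresholds."""
    if summary["median_rtt_ms"] is None:  # every probe was lost
        return True
    return summary["loss"] > max_loss or summary["median_rtt_ms"] > max_rtt_ms


# Hypothetical probes towards the congested prefix via isp#2: 3 of 10 are lost.
via_isp2 = [20.1, 19.8, None, 21.0, None, 20.5, 19.9, None, 20.2, 20.0]
summary = summarize_probes(via_isp2)
print(round(summary["loss"], 2), summary["median_rtt_ms"])  # 0.3 20.1
print(is_degraded(summary))  # True
```

The BGP session to isp#2 stays up the whole time, so only a measurement like this one reveals the problem.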
Manual for incoming traffic engineering
Here things become much more complicated. And we have a really short list of options:
1) If you want to stop incoming traffic from some ISP, you can stop announcing your network to it.
2) If you want to reduce the incoming traffic to one of your subnets, you can prepend your own AS to the AS path of this network several times. But this option is not a universal remedy: in some cases an ISP ignores these prepends or filters them out. Then you are left with only the first option.
3) A more specific network has higher priority for incoming traffic. If you announce a /20 and a /24 (from this /20), the traffic for the subnet will go over the ISP which announces the /24, even if it has the longest AS path.
4) Custom BGP communities. Each ISP offers a huge list of non-standard BGP communities for traffic engineering. For example, you can mark your prefix with such a community and disable announcements of your prefix somewhere in your upstream’s network.
5) Anycast. You may be amazed, but anycast definitely can be used as a load balancing mechanism! It’s very hard and costly to implement, but it works.
You should keep one thing in mind. Incoming traffic engineering is REALLY COMPLICATED! You have to know everything about your neighbor networks to do it the right way.
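Options 2 and 3 are easier to follow from the point of view of a remote network that receives our announcements. The sketch below models only that: a remote side which prefers the most specific prefix and then the shortest AS path. Real networks also apply their own local preference and filters, which is exactly why prepending is unreliable. The AS numbers, prefixes and function names are hypothetical.

```python
import ipaddress

OWN_ASN = 64500  # hypothetical AS number of our network


def announce(prefix, via_asn, prepends=0):
    """Build the announcement as a remote network would see it behind `via_asn`."""
    return {
        "prefix": ipaddress.ip_network(prefix),
        "via": via_asn,
        # Our AS appears once, plus once more for every prepend.
        "as_path": (via_asn,) + (OWN_ASN,) * (1 + prepends),
    }


def remote_choice(announcements, destination):
    """Simplified remote decision: longest prefix first, then shortest AS path."""
    addr = ipaddress.ip_address(destination)
    matching = [a for a in announcements if addr in a["prefix"]]
    return max(
        matching,
        key=lambda a: (a["prefix"].prefixlen, -len(a["as_path"])),
        default=None,
    )


# Option 2: the same /20 via two ISPs, prepended three times towards the second.
announcements = [
    announce("10.20.0.0/20", via_asn=64501),
    announce("10.20.0.0/20", via_asn=64502, prepends=3),
]
print(remote_choice(announcements, "10.20.4.9")["via"])  # 64501

# Option 3: a /24 taken from the /20, announced only via the prepended ISP.
subnets = list(ipaddress.ip_network("10.20.0.0/20").subnets(new_prefix=24))
announcements.append(announce(str(subnets[4]), via_asn=64502, prepends=3))
print(remote_choice(announcements, "10.20.4.9")["via"])  # 64502
print(remote_choice(announcements, "10.20.5.9")["via"])  # 64501
```

The more specific /24 pulls its traffic to the second ISP in spite of the prepends, while the rest of the /20 keeps arriving over the first one.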
Traffic engineering solutions
Almost all traffic engineering solutions can control only outgoing traffic, because that is pretty simple and reliable. They use active network monitoring together with traffic telemetry (sFlow, NetFlow, SPAN). If they find congestion or packet loss, they automatically move your traffic away from the broken upstream. They spread your traffic according to each upstream’s bandwidth, cost and even latency. They work really fast and can move traffic almost every second if needed.
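The core decision of such a tool can be sketched in a few lines. This is my own simplified illustration, not the algorithm of any particular product: for every destination prefix it looks at measured loss, latency and free bandwidth per upstream and proposes a move when a better upstream exists. The data layout, the function names and the 2% loss threshold are assumptions.

```python
def pick_upstream(measurements, max_loss=0.02):
    """Choose an upstream for one prefix.

    measurements: {upstream: {"loss": float, "rtt_ms": float, "free_mbps": float}}
    """
    healthy = {
        name: m for name, m in measurements.items()
        if m["loss"] <= max_loss and m["free_mbps"] > 0
    }
    pool = healthy or measurements  # if nothing is healthy, take the least bad one
    return min(pool, key=lambda name: (pool[name]["loss"], pool[name]["rtt_ms"]))


def rebalance(current, per_prefix_measurements):
    """Return {prefix: new upstream} only for prefixes that should be moved."""
    moves = {}
    for prefix, measurements in per_prefix_measurements.items():
        best = pick_upstream(measurements)
        if best != current.get(prefix):
            moves[prefix] = best
    return moves


# Hypothetical state: isp#2 has 8% loss towards the first prefix.
current = {"198.51.100.0/24": "isp#2", "203.0.113.0/24": "isp#1"}
measured = {
    "198.51.100.0/24": {
        "isp#1": {"loss": 0.0, "rtt_ms": 45.0, "free_mbps": 300.0},
        "isp#2": {"loss": 0.08, "rtt_ms": 30.0, "free_mbps": 500.0},
        "isp#3": {"loss": 0.0, "rtt_ms": 60.0, "free_mbps": 100.0},
    },
    "203.0.113.0/24": {
        "isp#1": {"loss": 0.0, "rtt_ms": 20.0, "free_mbps": 300.0},
        "isp#2": {"loss": 0.0, "rtt_ms": 25.0, "free_mbps": 500.0},
        "isp#3": {"loss": 0.01, "rtt_ms": 15.0, "free_mbps": 100.0},
    },
}

print(rebalance(current, measured))  # {'198.51.100.0/24': 'isp#1'}
```

A real tool would run this on every measurement cycle and push the moves to the routers, for example as more specific routes or local preference changes.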
If you are a really smart networking guy, you can implement some version of this idea yourself without any external toolkits.
These solutions are able to do almost all of your day-to-day job. But actually they override standard BGP behavior. They ignore AS path length and they do not care about prefix length. So if somebody outside is trying to do traffic engineering towards your network, I have really, really, really bad news for them. Manual incoming traffic engineering will not work against your network, and nobody can move your traffic in a direction you do not want. And if you have multiple terabits of upload… it’s BAD news for the whole Internet.
I do not need Traffic Engineering solutions!
If you don’t have an automatic traffic engineering solution in your network, I have bad news for you. Your competitors offer more reliable, lower latency and much cheaper services to customers. And they never break their SLA. They do not need a big support staff to process calls from angry customers who complain about Facebook or Twitter issues. And your customers will choose their offer.
You can see that the world is moving fast towards automatic traffic engineering, and in a few years everybody will use it. And we will break any ability to load balance incoming traffic (the anycast approach could still be used, but in reality it’s very expensive).
Automatic traffic engineering is still a really nice idea, because TE solutions also work as awesome monitoring toolkits for your ISP. I truly believe that you definitely can solve problems if you find thousands of problem prefixes in a TE toolkit report.
Incoming traffic engineering is still important, but for it we need some external database where BGP could look up “how should I send traffic to this prefix”. It would not be a mandatory rule, definitely! We must not have to ask for somebody’s permission. But we definitely need a protocol for this case and the ability to supply this data to routers, which would make routing decisions much smarter. We could even inject this data into the BGP routing table according to some standard.