Have you ever had your network misbehave with no idea why?
We have all been there. In those moments, I have wished for more visibility into the network, especially now that cloud applications have become so popular. Nowadays, the typical Cloud application troubleshooting scenario looks like the one in the following picture.
Users complain about the Cloud Application not being accessible from their laptops. A good network engineer would normally follow these troubleshooting steps:
- check the LAN in the enterprise space ⇒ everything seems to be working fine
- check the WAN segment through the Internet ⇒ no problems reported on the connectivity
- check the Cloud data centre ⇒ no issue on that side
So what do we do now? Even though each segment of the network is reported to be working as expected, users still can’t access the Cloud application they need, and that’s all that counts during critical outages. The real cause of the issue could be hiding in so many places, for example:
- the LAN could be saturated by spikes of heavy traffic and unable to provide enough bandwidth for the Cloud application, or users could be on WiFi with a poor signal in some areas of the office. DNS problems are another common cause of unreachable cloud services.
- the WAN segment might have busy routes/nodes causing delays and packet drops or there might be ongoing BGP re-routing events impacting the traffic. Route hijacks and route leaks can also cause traffic black-holing in the Internet segment.
- the Cloud provider could be experiencing slow performance, application-layer issues, or a DDoS attack.
Anyone who has tried to troubleshoot such a scenario knows how hard it can be to find the root cause. Countless combinations of network and application issues can create nasty incidents, and the common networking tools fall short in several ways:
- they require a massive amount of work in a very dynamic environment such as the modern Internet.
- outputs from different tools can be really hard to correlate, even when targeting the same outage.
- incidents must be captured in real time, so catching the right moment can be insanely hard.
- they have neither automatic execution nor automatic storage of output, unless there’s a script running them.
In summary, troubleshooting an ongoing network or application issue can be a nightmare just using the common networking tools that we all know.
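The missing automation mentioned above is the kind of gap people usually plug with a script. Here is a minimal sketch of that idea, assuming Python 3; `run_and_store` is a hypothetical helper name of ours, not part of any tool, and it only wraps whatever command you pass in and appends the timestamped output to a log file.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_and_store(cmd, log_path, timeout=60):
    """Run a diagnostic command and append its timestamped output to a log file.

    Hypothetical helper for illustration. Returns the command's exit code.
    """
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    with Path(log_path).open("a", encoding="utf-8") as log:
        # One header line per run, so outputs can be lined up by time later.
        log.write(f"=== {started} :: {' '.join(cmd)} (exit {result.returncode}) ===\n")
        log.write(result.stdout)
        if result.stderr:
            log.write(result.stderr)
    return result.returncode
```

Scheduled from cron, a call such as `run_and_store(["ping", "-c", "5", "www.routerfreak.com"], "ping.log")` would keep a history of ping runs on Linux (the `-c` flag differs on other systems). It still leaves the correlation work to you, which is exactly the limit described above.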
ThousandEyes Network Monitor Solution
It’s time to try ThousandEyes, the network monitoring solution for the cloud era!
ThousandEyes is a network intelligence platform that aims to deliver visibility into every network involved in forwarding traffic from source to destination. This is what it promises:
- quickly pinpoint performance changes caused by device faults, congestion, DDoS attacks, hijacks, route leaks, DNS failures and service provider outages.
- provide visibility across any network segment, along with depth of insight that drives accurate diagnosis through network visualization with network intelligence.
- rapidly diagnose and find problems with real-time performance data, with highlights of important insights with historical comparisons.
- gain intelligence from a variety of OSI layers and vantage points.
- communicate the outcome using friendly and customisable reports.
- monitor automatically through an embedded alerting system that integrates with other platforms.
- share interactive sets of data with third parties using collaboration tools.
How it works
ThousandEyes relies on Smart Agents deployed across the Internet and within your organisation. The agents can be of two types:
- Cloud Agents: deployed all around the world and managed by ThousandEyes, these are an easy out-of-the-box solution for monitoring your target servers and applications from 100+ agent locations distributed in 40+ countries.
- Enterprise Agents: installed in your corporate space, they generate traffic directly from inside your network. They can be run as Virtual Appliances in pretty much any hypervisor (e.g. VirtualBox, VMware, Hyper-V) or installed on your Linux server (e.g. Ubuntu, CentOS, RHEL).
Agents generate synthetic traffic that probes your target server while collecting information on your network and application:
- Network topology: nodes and links with respective packet loss and latency.
- Network metrics: packet loss, latency, jitter, bandwidth.
- BGP metrics: prefix reachability, path changes and updates.
- Application metrics (HTTP Server): availability, response time, throughput.
- Application metrics (DNS): availability, response time, resolution trace.
- Application metrics (Web Transaction): completion, duration.
- Application metrics (Page Load): load time.
- Voice metrics: packet loss, latency, jitter, MOS.
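To make the network metrics in this list more concrete, here is a minimal sketch of how packet loss, latency and jitter could be derived from one round of probes. The function name, the input format (`None` for a lost probe) and the jitter definition (mean absolute difference between consecutive round-trip times) are our assumptions for illustration; we don't know how ThousandEyes computes them internally.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize one round of probes; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency = mean(received) if received else None
    # Jitter here: mean absolute difference between consecutive received RTTs.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(deltas) if deltas else None
    return {"loss_pct": loss_pct, "latency_ms": latency, "jitter_ms": jitter}


# Five probes, one of them lost.
print(summarize_probes([20.0, 22.0, None, 21.0, 25.0]))
```

Keeping lost probes in the input as `None` lets loss be computed against the number of probes sent, while latency and jitter only use the replies that actually arrived.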
ThousandEyes is able to pinpoint problems hop-by-hop across all enterprise and service provider networks. The collected topology information is used to depict the paths taken by the traffic from the source agents to the target destination. The following picture shows how the path visualisation highlights in red the components (e.g. nodes and links) that have issues, along with detailed information about each one:
To monitor both layer-3 (IPv4 and IPv6 connectivity) and layer-7 (application), ThousandEyes offers a variety of test types:
- DNS Server
- DNS Trace
- DNSSEC
- HTTP Server
- Page Load
- Web Transaction
Let’s now see ThousandEyes in action!
RouterFreak’s ThousandEyes Review
ThousandEyes relies on agents to generate the synthetic traffic used to probe the selected target. As mentioned in the previous section, you can either use Cloud Agents (managed by ThousandEyes and distributed around the globe) or install Enterprise Agents in your own network.
Let’s first have a look at Cloud Agents. Their world distribution map is shown in the following picture:
Cloud Agents are a super easy way to start monitoring with ThousandEyes. They are available as soon as you create your account, and they are completely managed by ThousandEyes, so you don’t need to do anything: just use them! Cloud Agents are a great way to test your server or application from different locations and check the user experience all over the world.
For example, let’s say you host your servers in California but you have users accessing from Europe and Asia: using Cloud Agents, in a matter of minutes you’ll see how the geographical distance affects the network latency of long connections.
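To get a feeling for how much of that latency is pure distance, here is a back-of-the-envelope sketch. It assumes light travels through fibre at roughly two thirds of its speed in vacuum (about 200 km per millisecond) and that the cable follows the great-circle path, which real routes never do, so the result is only a lower bound. The coordinates are approximate values we picked for San Francisco and Amsterdam.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: light in fibre travels at roughly 2/3 of c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over an ideal straight fibre path."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Approximate coordinates, for illustration: San Francisco and Amsterdam.
d = great_circle_km(37.77, -122.42, 52.37, 4.90)
print(f"{d:.0f} km, at least {min_rtt_ms(d):.0f} ms round trip")
```

Anything a Cloud Agent measures above this floor comes from routing detours, queuing and processing along the path, which is the part worth investigating.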
Another nice application for Cloud Agents is testing an anycast infrastructure where the routing is based on the topologically nearest node.
Last but not least, we really liked being able to test DNS servers and DNS name resolution from so many locations in the world. This is a key point for Content Delivery Networks, where it is always critical to make sure that traffic is routed to the closest server without travelling around the globe before reaching it.
Enterprise agents are deployed in your corporate network, either in a data centre or even on a simple laptop. They come in two flavours:
- Virtual Appliance: the easy way, just download the virtual machine image and import it into your favourite hypervisor. The download file can be .OVA (for VMware, VirtualBox, etc.) or .ZIP (for Hyper-V).
- Linux Package: for the real techies, install the agent package on your Linux server, but make sure you have one of the supported distributions (Ubuntu, CentOS and RHEL).
For monitoring purposes, there is no difference between the Virtual Appliance and the Linux Package: both embed the same software and the same testing capabilities. At Router Freak we installed an agent in our data centre, see this screenshot:
The agent is online but there is a warning: when this screenshot was taken, we were experiencing issues with our internal NTP server, so the time was not synced properly. It was nice to see the agent promptly reporting the issue from its side. Time synchronisation is key to the accuracy of the data collected by the agents, so make sure you always have a properly working NTP infrastructure.
Every Enterprise Agent can automatically update its components to the latest available version. In the screenshot above we see the versions of the Agent, BrowserBot and Virtual Appliance packages currently installed on our Enterprise Agent. If a package could not be updated to the latest version, we would expect a warning here similar to the Agent System Time one.
Enterprise agents are a great way to generate the traffic from inside your own network.
For example, if you want to simulate the connections initiated by users working from your corporate office, you should install an Enterprise Agent in the same network. This way all the probe connections are initiated from your office to the Web service under test, and the Path Visualisation can also display the nodes belonging to your corporate space.
Another nice application of Enterprise Agents is Voice testing: you can install one agent in each of your branch offices, and then monitor the voice traffic between them, mimicking a real VoIP call between office workers.
Your office WiFi connection can be tested too: just install an Enterprise Agent on a laptop connected to the wireless network and check the network metrics towards a target!
Creating tests in ThousandEyes is a pretty straightforward task that hardly needs explanation. The next screenshot shows the form used to create a new test.
First we need to provide a test name, then the URL (or IP address) of the target to monitor. The interval value determines how often the test is executed. The agents dropdown menu selects which agents the test will run from. Lastly, the alerts section enables alerts based on specific metrics (presented later in this article). That’s all folks!
HTTP Server Test
The HTTP Server test is used to monitor the OSI layer-7 (i.e. application) of the Web Server in terms of availability, response time and throughput. Here is a sample screenshot from a test monitoring our website www.routerfreak.com :
We can see that on December 13th at 22:30 CET there was a drop in availability from 100% to 85%, because one of the agents was not able to complete the transaction. Looking at the table at the bottom of the screenshot, we see the Amsterdam agent returning a “receive” error and hitting the 5-second timeout. The HTTP connection from the Amsterdam agent to the Router Freak server failed during the receive phase, slowing down to the point of hitting the timeout configured for the test.
Probably the most powerful feature of ThousandEyes is the ability to correlate data from different layers. In the next picture we switched from the HTTP Server layer to the Network layer, specifically to the Path Visualisation view. We can see here that on the path from the Amsterdam agent to the server there are items marked in red: a link with high latency and a node with significant packet loss. These issues most likely affected the HTTP connection, causing the receive error explained in the previous paragraph.
The Path Visualisation view is a great tool for pinpointing issues immediately. Looking at it, we know exactly which link had high latency and which node was dropping packets. In addition, we have historical data from before and after the problem, which gives us a baseline of the working conditions when no issue was ongoing.
A BGP test is used to monitor a specific IPv4 or IPv6 prefix in terms of reachability, path changes and routing updates. This time the BGP data is not provided by agents, but sourced from BGP public monitors. Here is a screenshot showing the BGP view:
In the centre, the light blue circle represents the origin Autonomous System of our prefix, in this case AS36351. All around it we see the connected Autonomous Systems (grey circles), out to the green dots that are the public BGP monitors distributed around the world. Since all of them are green, there are no issues with our monitored BGP prefix.
In combination with alerts (presented later in this review), the BGP test is a very powerful tool to get alerted in case of hijacks, peering changes and route leaks.
Page Load Test
The Page Load test is useful to get nice insights about a specific web page and all its embedded components. The target of the test is a web page and here is an example of Page Load view:
The metric here is the page load time, also measured in terms of DOM load. What we really liked about the Page Load test is the Waterfall view, where all the components of the page are listed with the respective DNS time, SSL time, blocking time, connect time, send time, wait time and receive time (highlighted with different colours in the horizontal bars):
This view is particularly useful for web developers because it shows how the page objects are loaded from the different domains. At a glance it’s also possible to see which components are slowing down the overall page load, and which items are failing and why.
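The same reading of a waterfall can be done on raw numbers. Below is a minimal sketch that takes per-component timings, split into the phases named above, and ranks the components by total time while reporting the phase that dominates each one. The data structure, the function name and the sample timings are all made up by us for illustration; they are not an export format of ThousandEyes.

```python
PHASES = ("dns", "ssl", "blocked", "connect", "send", "wait", "receive")


def slowest_components(components, top=3):
    """Return (url, total_ms, dominant_phase) for the slowest page components."""
    rows = []
    for comp in components:
        # Phases missing from a component count as zero milliseconds.
        timings = {p: comp.get(p, 0.0) for p in PHASES}
        total = sum(timings.values())
        dominant = max(timings, key=timings.get)
        rows.append((comp["url"], total, dominant))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:top]


# Made-up timings in milliseconds, for illustration only.
sample = [
    {"url": "https://example.com/", "dns": 30, "connect": 40, "ssl": 60, "wait": 120, "receive": 15},
    {"url": "https://cdn.example.net/app.js", "dns": 80, "connect": 35, "wait": 300, "receive": 90},
    {"url": "https://example.com/logo.png", "wait": 25, "receive": 5},
]
for url, total, phase in slowest_components(sample):
    print(f"{total:7.1f} ms  {phase:8s} {url}")
```

Knowing the dominant phase matters as much as the total: a component stuck in DNS points to a different fix than one spending its time in wait.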
VoIP (Voice over Internet Protocol) converts the sound of your voice into data packets, which are sent over the Internet and converted back into sound once they reach the destination. Typically, VoIP calls are divided into two phases: Session Connection and Voice Data Transfer. ThousandEyes is able to simulate the second phase, that is, the Real-time Transport Protocol (RTP) voice data streams. Let’s have a look at the voice metrics because they are different from the usual network ones:
- Mean Opinion Score (MOS): Perceived voice quality where individual transmission parameters are transformed into different individual “impairment factors”.
- Loss: A voice stream of UDP packets with RTP as payload is sent to the target and packet loss is calculated from the number of packets received by the target.
- Discards: delayed packets that are discarded when they reach the destination, expressed as a percentage of packets sent.
- Latency: average latency between endpoints.
- Packet Delay Variation (PDV): measurement of variation in delay from sender to receiver.
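The MOS description above, with its “impairment factors”, reads like the ITU-T E-model, so here is a rough sketch of how latency and loss could be turned into a MOS estimate along those lines. We are assuming the E-model default rating of 93.2, a widely used simplification of its delay impairment, and loss parameters commonly quoted for the G.711 codec without loss concealment. Whether ThousandEyes uses exactly this calculation is not something we can confirm.

```python
def r_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6)


def estimate_mos(one_way_delay_ms, loss_pct, ie=0.0, bpl=4.3):
    """Simplified rating: default R minus delay and loss impairments.

    ie and bpl default to values commonly quoted for G.711 without loss
    concealment; treat them as assumptions.
    """
    d = one_way_delay_ms
    # Delay impairment (simplified form, with an extra penalty above 177.3 ms).
    i_delay = 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)
    # Effective equipment impairment grows with packet loss.
    i_loss = ie + (95 - ie) * loss_pct / (loss_pct + bpl)
    return r_to_mos(93.2 - i_delay - i_loss)


for delay, loss in [(20, 0.0), (150, 1.0), (300, 5.0)]:
    print(f"delay={delay} ms loss={loss}% -> MOS {estimate_mos(delay, loss):.2f}")
```

The sketch ignores discards and PDV on purpose. A fuller model would count packets discarded for arriving too late as lost and add the jitter buffer to the delay.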
Here is a sample screenshot from a voice test from the Amsterdam enterprise agent to a New York cloud agent:
Similarly to other tests, we notice the possibility of digging down to other layers such as Path Visualisation and BGP Route Visualisation. The voice test can be carried out between two Enterprise agents or a combination of Enterprise and Cloud agents.
ThousandEyes embeds a very powerful and customisable alerting system. Alert Rules can be created and tailored to customer needs, then assigned to tests. For customers who want simplicity in alert configuration and management, the ThousandEyes platform ships with default Alert Rules configured and enabled for each test.
Alert notifications are delivered via email to the alert recipients defined in the Alert Rule. Alerts can optionally be configured to send another email once the alert has cleared. Alerts can also be integrated via Webhooks or PagerDuty, enabling push/pull mechanisms with other monitoring platforms. Another option is to poll for active alerts through the ThousandEyes Application Program Interface (API).
ThousandEyes includes a reporting tool that offers both on-demand and scheduled reports. Data from the tests can be organised and customised using several widgets such as tables, graphs and time series. Reports can be delivered by email, and also shared among teams.
Note that each report lets the user select a time frame of interest, and the report page is then populated with the test data from that period.
Help & Support
ThousandEyes claims to provide top-notch support through their Customer Success Centre. Assistance is given via the usual ticketing system, or the website live chat. We had a couple of interactions with the support guys, and we can confirm they were prompt in providing the requested information. As illustrated in the following image, Help & Support is accessible from the web user interface by clicking the ‘?’ icon in the top right corner, or by using the live chat at the bottom right (see red box highlights).
Something we really liked about the support was its comprehensiveness: it covers not only issues related to the ThousandEyes web interface, but also professional assistance on network and application problems! The ThousandEyes support guys are skilled Network Engineers able to act as a virtual extension of your NOC Team.
If you want to monitor your networks and applications in a quick, effective and straightforward way, ThousandEyes is definitely a great solution. The web interface is clean and intuitive to use, even for Network Engineers more used to running pings and traceroutes!
The Cloud Agents are ready to be used out-of-the-box, but also the Enterprise Agents are easy enough to deploy and configure.
We really liked the monitoring capabilities of ThousandEyes, especially because it can test from both the public Internet (via Cloud Agents) and from inside the corporate network (via Enterprise Agents).
The greatest thing is definitely the visual approach in presenting the data, with clear red highlights on the faulty components of the connection pipeline, from source to destination.
There are plenty of tests to cover pretty much all the monitoring needs, from layer-3 (IP connectivity) to layer-7 (application).
The test results can be integrated with other monitoring solutions via the Application Program Interface (API): it’s RESTful and allows pulling data out of the ThousandEyes app with any script or programming language that supports GET/POST operations.
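As a rough idea of what “any script” means in practice, here is a sketch that builds an authenticated GET request with nothing but the Python standard library. Everything specific in it is a placeholder of ours: the base URL is a dummy `example.invalid` host, the `/tests` path is hypothetical, and HTTP Basic authentication is an assumption. Check the official API documentation for the real endpoints and authentication scheme.

```python
import base64
import urllib.request

# Hypothetical placeholder: NOT the real ThousandEyes endpoint; check the API docs.
BASE_URL = "https://api.example.invalid/v1"


def build_get(path, user, token):
    """Build (without sending) an authenticated GET request for a REST resource."""
    # Assumption: HTTP Basic authentication with a user name and an API token.
    credentials = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Basic {credentials}", "Accept": "application/json"},
        method="GET",
    )


req = build_get("/tests", "noc@example.com", "secret-token")
print(req.get_method(), req.full_url)
```

Once the URL and credentials are real, `urllib.request.urlopen(req)` sends the request and the JSON body can be parsed with the `json` module and fed to whatever monitoring platform you already use.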
Something we’d like to see, hopefully in the future, is a bidirectional Path Visualisation: at the moment the routes are displayed only from agent to target, but it would be really cool to also see the way back (from target to agents). Internet IP routing is very dynamic, so the returning packets most likely do not follow the same routes as the forward ones.
One more thing that we hope will be added is the possibility of testing the Session Connection in Voice tests. Right now ThousandEyes is only able to generate RTP traffic between the two voice endpoints, and cannot simulate the session establishment phase.
ThousandEyes definitely comes with an interesting and extremely simple to use approach to monitoring networks and cloud applications.
Router Freak definitely recommends this tool, and for whoever is still skeptical we suggest taking advantage of the free 15-day Trial offered by ThousandEyes: it offers full Pro Features for the trial period, and then the account stays free forever in Lite mode (limited features). So why not give it a try?