A nurse in a hospital is far too busy to watch every patient every minute. She relies on telemetry to monitor their vital signs, such as their blood pressure, and alert her if their condition worsens.
Telemetry systems automatically collect data from sensors, whether they are attached to a patient, a jet engine or an application server. They then send that information to a central site, where it is used to monitor performance and identify problems.
Telemetry was developed to automatically measure industrial, scientific and military data from remote locations, such as how a missile performed in flight or the temperature in a blast furnace. In IT and security, telemetry data tracks metrics such as application downtime, database errors or network connections. This data is the raw material for observability: understanding how well applications and services are working and how users interact with them.
When telemetry monitors physical objects, it relies on sensors that measure characteristics such as temperature, pressure or vibration. When telemetry is used to monitor IT systems, software agents gather digital data about performance, uptime and security. They send that data to collectors, which process it and forward it for storage or analysis.
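The agent-to-collector flow can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's design: the `Agent`, `Collector` and `Reading` classes, the probe functions and the host name are all hypothetical. The agent samples each probe, and the collector drops empty readings before forwarding the rest to a sink standing in for storage or analysis.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Reading:
    source: str             # host the agent runs on
    metric: str             # e.g. "cpu_percent"
    value: Optional[float]  # None if the probe could not read a value
    timestamp: float        # seconds since the epoch


class Collector:
    """Hypothetical collector: receives readings from agents, does light
    processing, and forwards them to a storage or analysis sink."""

    def __init__(self, sink: Callable[[List[Reading]], None]):
        self.sink = sink

    def receive(self, readings: List[Reading]) -> int:
        # Drop readings that carry no value, then forward the rest.
        cleaned = [r for r in readings if r.value is not None]
        self.sink(cleaned)
        return len(cleaned)


class Agent:
    """Hypothetical software agent running on a monitored host."""

    def __init__(self, host: str, probes: Dict[str, Callable[[], Optional[float]]],
                 collector: Collector):
        self.host = host
        self.probes = probes  # metric name -> function that returns a reading
        self.collector = collector

    def collect_once(self) -> int:
        now = time.time()
        readings = [Reading(self.host, name, probe(), now)
                    for name, probe in self.probes.items()]
        return self.collector.receive(readings)


stored: List[Reading] = []
agent = Agent(
    "web-01",
    {"cpu_percent": lambda: 42.0, "disk_temp_c": lambda: None},  # second probe fails
    Collector(sink=stored.extend),
)
forwarded = agent.collect_once()
```

In a real deployment the agent would run on a schedule and the collector would sit on another machine, but the division of labor is the same.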
Telemetry data can be produced in multiple forms by different types of agents, so it must be “normalized,” or made to fit a standard structure, before analytic tools can use it. Historically, normalization was done through a schema-on-write process, which required defining the format in advance and enforcing that schema before the data was stored. That approach is no longer viable given the volume, variety and velocity of data produced by IT infrastructures. A more popular current approach is schema-on-read, which stores data in its original form and applies the required structure only when the data is read for analysis.
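The difference between the two approaches can be sketched in Python. In this hypothetical example, two agents report the same kind of measurement with different field names (`hostname`/`host`, `name`/`metric`, `val`/`value`); the field names and the three-field schema are assumptions for illustration. Schema-on-write rejects anything that does not already match, while schema-on-read stores raw lines and maps them onto one structure at query time.

```python
import json
from typing import Dict, Iterator, List

REQUIRED = {"host": str, "metric": str, "value": float}


def write_with_schema(store: List[str], record: Dict) -> None:
    """Schema-on-write: validate against a fixed schema before storing."""
    for key, typ in REQUIRED.items():
        if not isinstance(record.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    store.append(json.dumps(record))


def write_raw(store: List[str], raw_line: str) -> None:
    """Schema-on-read: keep the payload exactly as the agent sent it."""
    store.append(raw_line)


def read_with_schema(store: List[str]) -> Iterator[Dict]:
    """Apply the schema at query time, mapping each agent's field names
    onto one structure and skipping lines that cannot be mapped."""
    for line in store:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(rec, dict):
            continue
        host = rec.get("host") or rec.get("hostname")
        metric = rec.get("metric") or rec.get("name")
        value = rec.get("value", rec.get("val"))
        if host and metric and value is not None:
            yield {"host": host, "metric": metric, "value": float(value)}


raw: List[str] = []
write_raw(raw, '{"hostname": "db-01", "name": "latency_ms", "val": "12.5"}')
write_raw(raw, '{"host": "web-01", "metric": "cpu_percent", "value": 40}')
write_raw(raw, "not json")
normalized = list(read_with_schema(raw))
```

Note that `write_with_schema` would reject the second record outright, because its `value` is an integer rather than a float; that rigidity is what makes schema-on-write hard to sustain as data sources multiply.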
The information IT telemetry provides depends on the system being tracked and how the data is used. For servers, the data might include how close processors and memory are to being overloaded. For networks, it might be latency and bandwidth. For applications and databases, it might be uptime and response time. Telemetry designed to detect attacks may track the number of incoming requests to a server, changes to the configuration of an application or a server, or the number or type of files being created or accessed.
Telemetry data comes in three forms
The data gathered by telemetry can provide a real-time view of application performance, so teams can perform root cause analysis on problems, prevent bottlenecks and identify security threats. For security monitoring, unusual network traffic patterns might indicate a denial-of-service attack. Unusual requests for data from an unknown application, or repeated unsuccessful attempts to log into a user account, may also signal an attempted hack.
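Spotting repeated failed logins is a good example of turning raw telemetry into a security signal. The sketch below keeps a sliding time window of failures per account and flags the account once failures in that window reach a limit. The class name, the limit and the window length are hypothetical choices for illustration, not recommended values.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginDetector:
    """Flags an account when too many failed logins occur within a time window.
    The default limit and window are illustrative only."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window = window_seconds
        # account -> timestamps (seconds) of recent failed attempts
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, account: str, success: bool, ts: float) -> bool:
        """Record one login attempt; return True if the account should be flagged."""
        if success:
            return False
        q = self.failures[account]
        q.append(ts)
        # Discard failures that have fallen out of the sliding window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q) >= self.max_failures


detector = FailedLoginDetector(max_failures=3, window_seconds=60.0)
burst = [detector.record("alice", False, t) for t in (0, 10, 20)]   # rapid failures
spread = [detector.record("bob", False, t) for t in (0, 100, 200)]  # far apart
```

Three failures within 20 seconds flag `alice` on the third attempt, while the same number of failures spread over several minutes never flags `bob`. Production systems typically combine a rule like this with other signals, such as source IP address, to reduce false alarms.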
Telemetry data can also be used to track how users interact with applications and systems. Such user behavior testing can help improve user interfaces and show whether tweaks to applications and websites increase user engagement or sales. Telemetry data can also help cut costs, by identifying underused assets that can be eliminated, such as cloud servers that are no longer needed, or by revealing usage trends that help plan and budget for infrastructure needs.
Telemetry from devices on the Internet of Things can support everything from tracking shipments to preventive equipment maintenance. This data can also enable new business models in which a company sells performance, maintenance or production data from equipment in the field.
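Finding underused servers can be as simple as averaging utilization samples and comparing them to a cutoff. In this minimal sketch the 5% CPU threshold and the server names are hypothetical; a real cost review would also consider memory, network and disk use, as well as time of day.

```python
from statistics import mean
from typing import Dict, List


def find_underused(utilization: Dict[str, List[float]], threshold: float = 5.0) -> List[str]:
    """Return servers whose average CPU utilization (percent) is below threshold.
    Servers with no samples are skipped rather than reported as idle."""
    return sorted(
        name for name, samples in utilization.items()
        if samples and mean(samples) < threshold
    )


idle = find_underused({
    "build-runner-3": [1.5, 2.0, 0.5],  # averages about 1.3%
    "api-1": [55.0, 61.0],
    "new-node": [],                     # no data yet
})
```

Skipping servers without samples is a deliberate choice: a server that reports nothing may be broken rather than idle, and should be investigated rather than decommissioned.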
Telemetry aids both monitoring and observability. Monitoring is the continuous observation of a system to detect and send alerts about abnormal behavior. Observability is understanding the internal state of a system and predicting how it will behave in the future.
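The monitoring half of that distinction can be shown with fixed-threshold alerting on server metrics like those described earlier. The metric names and limits below are hypothetical. Note what this sketch does not do: it only reports that a limit was crossed, while observability aims to explain why and to anticipate what happens next.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    metric: str   # e.g. "cpu_percent"
    limit: float  # alert when the value exceeds this


def check(sample: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Compare one sample of metrics against fixed limits and return
    an alert message for each limit that is exceeded."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"{t.metric}={value} exceeds {t.limit}")
    return alerts


alerts = check(
    {"cpu_percent": 95.0, "memory_percent": 60.0},
    [Threshold("cpu_percent", 90.0), Threshold("memory_percent", 85.0)],
)
```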
Telemetry is the “input” end of an observability pipeline that delivers the real-time data administrators need to fine-tune IT systems. Its automated data collection, along with AI-aided analytics, helps run critical IT infrastructure more effectively and efficiently than unaided humans could. The real-time performance insights it enables alert administrators to current or emerging problems before they affect users or customers.
The user experience data it provides includes how often users engage with an application, how long they engage with it and which features they use most. Telemetry also gives developers and system administrators information about hardware configurations and the causes of crashes. Insights like these help developers build updates that make applications run more quickly and efficiently, while reducing the need for costly manual testing.
Security-related telemetry data helps protect a company’s revenue and reputation by finding possible breaches and triggering defenses before they impact the business. Using artificial intelligence and machine learning to analyze security data and suggest countermeasures can increase security while minimizing costs.
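The three engagement measures just mentioned (how often, how long and which features) can be computed from a simple event stream. This sketch assumes a hypothetical event format of `(user_id, session_id, feature, timestamp_seconds)` and treats session length as the time between a session's first and last event, which undercounts time spent after the last recorded action.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str, float]  # (user_id, session_id, feature, timestamp)


def engagement_summary(events: Iterable[Event]) -> Dict:
    """Return sessions per user, average session length and the most-used features."""
    sessions: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    features: Counter = Counter()
    for user, session, feature, ts in events:
        sessions[(user, session)].append(ts)
        features[feature] += 1
    per_user = Counter(user for user, _ in sessions)
    lengths = [max(ts) - min(ts) for ts in sessions.values()]
    avg_len = sum(lengths) / len(lengths) if lengths else 0.0
    return {
        "sessions_per_user": dict(per_user),  # how often users engage
        "avg_session_seconds": avg_len,       # how long they engage
        "top_features": features.most_common(3),  # what they use most
    }


summary = engagement_summary([
    ("u1", "s1", "search", 0), ("u1", "s1", "export", 30),
    ("u1", "s2", "search", 100),
    ("u2", "s3", "search", 0), ("u2", "s3", "share", 60),
])
```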
Modern IT infrastructures generate very large data streams in a variety of formats. Not all of this data is critical or even important. It’s easy for system administrators and other IT staff to be overwhelmed by this data, and for storage costs to rise to unacceptable levels.
System administrators and software developers must thus decide what data is most important and how to transmit, format and analyze it. Each data transmission method has its pluses and minuses. One option is sending telemetry data directly from the application being monitored. This eliminates the need to run additional software and to manage ports or processes. But if the sending application is complex and generates lots of data, sending that data could bog down the application or network being monitored.
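One common way to limit the risk of sending telemetry directly from an application is to make emitting an event non-blocking: the application drops events into a bounded buffer, and a separate step ships them in batches. The sketch below is hypothetical (the class, parameter names and buffer sizes are assumptions). When the buffer is full it discards new events and counts them, trading some data loss for keeping the application responsive.

```python
import queue
from typing import Any, Callable, List


class NonBlockingEmitter:
    """Hypothetical in-process telemetry sender. emit() never waits; flush()
    drains the buffer in batches and would typically run on a background
    thread or at shutdown."""

    def __init__(self, send_batch: Callable[[List[Any]], None],
                 max_buffer: int = 1000, batch_size: int = 100):
        self.send_batch = send_batch  # transmits one batch, e.g. over HTTP
        self.buffer: "queue.Queue[Any]" = queue.Queue(maxsize=max_buffer)
        self.batch_size = batch_size
        self.dropped = 0  # events discarded because the buffer was full

    def emit(self, event: Any) -> None:
        try:
            self.buffer.put_nowait(event)
        except queue.Full:
            self.dropped += 1

    def flush(self) -> None:
        batch: List[Any] = []
        while True:
            try:
                batch.append(self.buffer.get_nowait())
            except queue.Empty:
                break
            if len(batch) >= self.batch_size:
                self.send_batch(batch)
                batch = []
        if batch:
            self.send_batch(batch)


sent: List[List[Any]] = []
emitter = NonBlockingEmitter(sent.append, max_buffer=5, batch_size=2)
for i in range(7):
    emitter.emit({"seq": i})  # the last two do not fit in the buffer
emitter.flush()
```

Whether dropping, sampling or spilling to local disk is the right overflow policy depends on how much each event matters; dropping is simply the option least likely to slow the application.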
System administrators and software developers must also find ways to minimize the cost of storing telemetry data. One option is to store all the data in a data lake and retrieve only what is needed for analysis, when it is needed. Another challenge is how to gather and analyze information from older devices and applications that may not support telemetry. One example is network devices that report performance and health data using the Simple Network Management Protocol (SNMP).
Another challenge is finding, acquiring and deploying analytical tools, including those using artificial intelligence and machine learning, that can sift through terabytes of data to uncover the incidents and trends that require further attention.
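The "store everything, read only what you need" approach usually depends on partitioning data so that a query can skip most of it. This sketch writes events into one JSON-lines file per day, using a `dt=YYYY-MM-DD` directory layout borrowed from common data lake conventions; the layout, file name and helper functions are assumptions for illustration. Reading a date range opens only the matching partitions.

```python
import json
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import Dict, Iterable, Iterator


def write_partitioned(root: str, day: date, records: Iterable[Dict]) -> None:
    """Append records to root/dt=<day>/events.jsonl, one JSON object per line."""
    part = Path(root) / f"dt={day.isoformat()}"
    part.mkdir(parents=True, exist_ok=True)
    with open(part / "events.jsonl", "a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_range(root: str, start: date, end: date) -> Iterator[Dict]:
    """Yield records only from partitions between start and end, inclusive."""
    day = start
    while day <= end:
        path = Path(root) / f"dt={day.isoformat()}" / "events.jsonl"
        if path.exists():
            with open(path, encoding="utf-8") as f:
                for line in f:
                    yield json.loads(line)
        day += timedelta(days=1)


with tempfile.TemporaryDirectory() as lake:
    write_partitioned(lake, date(2024, 1, 1), [{"host": "web-01", "cpu": 40}])
    write_partitioned(lake, date(2024, 1, 2), [{"host": "web-01", "cpu": 55}])
    write_partitioned(lake, date(2024, 1, 3), [{"host": "db-01", "cpu": 70}])
    # Only the Jan. 2 and Jan. 3 partitions are opened.
    recent = list(read_range(lake, date(2024, 1, 2), date(2024, 1, 3)))
```

Real data lakes typically use object storage and columnar formats rather than local JSON files, but the cost-saving idea is the same: cheap storage for everything, with reads limited to the slices an analysis actually needs.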
Telemetry often relies on software agents running on the source systems to gather the data. In other cases, the source is an application programming interface (API) to an application or monitoring tool. Connectors then manage the flow of data to multiple destinations and convert it to the protocols and data formats used by various analytical tools. Telemetry data also requires a storage site. This might be a data lake, a time-series database or a security information and event management (SIEM) system.
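A connector's two jobs, routing and format conversion, can be sketched together. In this hypothetical example, authentication events go to a SIEM-like destination as JSON, and metrics go to a time-series-like destination as `key=value` text. The `Router` class, the two formats and the event fields are illustrative assumptions, not the interface of any specific connector.

```python
import json
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]


def to_json_line(event: Event) -> str:
    """Serialize as a JSON object with stable key order."""
    return json.dumps(event, sort_keys=True)


def to_kv(event: Event) -> str:
    """Serialize as space-separated key=value pairs."""
    return " ".join(f"{k}={v}" for k, v in sorted(event.items()))


class Router:
    """Hypothetical connector: delivers each event to every destination whose
    predicate matches, converting it to that destination's format."""

    def __init__(self) -> None:
        self.routes: List[Tuple[Callable[[Event], bool],
                                Callable[[Event], str],
                                Callable[[str], None]]] = []

    def add_route(self, predicate, formatter, sink) -> None:
        self.routes.append((predicate, formatter, sink))

    def dispatch(self, event: Event) -> None:
        for predicate, formatter, sink in self.routes:
            if predicate(event):
                sink(formatter(event))


siem: List[str] = []
tsdb: List[str] = []
router = Router()
router.add_route(lambda e: e.get("type") == "auth", to_json_line, siem.append)
router.add_route(lambda e: e.get("type") == "metric", to_kv, tsdb.append)
router.dispatch({"type": "auth", "user": "alice", "ok": False})
router.dispatch({"type": "metric", "name": "cpu", "value": 42})
```

Because an event can match more than one route, the same data can feed several tools at once, each in the format it expects.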
Given the wide variety of sources of telemetry data, it can be useful to look for tools that support the OpenTelemetry Protocol (OTLP), which defines the encoding, transport and delivery of telemetry data between sources and destinations.
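To give a sense of what OTLP data looks like, the sketch below builds a metrics export request body in OTLP's JSON encoding with a single gauge data point. It reflects our reading of the OTLP/JSON field names (`resourceMetrics`, `scopeMetrics`, `dataPoints` and so on) and should be checked against the current specification. The service name, metric name and instrumentation scope name are hypothetical, and the code only builds the payload. An OTLP/HTTP receiver commonly accepts such bodies at the `/v1/metrics` path on port 4318.

```python
import json
import time
from typing import Dict


def otlp_gauge_payload(service_name: str, metric_name: str,
                       value: float, unit: str = "1") -> Dict:
    """Build an OTLP/JSON metrics export body containing one gauge data point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "example-instrumentation"},  # hypothetical name
                "metrics": [{
                    "name": metric_name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "asDouble": float(value),
                        # 64-bit integers are written as decimal strings in OTLP/JSON
                        "timeUnixNano": str(time.time_ns()),
                    }]},
                }],
            }],
        }],
    }


body = json.dumps(otlp_gauge_payload("checkout", "cpu.utilization", 0.42))
```

The value of a shared protocol is that this same structure can be accepted by any OTLP-compatible collector or backend, regardless of which tool produced it.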
One such tool is Cribl Stream – a vendor-agnostic observability pipeline. It provides out-of-the-box integrations for more than 80 combinations of data sources, analysis tools and data stores. It also lets organizations convert data from one format to another on the fly, so the data is ready for real-time analytics when it arrives at its destination.
Organizations can easily add new data sources such as data lakes and new destinations such as AI analytics tools using a drag-and-drop interface. Cribl Stream can cope with even the heaviest data loads, having been tested with volumes of more than 20 petabytes per day. Monitoring tools make it easy to ensure the right data is reaching the right destination.
These features, and more, make it easier to convert the raw material of telemetry data into insights that help keep the business running.
The move to the cloud, mobile-first computing and pandemic-driven remote working have made IT infrastructure more critical. But they have also made it more complex and distributed. When manual troubleshooting is no longer enough, telemetry is the first step toward the observability required to proactively assure quality service to customers, employees and business partners.