Tracing with Avro
- by Jon Zuanich
- September 03, 2010
Written by Patrick Wendell, an amazing summer intern with Cloudera and an Avro committer.
In my summer internship project at Cloudera, I added RPC tracing as a first-order feature of Apache Avro. Avro is a platform for data storage and exchange that caters to data-intensive, dynamic applications. My project focused on Avro’s RPC functionality.
It is common knowledge that tracing in distributed systems can be difficult. In user-facing web services, a front-end function may recursively trigger several function calls to mid- and back-tier services. In offline processing, data-center storage layers may distribute data across several hosts and query one or many of them when a client requests a file. In either case, the interdependency of components makes it difficult to pinpoint the source of a slowdown or hang when one inevitably occurs.
AvroTrace is designed as a first responder for diagnosing problems in distributed systems that use Avro for RPC transport. It has two components: a real-time monitoring dashboard and an offline trace analyzer. Both run as low-overhead Avro plugins that store and propagate tracing metadata among RPC clients and servers. The monitoring dashboard is accessible via a web interface on any Avro server and delivers a "snapshot" of the most recent RPC activity. The offline analysis tool offers a basic interface for collecting, aggregating, and analyzing this data to identify problem spots. AvroTrace is largely based on Google's Dapper tracing infrastructure, which is itself inspired by X-Trace and other academic tracing research.
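To make the propagation idea concrete, here is a minimal Python sketch of Dapper-style context passing: each RPC carries a trace ID, its own span ID, and its parent's span ID in per-call metadata, so a server can attach the calls it makes to the caller's trace. The metadata keys, the `Span` class, and the helper functions are hypothetical illustrations, not AvroTrace's actual API or wire format.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical metadata keys for illustration; not AvroTrace's real wire names.
TRACE_ID_KEY = "trace.traceID"
SPAN_ID_KEY = "trace.spanID"
PARENT_ID_KEY = "trace.parentSpanID"


def new_id() -> bytes:
    """Random 64-bit identifier, as Dapper-style tracers commonly use."""
    return os.urandom(8)


@dataclass
class Span:
    """One RPC as seen by the tracer (hypothetical type)."""
    trace_id: bytes
    span_id: bytes
    parent_id: Optional[bytes]
    message: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self) -> None:
        self.end = time.monotonic()


def client_start_span(message: str, current: Optional[Span],
                      metadata: Dict[str, bytes]) -> Span:
    """Open a span for an outgoing call and copy its IDs into the call metadata.

    With no current span this starts a new trace; otherwise the new span is a
    child of the span the caller is currently serving.
    """
    if current is None:
        span = Span(new_id(), new_id(), None, message)
    else:
        span = Span(current.trace_id, new_id(), current.span_id, message)
    metadata[TRACE_ID_KEY] = span.trace_id
    metadata[SPAN_ID_KEY] = span.span_id
    if span.parent_id is not None:
        metadata[PARENT_ID_KEY] = span.parent_id
    return span


def server_join_span(message: str, metadata: Dict[str, bytes]) -> Optional[Span]:
    """Rebuild the caller's span on the server from incoming metadata."""
    if TRACE_ID_KEY not in metadata:
        return None  # the caller did not trace this request
    return Span(metadata[TRACE_ID_KEY], metadata[SPAN_ID_KEY],
                metadata.get(PARENT_ID_KEY), message)


# Simulated flow: getFile() arrives at a server, which then calls getFileContents().
outer: Dict[str, bytes] = {}
root = client_start_span("getFile", None, outer)
serving = server_join_span("getFile", outer)
inner: Dict[str, bytes] = {}
child = client_start_span("getFileContents", serving, inner)
child.finish()
print(child.trace_id == root.trace_id, child.parent_id == root.span_id)
```

Passing only the parent's span ID, rather than a full call stack, keeps the per-call overhead to a few fixed-size fields; the call tree is reassembled later from the collected spans.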
Below is an example trace analysis of a recursive RPC call pattern. In the example application, one remote call, getFile(), triggers two other RPCs, getFileContents() and getFileMeta(). Avro's tracing has detected this pattern and offers a dashboard view summarizing average timing and payload data. It also shows detailed graphs for one specific node in this pattern, getFileContents(), presenting a visual history of timing (top) and payload (bottom) analytics.
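The dashboard summary for a pattern like this can be approximated by naming each span by its position in the call tree and averaging each group. The sketch below assumes a hypothetical `SpanRecord` type with globally unique span IDs; the timing and payload numbers are invented sample values for illustration, not measurements from AvroTrace.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    """Hypothetical finished-span record; span IDs assumed globally unique."""
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    message: str
    duration_ms: float
    payload_bytes: int


def path_of(span: SpanRecord, by_id: Dict[int, SpanRecord]) -> str:
    """Name a span by its position in the call tree, e.g. 'getFile/getFileMeta'."""
    names = []
    cur: Optional[SpanRecord] = span
    while cur is not None:
        names.append(cur.message)
        cur = by_id.get(cur.parent_id) if cur.parent_id is not None else None
    return "/".join(reversed(names))


def summarize(spans: List[SpanRecord]) -> Dict[str, Dict[str, float]]:
    """Call count plus average duration and payload for each node of the pattern."""
    by_id = {s.span_id: s for s in spans}
    groups: Dict[str, List[SpanRecord]] = defaultdict(list)
    for s in spans:
        groups[path_of(s, by_id)].append(s)
    return {
        path: {
            "calls": len(g),
            "avg_ms": mean(s.duration_ms for s in g),
            "avg_payload": mean(s.payload_bytes for s in g),
        }
        for path, g in groups.items()
    }


# Invented sample data: two traces of getFile -> {getFileContents, getFileMeta}.
spans = [
    SpanRecord(1, 1, None, "getFile", 50.0, 1000),
    SpanRecord(1, 2, 1, "getFileContents", 30.0, 900),
    SpanRecord(1, 3, 1, "getFileMeta", 10.0, 100),
    SpanRecord(2, 4, None, "getFile", 70.0, 3000),
    SpanRecord(2, 5, 4, "getFileContents", 50.0, 2800),
    SpanRecord(2, 6, 4, "getFileMeta", 10.0, 200),
]
for path, stats in sorted(summarize(spans).items()):
    print(path, stats)
```

Grouping by tree path rather than by method name alone keeps, for example, a getFileMeta() issued under getFile() separate from one issued directly by a client.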
Turnkey tracing is just one of many reasons to use Avro. I recently became a committer on the Avro project and I look forward to supporting and improving trace functionality in the coming months!
Learn more about Avro and other Hadoop projects at Hadoop World!
Jonas / September 15, 2010 / 3:23 AM
Very cool.
When is this available?
Is it production ready?
Is it possible to add custom annotations like in Dapper?
Are you consolidating the stats like in Dapper?
Thanks.
Philip Zeyliger / September 16, 2010 / 11:11 AM
Hi Jonas,
This work was done as part of https://issues.apache.org/jira/browse/AVRO-595 and made it into the recently built Avro 1.4.0 release.
Thus far, it’s only been used internally, so it’s still in the early phases. I encourage you to try it!
There's underlying support for custom annotations, but there isn't a front-end API for annotations yet.
Whereas Dapper relies on Bigtable to aggregate statistics, Avro doesn't rely on any particular data store. Instead, there is a basic mechanism for pulling tracing data from each node. Once the tracing data is in one place, logic similar to Dapper's is used to infer common trace patterns.
Cheers,
– Philip (and Patrick)
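One simple way to realize the approach described above (pull spans from every node, group them by trace, then count recurring call-tree shapes) is sketched below. It is not necessarily the logic Avro uses: the span tuple layout, the helper names, and the per-node sample lists are hypothetical, and `collect` assumes span IDs are globally unique so that a span reported by more than one node is counted once.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical minimal span: (trace_id, span_id, parent_span_id, message).
Span = Tuple[int, int, Optional[int], str]


def collect(per_node: Iterable[List[Span]]) -> Dict[int, List[Span]]:
    """Merge spans pulled from each node and group them by trace ID.

    A span can be reported by more than one node (e.g. by both client and
    server), so duplicates are dropped by span ID.
    """
    seen = set()
    traces: Dict[int, List[Span]] = defaultdict(list)
    for node_spans in per_node:
        for span in node_spans:
            if span[1] in seen:
                continue
            seen.add(span[1])
            traces[span[0]].append(span)
    return traces


def signature(spans: List[Span]) -> str:
    """Canonical form of a trace's call tree, ignoring IDs and sibling order."""
    children: Dict[Optional[int], List[Span]] = defaultdict(list)
    for s in spans:
        children[s[2]].append(s)

    def render(s: Span) -> str:
        kids = sorted(render(c) for c in children[s[1]])
        return s[3] + ("(" + ",".join(kids) + ")" if kids else "")

    return ";".join(sorted(render(root) for root in children[None]))


def common_patterns(per_node: Iterable[List[Span]]) -> Counter:
    """Count how many traces share each call-tree shape."""
    return Counter(signature(spans) for spans in collect(per_node).values())


# Invented spans as three hypothetical nodes might report them.
node_a = [(1, 10, None, "getFile"), (2, 20, None, "getFile"), (3, 30, None, "getFile")]
node_b = [(1, 11, 10, "getFileContents"), (2, 21, 20, "getFileContents")]
node_c = [(1, 12, 10, "getFileMeta"), (2, 22, 20, "getFileMeta"),
          (3, 32, 30, "getFileMeta"), (1, 10, None, "getFile")]  # duplicate of node_a's root
for pattern, count in common_patterns([node_a, node_b, node_c]).most_common():
    print(count, pattern)
```

Sorting sibling names in the signature means two traces with the same calls in a different order are treated as the same pattern; a real analyzer might also want to distinguish order or repeated calls.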