Rapleaf uses Hadoop to process the many data feeds it collects and to synthesize them into a single, accurate view. Log messages are sent through Scribe and loaded into the Hadoop Distributed File System (HDFS). Log data is loaded into Hadoop every ten minutes, amounting to 1-2 TB per day; other data sources are loaded hourly or daily. Rapleaf also runs periodic jobs over the logs to compute statistics and to verify that everything is running correctly.
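The ten-minute load cadence implies a rough size for each batch written to HDFS. The sketch below is a back-of-the-envelope estimate only. It assumes the daily log volume is spread evenly across loads and uses decimal units (1 TB = 10^12 bytes). The function names are hypothetical and are not part of Rapleaf's pipeline.

```python
# Hypothetical back-of-the-envelope sizing for ten-minute log loads.
# Assumptions (not from Rapleaf): volume is spread evenly across the day,
# and units are decimal (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
MINUTES_PER_DAY = 24 * 60
LOAD_INTERVAL_MINUTES = 10


def loads_per_day(interval_minutes: int = LOAD_INTERVAL_MINUTES) -> int:
    """Number of scheduled loads in one day at the given interval."""
    return MINUTES_PER_DAY // interval_minutes


def gb_per_load(tb_per_day: float, interval_minutes: int = LOAD_INTERVAL_MINUTES) -> float:
    """Average gigabytes per load, assuming an even distribution over the day."""
    bytes_per_day = tb_per_day * 10**12
    return bytes_per_day / loads_per_day(interval_minutes) / 10**9


if __name__ == "__main__":
    # Evaluate the low and high ends of the stated 1-2 TB/day range.
    for tb in (1.0, 2.0):
        print(f"{tb:.0f} TB/day -> {loads_per_day()} loads, ~{gb_per_load(tb):.1f} GB each")
```

Under these assumptions, 144 loads per day works out to roughly 7 to 14 GB per load. Real batches will vary with traffic over the course of the day.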