Aspire Content Processing

Solution overview

Aspire is designed to acquire data from one or more content repositories (such as file systems, relational databases, cloud storage or content management systems), extract metadata and text from the documents, process the content and metadata as needed, and then publish each document, together with its metadata, to a search engine or other application.

Aspire uses Apache Felix (an open source implementation of OSGi) to install, start, stop, update, and uninstall Aspire components and applications without requiring a reboot, supporting improved uptime and making system administration easier. Each individual piece of processing functionality within Aspire is a modular component that can be used by itself, or in conjunction with other components to create an Aspire application.

Aspire is designed to provide:

• Performance and reliability by supporting:

    o Distributed processing and automatic threading 

    o The ability to split document processing jobs into sub jobs that can run in parallel

    o High availability via Zookeeper 

    o Intelligent preprocessing prior to indexing

    o Optimized unstructured content for indexing and search 

    o Consistent, rich content (including metadata) to improve performance and user satisfaction       

• Ease of administration: 

    o Making dynamic configuration changes

    o Dynamically adding new components 

    o Web-based administration interface for managing servers and content sources

• A strong developer environment: 

    o Rich built-in JSON and XML processing methods, including XPath and XSLT

    o Use of scripting to build complex processing components 

    o Hierarchical component configuration

    o Sharing and loading component code 

    o Ability to write to HDFS and then create and run map/reduce jobs on that data to support "big data"    

• Accuracy analysis and improvement controls 

    o Monitor user activity with integration

    o Compute accuracy metrics 

    o Thesauri

    o Best Bets

Aspire deployments can be divided into four high-level functional areas:

• Administration supports installing, starting, stopping, updating, uninstalling, and securing Aspire applications. 

• Content access refers to the features that retrieve documents and associated metadata from a content repository as part of creating a Content Source. A Content Source in Aspire is a configuration of components for accessing, processing, and publishing content. The applications that perform content access functions are called Aspire Connectors. These use the supported application programming interfaces of target repositories to access content, metadata, and security credentials.

• Content processing refers to the features for analyzing, augmenting, and transforming content in a Content Source.

• Publisher refers to the features responsible for pushing the processed text to the target system in a Content Source.
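
As an illustration only, the content access, content processing, and publisher areas above can be pictured as three stages of one pipeline. The sketch below is written in Python and does not use the Aspire API. The names `Document`, `in_memory_connector`, `normalize_whitespace`, `add_word_count`, and `run_content_source` are hypothetical and invented for this example, and a plain in-memory list stands in for the search engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Document:
    """A fetched item plus its metadata (hypothetical model, not Aspire's)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def in_memory_connector(items: Dict[str, str]) -> Iterator[Document]:
    """Content access: stands in for a connector reading a repository."""
    for doc_id, text in items.items():
        yield Document(doc_id, text, {"source": "memory"})


def normalize_whitespace(doc: Document) -> Document:
    """Content processing: collapse runs of whitespace in the text."""
    doc.text = " ".join(doc.text.split())
    return doc


def add_word_count(doc: Document) -> Document:
    """Content processing: augment the metadata with a word count."""
    doc.metadata["word_count"] = len(doc.text.split())
    return doc


def run_content_source(
    connector: Iterable[Document],
    stages: List[Callable[[Document], Document]],
    publish: Callable[[Document], None],
) -> int:
    """Apply each stage in order to every document, publish it, and return the count."""
    count = 0
    for doc in connector:
        for stage in stages:
            doc = stage(doc)
        publish(doc)
        count += 1
    return count


# Publisher: a list's append method plays the role of the target system.
index: List[Document] = []
published = run_content_source(
    in_memory_connector({"a": "  hello   world ", "b": "one two  three"}),
    [normalize_whitespace, add_word_count],
    index.append,
)
```

Keeping the connector, the processing stages, and the publisher as separate callables mirrors the modular description above: any one of them can be swapped without touching the others.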

Key highlights

Modernize architecture

About Aspire

Search Technologies, now part of Accenture Analytics, is the leading technology services firm specializing in the design, implementation, and management of search and big data analytics solutions. Both search and big data require a deep understanding of the nature of structured and unstructured content, and how to extract knowledge and business value from the data.

We have delivered results for over 800 customers, including industry leaders in e-commerce, publishing, media, financial services, professional staffing, and manufacturing, as well as the government sector. Our expert engineers and unique technical assets help us to deliver customized search and big data analytics solutions that are easier to use, less expensive, more powerful, more reliable, and most importantly, aligned with your business objectives.

Positive business outcomes

Poor quality content, especially metadata, is a leading cause of user dissatisfaction and underperformance in search applications. Aspire specifically handles unstructured data, providing a powerful solution for connectivity, cleansing, normalization, enhancement, analysis and publishing of human-generated content to search engines and big data applications.

Required capabilities

Can be deployed on-premises or in the cloud

Metrics and proof points

  • Scalability to billions of records
  • Content processing for a wide range of content types (500+)

Solution Benefits

Number of connectors:

Website (HTTP)

Relational Databases

  • RDB via Table
  • RDB via Snapshots


  • RightNow
  • Salesforce

File Systems

  • Amazon S3
  • Basic File Systems
  • CIFS


  • FTP
  • Feed One
  • Jira Issues
  • RSS

Content Management Systems

  • Documentum
  • SharePoint 2007/2010
  • SharePoint 2013 (On premise)
  • SharePoint Online (O365)

Staging Repository

  • File System


  • Apache Subversion (SVN)
  • Atlassian Confluence
  • eRoom
  • IBM Connections
  • JIVE
  • Jira
  • SocialCast
  • TeamForge
