Aspire Content Processing
Aspire is designed to acquire data from one or more content repositories (such as file systems, relational databases, cloud storage or content management systems), extract metadata and text from the documents, process the content and metadata as needed, and then publish each document, together with its metadata, to a search engine or other application.
Aspire uses Apache Felix (an open source implementation of OSGi) to install, start, stop, update, and uninstall Aspire components and applications without requiring a reboot, supporting improved uptime and making system administration easier. Each individual piece of processing functionality within Aspire is a modular component that can be used by itself, or in conjunction with other components to create an Aspire application.
Aspire is designed to provide:
• Performance and reliability by supporting:
o Distributed processing and automatic threading
o The ability to split document processing jobs into sub jobs that can run in parallel
o High availability via ZooKeeper
o Intelligent preprocessing prior to indexing
o Optimized unstructured content for indexing and search
o Consistent, rich content (including metadata) to improve performance and user satisfaction
• Ease of administration:
o Making dynamic configuration changes
o Dynamically adding new components
o Web-based administration interface for managing servers and content sources
• A strong developer environment:
o Rich built-in JSON and XML processing methods, including XPath and XSLT
o Use of scripting to build complex processing components
o Hierarchical component configuration
o Sharing and loading component code
o Ability to write to HDFS and then create and run map/reduce jobs on that data to support "big data"
• Accuracy analysis and improvement controls:
o Monitor user activity with integration
o Compute accuracy metrics
o Best Bets
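The sub-job splitting listed under performance and reliability above can be illustrated with a minimal sketch. This is not Aspire code or the Aspire API: the job dictionary layout and the names split_into_sub_jobs, process_sub_job, and run_job are hypothetical, and a thread pool from the Python standard library stands in for Aspire's own threading.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_jobs(job):
    """Split a parent job (hypothetical dict layout) into one sub job per document."""
    return [{"parent": job["id"], "doc": doc} for doc in job["documents"]]


def process_sub_job(sub_job):
    """Placeholder processing step: collapse whitespace and lower-case the text."""
    text = " ".join(sub_job["doc"]["text"].split())
    return {
        "id": sub_job["doc"]["id"],
        "parent": sub_job["parent"],
        "text": text.lower(),
    }


def run_job(job, max_workers=4):
    """Process all sub jobs of a parent job in parallel worker threads."""
    sub_jobs = split_into_sub_jobs(job)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input sub jobs.
        return list(pool.map(process_sub_job, sub_jobs))


job = {
    "id": "job-1",
    "documents": [
        {"id": "a", "text": "Hello   World"},
        {"id": "b", "text": " Foo\nBar "},
    ],
}
results = run_job(job)
```

Each sub job carries a reference to its parent, so results can be traced back to the job that produced them even though the documents are processed independently.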
Aspire deployments can be divided into four high-level functional areas:
• Administration supports installing, starting, stopping, updating, uninstalling, and securing Aspire applications.
• Content access refers to the features to access the documents and associated metadata from the content source as part of creating a Content Source. A Content Source in Aspire is a configuration of components for accessing, processing and publishing content. The applications that perform content access functions are called Aspire Connectors. These use the supported application programming interfaces of target repositories to access content, metadata, and security credentials.
• Content processing refers to the features for analyzing, augmenting, and transforming content in a Content Source.
• Publisher refers to the features responsible for pushing the processed text to the target system in a Content Source.
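As a rough illustration of how a Content Source ties together content access, content processing, and publishing, the sketch below models one as a connector, a list of processing stages, and a publisher. It is an assumption-laden toy, not the Aspire configuration format or API: the ContentSource class, its fields, and the helper functions are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Document = Dict[str, str]


@dataclass
class ContentSource:
    """Hypothetical model: connector -> processing stages -> publisher."""
    connector: Callable[[], Iterable[Document]]
    publisher: Callable[[Document], None]
    stages: List[Callable[[Document], Document]] = field(default_factory=list)

    def run(self) -> int:
        """Pull each document, run it through every stage, publish it; return the count."""
        count = 0
        for doc in self.connector():
            for stage in self.stages:
                doc = stage(doc)
            self.publisher(doc)
            count += 1
        return count


def in_memory_connector():
    """Stand-in for a repository connector; yields documents with metadata."""
    yield {"id": "1", "title": " quarterly report ", "body": "Revenue grew."}
    yield {"id": "2", "title": "meeting notes", "body": "Action items."}


def normalize_title(doc):
    """Processing stage: trim and title-case the title metadata."""
    return {**doc, "title": doc["title"].strip().title()}


def add_length(doc):
    """Processing stage: add a derived metadata field."""
    return {**doc, "body_length": str(len(doc["body"]))}


index = []  # stand-in for a search engine index
source = ContentSource(
    connector=in_memory_connector,
    publisher=index.append,
    stages=[normalize_title, add_length],
)
published = source.run()
```

Keeping the three roles as separate, swappable pieces mirrors the modular description above: a different connector or publisher can be substituted without touching the processing stages.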
Search Technologies, now part of Accenture Analytics, is the leading technology services firm specializing in the design, implementation, and management of search and big data analytics solutions. Both search and big data require a deep understanding of the nature of structured and unstructured content, and how to extract knowledge and business value from the data.
We have delivered results for over 800 customers, including industry leaders in e-commerce, publishing, media, financial services, professional staffing, and manufacturing, as well as the government sector. Our expert engineers and unique technical assets help us to deliver customized search and big data analytics solutions that are easier to use, less expensive, more powerful, more reliable, and most importantly, aligned with your business objectives.
Positive business outcomes
Poor quality content, especially metadata, is a leading cause of user dissatisfaction and underperformance in search applications. Aspire specifically handles unstructured data, providing a powerful solution for connectivity, cleansing, normalization, enhancement, analysis and publishing of human-generated content to search engines and big data applications.
Aspire can be deployed on-premises or in the cloud.
Metrics and proof points
- Scalability to billions of records
- Content processing for a wide range of content types (500+)
Number of connectors:
- RDB via Table
- RDB via Snapshots
- Amazon S3
- Basic File Systems
- Feed One
- Jira Issues
Content Management Systems
- SharePoint 2007/2010
- SharePoint 2013 (On premise)
- SharePoint Online (O365)
- File System
- Apache Subversion (SVN)
- Atlassian Confluence
- IBM Connections