Aspire Content Processing
Aspire is designed to acquire data from one or more content repositories (such as file systems, relational databases, cloud storage or content management systems), extract metadata and text from the documents, process the content and metadata as needed, and then publish each document, together with its metadata, to a search engine or other application.
Aspire uses Apache Felix (an open source implementation of OSGi) to install, start, stop, update, and uninstall Aspire components and applications without requiring a reboot, supporting improved uptime and making system administration easier. Each individual piece of processing functionality within Aspire is a modular component that can be used by itself, or in conjunction with other components to create an Aspire application.
Aspire is designed to provide:
• Performance and reliability by supporting:
o Distributed processing and automatic threading
o The ability to split document processing jobs into sub jobs that can run in parallel
o High availability via Apache ZooKeeper
o Intelligent preprocessing prior to indexing
o Optimized unstructured content for indexing and search
o Consistent, rich content (including metadata) to improve performance and user satisfaction
• Ease of administration:
o Making dynamic configuration changes
o Dynamically adding new components
o Web-based administration interface for managing servers and content sources
• A strong developer environment:
o Rich built-in JSON and XML processing methods, including XPath and XSLT
o Use of scripting to build complex processing components
o Hierarchical component configuration
o Sharing and loading component code
o Ability to write to HDFS and then create and run MapReduce jobs on that data to support "big data"
• Accuracy analysis and improvement controls:
o Monitor user activity with integration
o Compute accuracy metrics
o Best Bets
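The idea of splitting a document processing job into sub jobs that run in parallel, listed under performance and reliability above, can be pictured with a small sketch. This is an illustration only and not Aspire's API: the function names (split_into_sub_jobs, process_sub_job, process_job), the batch size, and the trivial "processing" step are all assumptions made for the example, and a thread pool from the Python standard library stands in for whatever scheduling Aspire actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_sub_jobs(items: List[str], size: int) -> List[List[str]]:
    """Cut one job (a list of raw documents) into fixed-size sub jobs."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_sub_job(batch: List[str]) -> List[str]:
    """Hypothetical processing step: trim and lowercase each document."""
    return [item.strip().lower() for item in batch]


def process_job(items: List[str], size: int = 2, workers: int = 4) -> List[str]:
    """Run the sub jobs on a thread pool and merge the results in order."""
    sub_jobs = split_into_sub_jobs(items, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the sub jobs were submitted.
        results = pool.map(process_sub_job, sub_jobs)
        return [doc for batch in results for doc in batch]
```

Calling process_job on a list of raw strings returns the processed documents in their original order, regardless of which sub job finishes first.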
Aspire deployments can be divided into four high-level functional areas:
• Administration supports installing, starting, stopping, updating, uninstalling, and securing Aspire applications.
• Content access refers to the features for accessing documents and associated metadata from the content source as part of creating a Content Source. A Content Source in Aspire is a configuration of components for accessing, processing, and publishing content. The applications that perform content access functions are called Aspire Connectors. These use the supported application programming interfaces of target repositories to access content, metadata, and security credentials.
• Content processing refers to the features for analyzing, augmenting, and transforming content in a Content Source.
• Publisher refers to the features responsible for pushing the processed text to the target system in a Content Source.
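The relationship between content access, content processing, and publishing inside a Content Source can be sketched in a few lines. The code below is a conceptual model only and does not reflect Aspire's actual component interfaces: the Document and ContentSource classes, the stage functions, and the in-memory "publisher" are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """A document and its metadata (hypothetical shape, not Aspire's)."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[Document], Document]


@dataclass
class ContentSource:
    """A connector, a list of processing stages, and a publisher."""
    connector: Callable[[], Iterable[Document]]
    stages: List[Stage]
    publisher: Callable[[Document], None]

    def run(self) -> int:
        """Pull each document, apply every stage in order, then publish it."""
        count = 0
        for doc in self.connector():
            for stage in self.stages:
                doc = stage(doc)
            self.publisher(doc)
            count += 1
        return count


def demo_connector() -> Iterable[Document]:
    """Stand-in for a connector reading from a repository."""
    yield Document("1", "  Hello   World ", {"source": "demo"})
    yield Document("2", "Second document")


def normalize_whitespace(doc: Document) -> Document:
    """Collapse runs of whitespace in the document text."""
    doc.text = " ".join(doc.text.split())
    return doc


def add_length(doc: Document) -> Document:
    """Record the cleaned text length as a metadata field."""
    doc.metadata["length"] = str(len(doc.text))
    return doc


def run_demo() -> List[Document]:
    """Wire the three parts together; the publisher just collects documents."""
    published: List[Document] = []
    source = ContentSource(demo_connector,
                           [normalize_whitespace, add_length],
                           published.append)
    source.run()
    return published
```

Keeping the connector, stages, and publisher as separate callables mirrors the description above, where each functional area is handled by its own components and a Content Source is the configuration that ties them together.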
With more than 800 customers worldwide, Search Technologies is the leading trusted and independent technology services firm specializing in the design, implementation, and management of search and big data analytics applications. Our experienced team and unique technical assets help us deliver customized search and analytics applications that are easier to use, less expensive, more powerful, and more reliable. To learn more, visit www.searchtechnologies.com.
Positive business outcomes
Poor quality content, especially metadata, is a leading cause of user dissatisfaction and underperformance in search applications. Aspire specifically handles unstructured data, providing a powerful solution for connectivity, cleansing, normalization, enhancement, analysis and publishing of human-generated content to search engines and big data applications.
Aspire can be deployed on-premises or in the cloud.
Metrics and proof points
- Scalability to billions of records
- Content processing for a wide range of content types (500+)
Number of connectors:
- RDB via Table
- RDB via Snapshots
- Amazon S3
- Basic File Systems
- Feed One
- Jira Issues
Content Management Systems
- SharePoint 2007/2010
- SharePoint 2013 (On premise)
- SharePoint Online (O365)
- File System
- Apache Subversion (SVN)
- Atlassian Confluence
- IBM Connections