Content Crawler
A content crawler Thing calls a service on another entity to retrieve data, and then stores that data in the Data Table of the content crawler Thing.
On a separate entity from the content crawler Thing, you must define a service that fetches data and returns an infotable of that data to the content crawler. The content crawler then maps the incoming fields and tags to the fields used in the Data Shape for the content crawler. Each row is added as a new entry to the Data Table on the content crawler Thing. The index of the content crawler’s Data Table works in the same way as the index of a Data Table entity.
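The flow described above can be sketched in plain Python. This is an illustrative model only, not ThingWorx code: `fetch_rows`, `ContentCrawler`, and the `id` primary key field are hypothetical names, and dictionaries stand in for infotable rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def fetch_rows() -> List[Row]:
    """Hypothetical stand-in for the service defined on a separate entity."""
    return [
        {"id": "a1", "title": "First item"},
        {"id": "a2", "title": "Second item"},
    ]


class ContentCrawler:
    """Toy model of a content crawler Thing with its own table of entries."""

    def __init__(self, source: Callable[[], List[Row]], primary_key: str) -> None:
        self.source = source
        self.primary_key = primary_key
        self.entries: Dict[object, Row] = {}

    def crawl(self) -> int:
        """Call the source service and add each returned row as an entry."""
        rows = self.source()
        for row in rows:
            self.entries[row[self.primary_key]] = dict(row)
        return len(rows)


crawler = ContentCrawler(fetch_rows, primary_key="id")
added = crawler.crawl()
```

After `crawl()` runs, `crawler.entries` holds one entry per returned row, keyed by the assumed primary key field.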
Creating a Content Crawler
To retrieve data from the Data Table of an entity into the Data Table of the content crawler Thing, do the following:
1. Create a Data Shape and define fields to use in a Data Table. To create a Data Shape from Composer, browse Modeling > Data Shapes, and then click the New button.
a. Enter a name and description.
b. In the Field Definitions area, click the Add button.
c. In the new field definition pane, enter appropriate information, and then click .
2. Create a Data Table that uses the Data Shape created in the previous step. To create a Data Table from Composer, browse Data Storage > Data Tables, and then click the New button.
a. Select a Data Table template, and then click OK.
b. Enter a name and description, and select the Data Shape you created in the previous step.
c. In the Services area, create a custom service by clicking Add.
d. In the Output area, select INFOTABLE from the drop-down list.
e. Select the Data Shape created in the previous step.
f. Set the Infotable Type as Is Content Crawler Entry, and then click Done.
3. Create a new Data Shape for the content crawler Thing.
Note: You can create a new content crawler-specific Data Shape, or you can reuse the Data Shape that you created in step 1 for the Data Table. Although this step is optional, this example uses a new Data Shape for the content crawler Thing.
4. Create a new content crawler Thing:
a. From Composer, browse Modeling > Things, and then click the New button.
b. Enter a name, and in the Base Thing Template field, select Content Crawler.
c. In the Data Shape field, select the Data Shape you created in the previous step, and then click Save.
Content Crawler Configuration
The Configuration area for the content crawler Thing contains configuration tables that allow you to map fields from the retrieved data.
The Field to Tag Mappings configuration table maps the values of a field to tags in a data tag vocabulary.
When the data tag vocabulary is dynamic, any value that is mapped from the data has a term automatically entered into the vocabulary.
When the data tag vocabulary is not dynamic, any value that is mapped from the data must already have a predefined term in the vocabulary for the value to be mapped properly.
For example: TestingVocab:false;TestingVocab:iAmAString. The first tag holds the boolProp value, and the second tag holds the stringProp value.
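The difference between dynamic and non-dynamic vocabularies can be illustrated with a small plain-Python model. It is a sketch under stated assumptions, not the ThingWorx implementation: `Vocabulary`, `term_for`, and `map_fields_to_tags` are hypothetical names, and skipping a value that has no predefined term in a non-dynamic vocabulary is an assumption, because the text only says that such a value is not mapped properly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Vocabulary:
    """Toy model of a data tag vocabulary."""
    name: str
    dynamic: bool
    terms: Set[str] = field(default_factory=set)


def term_for(value: object) -> str:
    """Render a field value as a vocabulary term (booleans as true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def map_fields_to_tags(row: Dict[str, object], mappings: Dict[str, Vocabulary]) -> str:
    """Build a 'Vocab:term;Vocab:term' string from the mapped fields of a row."""
    tags: List[str] = []
    for field_name, vocab in mappings.items():
        term = term_for(row[field_name])
        if vocab.dynamic:
            vocab.terms.add(term)  # dynamic: the term is created on demand
        elif term not in vocab.terms:
            continue  # assumption: values without a predefined term are skipped
        tags.append(f"{vocab.name}:{term}")
    return ";".join(tags)


row = {"boolProp": False, "stringProp": "iAmAString"}

dynamic_vocab = Vocabulary("TestingVocab", dynamic=True)
dynamic_tags = map_fields_to_tags(
    row, {"boolProp": dynamic_vocab, "stringProp": dynamic_vocab}
)

static_vocab = Vocabulary("TestingVocab", dynamic=False, terms={"false"})
static_tags = map_fields_to_tags(
    row, {"boolProp": static_vocab, "stringProp": static_vocab}
)
```

In this model, the dynamic vocabulary yields both tags and gains the two terms, while the non-dynamic vocabulary yields only the tag whose term was predefined.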
The Index Settings configuration for a Data Table allows you to define additional table indexes. This is similar to indexing a relational database table: in addition to the primary key, which is defined in the Data Shape, you often need to query the table based on other fields. Create an index for each set of filter criteria that you commonly use. Doing so has a significant impact on query performance.
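Why an additional index helps can be shown with a generic sketch in plain Python. It illustrates secondary indexing in general and makes no claim about how ThingWorx stores or queries Data Tables; the table contents and the `build_index` and `query_scan` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

rows: List[Row] = [
    {"id": 1, "region": "EU", "status": "open"},
    {"id": 2, "region": "US", "status": "open"},
    {"id": 3, "region": "EU", "status": "closed"},
]

# Primary key lookup: in this model, "id" plays the role of the primary key.
by_id: Dict[object, Row] = {row["id"]: row for row in rows}


def build_index(table: List[Row], fields: Tuple[str, ...]) -> Dict[tuple, List[object]]:
    """Map each combination of indexed field values to matching primary keys."""
    index: Dict[tuple, List[object]] = defaultdict(list)
    for row in table:
        index[tuple(row[f] for f in fields)].append(row["id"])
    return index


def query_scan(table: List[Row], **criteria: object) -> List[object]:
    """Without an index: test every row against the filter criteria."""
    return [r["id"] for r in table if all(r[k] == v for k, v in criteria.items())]


# One index per commonly used set of filter criteria.
region_status = build_index(rows, ("region", "status"))
indexed_hit = region_status[("EU", "open")]
scanned_hit = query_scan(rows, region="EU", status="open")
```

Both queries return the same keys, but the indexed lookup is a single dictionary access, whereas the scan touches every row.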
The Field to Field Mappings configuration table maps the fields from the retrieved data to the fields defined on the Data Shape of the content crawler Thing.
Note: If the same Data Shape is used on the content crawler Thing and for the infotable returned from the content crawler service, field mappings are handled automatically.
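Field to field mapping can be modeled as a rename step between the retrieved row and the fields of the crawler's Data Shape. The following plain-Python sketch is illustrative only: `apply_field_mappings` and all field names are hypothetical, and dropping source fields that have no matching target field is an assumption.

```python
from typing import Dict, List

Row = Dict[str, object]


def apply_field_mappings(row: Row, target_fields: List[str], mappings: Dict[str, str]) -> Row:
    """Copy values from a retrieved row into the crawler's Data Shape fields.

    mappings maps a source field name to a target field name. A source field
    without a mapping keeps its own name, which models the case where both
    sides share the same Data Shape.
    """
    result: Row = {}
    for source_name, value in row.items():
        target_name = mappings.get(source_name, source_name)
        if target_name in target_fields:  # assumption: unknown fields are dropped
            result[target_name] = value
    return result


crawler_fields = ["id", "title", "content"]

# Different shapes: explicit mappings are needed for the renamed fields.
retrieved = {"id": 7, "headline": "Hello", "body": "Text", "internal": "x"}
mapped = apply_field_mappings(
    retrieved, crawler_fields, {"headline": "title", "body": "content"}
)

# Same shape on both sides: no mappings are needed.
same_shape = {"id": 8, "title": "T", "content": "C"}
passthrough = apply_field_mappings(same_shape, crawler_fields, {})
```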
Content Crawler Services
The following services are unique to the content crawler Thing:
CrawlEntries — Purges all of the Data Table entries for the content crawler, and then executes GetExternalContent.
GetExternalContent — Executes the service defined on the General Information area of the content crawler Thing. An infotable of retrieved values is returned from the service. No modifications to the Data Table for the content crawler are performed.
GetExternalContentDetail — Retrieves a specific content item by key.
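The contrast between CrawlEntries and GetExternalContent can be sketched with a toy model in plain Python. This is not the ThingWorx API: `CrawlerModel` and its method names are hypothetical, and storing the fetched rows after the purge is an assumption based on the overview, which states that each returned row is added as a new entry. GetExternalContentDetail is not modeled because the text does not say how the key is resolved.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class CrawlerModel:
    """Toy model of the purge-and-fetch behavior described above."""

    def __init__(self, source: Callable[[], List[Row]]) -> None:
        self.source = source
        self.entries: Dict[object, Row] = {}

    def get_external_content(self) -> List[Row]:
        """Return rows from the source service; stored entries are not modified."""
        return self.source()

    def crawl_entries(self) -> int:
        """Purge all stored entries, then fetch the external content."""
        self.entries.clear()
        for row in self.get_external_content():
            self.entries[row["id"]] = row  # assumption: fetched rows are stored
        return len(self.entries)


content: List[Row] = [{"id": "a", "title": "Old"}]
crawler = CrawlerModel(lambda: list(content))

crawler.crawl_entries()                        # stores entry "a"
content[:] = [{"id": "b", "title": "New"}]     # the external content changes

preview = crawler.get_external_content()       # sees "b", stores nothing
keys_after_preview = sorted(crawler.entries)

crawler.crawl_entries()                        # purges "a", stores "b"
keys_after_crawl = sorted(crawler.entries)
```

In this model, calling `get_external_content` is a safe way to preview what a crawl would return, because only `crawl_entries` changes the stored entries.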