Key Analytics Infotables

In ThingWorx Analytics Server, two key infotables provide dataset and metadata information to a job request. Regardless of how the request is submitted (via a REST call, a mashup, or a service), the dataset and metadata must be provided in the form of an infotable.

An infotable is an instance of a data shape that includes data. Two commonly used infotables are the following:

• AnalyticsDatasetRef – This infotable references a specific dataset, including a URI and the data format (both will vary depending on whether the data is a stored dataset or in-place data provided with a request).

• AnalyticsDatasetMetadata – This infotable is a data shape containing the machine learning characteristics that describe a dataset, including fieldName, dataType, opType, and more.

Metadata needs to be provided only if you are accessing data that is not already in the appropriate microservice, that is, when the data URI does not use the dataset:/ scheme.

AnalyticsDatasetRef

The AnalyticsDatasetRef contains the following fields:

• datasetUri – A string that points to the location of the data you want to include in the request. Data can be accessed from a variety of locations. The syntax of this field includes two components separated by a colon. On the left side of the colon is a scheme, which indicates the source of the data (such as a stored dataset or in-place data provided directly to a request). On the right side of the colon is a path, which provides the specific location. The possible schemes include:

◦ thingworx:/ – Points to a ThingWorx file repository, such as AnalyticsUploadStorage, where data can be uploaded and stored.

◦ thingworxs:/ – Functions the same as the thingworx:/ scheme but is used when ThingWorx Foundation server is accessed over TLS (HTTPS).

◦ file:/ – Points to the ThingWorx Analytics Server file system where data can be loaded directly and accessed in place. Accessing data from the file system is useful for small datasets with rapidly changing data.

◦ body:/ – Points to in-place data that can be supplied directly as part of an API request body. This method is useful for scoring or model evaluation jobs. Data supplied this way is not stored.

◦ dataset:/ – Points to a dataset created and stored in a microservice.

• format – A string that indicates the storage format of the data. Supported values include:

◦ csv – For use with the thingworx:/, thingworxs:/, body:/ or file:/ schemes

◦ parquet – For use with the dataset:/ scheme

Format values must be indicated in lower case.
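The scheme and format rules above can be checked before a request is submitted. The following is a minimal sketch in plain JavaScript, not a ThingWorx API: the helper names splitDatasetUri and checkDatasetRef and the lookup table FORMATS_BY_SCHEME are hypothetical, and the pairing of schemes with formats is taken only from the lists above.

```javascript
// Scheme-to-format pairing as listed above. The table and helper names
// are hypothetical; they are not part of ThingWorx.
var FORMATS_BY_SCHEME = {
  thingworx: ["csv"],
  thingworxs: ["csv"],
  file: ["csv"],
  body: ["csv"],
  dataset: ["parquet"]
};

// Split a datasetUri at the first colon: scheme on the left, path on the right.
function splitDatasetUri(datasetUri) {
  var colon = datasetUri.indexOf(":");
  if (colon < 1) {
    throw new Error("datasetUri must have the form scheme:path");
  }
  return {
    scheme: datasetUri.substring(0, colon),
    path: datasetUri.substring(colon + 1)
  };
}

// Return a list of problems found in a datasetUri/format pair (empty if none).
function checkDatasetRef(datasetUri, format) {
  var problems = [];
  var parts = splitDatasetUri(datasetUri);
  var known = Object.prototype.hasOwnProperty.call(FORMATS_BY_SCHEME, parts.scheme);
  if (!known) {
    problems.push("unknown scheme: " + parts.scheme);
  } else if (format !== format.toLowerCase()) {
    problems.push("format must be lower case: " + format);
  } else if (FORMATS_BY_SCHEME[parts.scheme].indexOf(format) === -1) {
    problems.push("format " + format + " is not listed for scheme " + parts.scheme);
  }
  return problems;
}
```

For example, checkDatasetRef("body:/", "csv") returns an empty list, while checkDatasetRef("body:/", "CSV") reports that the format must be lower case.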

• filter – A string that contains the conditions of an SQL WHERE clause, describing which data should be included. It has the effect of removing rows of data from the dataset (it does not remove columns).

• exclusions – A list of strings that name the fields to remove from the dataset. It has the effect of removing columns from each row in the dataset (it does not remove rows).

• data – An untyped infotable that must be provided in order to pass data as part of a request body. A data infotable is used only when the datasetUri parameter is set to body:/.

The data parameter can accept any infotable that includes only primitive base type fields (STRING, INTEGER, NUMBER). It must not contain additional nested infotables and it must match the corresponding metadata infotable. The data infotable must include at least the following:

◦ A Data Shape that defines each column of the data, including field names and base types. For additional information about creating and working with Data Shapes, see the Data Shapes section of the ThingWorx Foundation Help Center.

◦ Rows of actual data. A row entry must be specified for every record of data.

For a sample JavaScript service that creates a data infotable as part of an AnalyticsDatasetRef table, see Sample Javascript Service.

• metadata – A string that points to the metadata infotable and must be provided in order to pass metadata as part of a request body.
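The requirements on the data infotable can be illustrated with a short sketch. It uses plain JavaScript objects as stand-ins for ThingWorx infotables, so it is not the platform API: the helpers makeDataTable and checkAgainstMetadata, the field names machineId and temperature, and the sample values are all hypothetical. The sketch builds a reference for in-place data (datasetUri set to body:/) and checks that the data uses only primitive base types and matches the metadata field for field. How the metadata is attached to the request is left out.

```javascript
// Base types the data parameter accepts, per the description above.
var PRIMITIVE_BASE_TYPES = ["STRING", "INTEGER", "NUMBER"];

// Hypothetical helper: an infotable-like object with a data shape and rows.
function makeDataTable(fields, rows) {
  var fieldDefinitions = {};
  fields.forEach(function (f) {
    fieldDefinitions[f.name] = { name: f.name, baseType: f.baseType };
  });
  return { dataShape: { fieldDefinitions: fieldDefinitions }, rows: rows };
}

// Hypothetical helper: list mismatches between a data table and metadata rows.
function checkAgainstMetadata(dataTable, metadataRows) {
  var problems = [];
  var defs = dataTable.dataShape.fieldDefinitions;
  var dataNames = Object.keys(defs);
  var metaNames = metadataRows.map(function (m) { return m.fieldName; });
  dataNames.forEach(function (name) {
    if (PRIMITIVE_BASE_TYPES.indexOf(defs[name].baseType) === -1) {
      problems.push(name + ": base type " + defs[name].baseType + " is not primitive");
    }
    if (metaNames.indexOf(name) === -1) {
      problems.push(name + ": no metadata row");
    }
  });
  metaNames.forEach(function (name) {
    if (dataNames.indexOf(name) === -1) {
      problems.push(name + ": in metadata but not in data");
    }
  });
  return problems;
}

// Illustrative data: one row entry per record.
var data = makeDataTable(
  [{ name: "machineId", baseType: "STRING" },
   { name: "temperature", baseType: "NUMBER" }],
  [{ machineId: "m-1", temperature: 71.5 },
   { machineId: "m-2", temperature: 68.0 }]
);

// Illustrative metadata rows in the AnalyticsDatasetMetadata layout.
var metadataRows = [
  { fieldName: "machineId", dataType: "STRING", opType: "CATEGORICAL" },
  { fieldName: "temperature", dataType: "DOUBLE", opType: "CONTINUOUS" }
];

// In-place data: the data infotable is used only with the body:/ scheme.
var datasetRef = { datasetUri: "body:/", format: "csv", data: data };
var problems = checkAgainstMetadata(datasetRef.data, metadataRows);
```

In a real ThingWorx service, the data and metadata would be actual infotables built from Data Shapes; the plain objects here only mirror their layout.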

AnalyticsDatasetMetadata

This infotable is a static data shape containing the machine learning characteristics that describe a dataset. It includes the following fields:

• fieldName – A string that provides the field name for a column of data.

• dataType – A string that indicates the format of the data in this field. Acceptable values include:

◦ STRING

◦ BOOLEAN

◦ DOUBLE

◦ INTEGER

◦ LONG

◦ DATETIME

For information about specific data types, see OpType DataType Combinations.

• opType – A string that indicates how the data behaves. Acceptable values include:

◦ CONTINUOUS

◦ CATEGORICAL

◦ BOOLEAN

◦ ORDINAL

◦ ENTITY_ID

◦ TEMPORAL

◦ TEXT

For information about specific op types, see OpType DataType Combinations.

• min – For continuous values, this field represents the lowest expected value.

This is an informational field.

When metadata is submitted in a JSON file, both the min and the max values are nested in a range parameter. When metadata is queried in ThingWorx, however, this infotable flattens these parameters, so the min and max values are returned without the range field.

• max – For continuous values, this field represents the highest expected value.

This is an informational field.
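The difference between the nested JSON form and the flattened infotable form of min and max can be sketched as follows. This is an illustration in plain JavaScript under stated assumptions: the helper flattenRange is hypothetical, the key names min and max inside range are assumed from the description above, and the sample field and values are invented.

```javascript
// Hypothetical helper: copy a JSON metadata entry, replacing a nested
// range object with top-level min and max fields.
function flattenRange(entry) {
  var flat = {};
  Object.keys(entry).forEach(function (key) {
    if (key !== "range") {
      flat[key] = entry[key];
    }
  });
  if (entry.range) {
    flat.min = entry.range.min;
    flat.max = entry.range.max;
  }
  return flat;
}

// JSON-style entry: min and max nested in range (assumed layout).
var jsonEntry = {
  fieldName: "temperature",
  dataType: "DOUBLE",
  opType: "CONTINUOUS",
  range: { min: 0, max: 120 }
};

// Infotable-style row: min and max at the top level, no range field.
var row = flattenRange(jsonEntry);
```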

• values – For ordinal and categorical values, this field contains a list of possible values. For ordinal, the values must be listed in the correct order. For categorical, the order of values doesn’t matter.

• timeSamplingInterval – For time series datasets, this value indicates the time interval between adjacent rows of data. If the dataset does not adhere to the specified interval, an error will occur.

• isStatic – For time series datasets, this flag can be used to indicate that a field should not change over time. When set to true, this field will not undergo any time series transformations. Setting this flag where appropriate will improve performance.

• missingValueTreatment – Indicates how a missing dataset value should be handled for a specific field. This field is optional, but it must be set if missing values are to be imputed. Acceptable values include:

◦ AS_IS

◦ AS_VALUE

◦ AS_MEAN

◦ AS_MEDIAN

◦ AS_LAST

For more information about the options available for imputing missing values, see Handling Missing Dataset Values.

• missingValueReplacement – Indicates what value should replace all missing values for a specific field. This field is required when the value for missingValueTreatment is AS_VALUE. Do not use it for other missing value treatments.

• missingValueIndicators – A list of values that should be interpreted as missing for a specific field. By default, null entries are considered missing. This field is optional and can be used only when a missing value treatment is specified in missingValueTreatment.

Indicators must be of the same dataType as the field.
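The dependencies among the three missing-value fields can be expressed as a small check on a single metadata row. The following is a minimal sketch in plain JavaScript, not a ThingWorx service: the helper checkMissingValueSettings is hypothetical, and it covers only the rules stated above (it does not verify that indicators share the dataType of the field).

```javascript
// Treatments listed above.
var TREATMENTS = ["AS_IS", "AS_VALUE", "AS_MEAN", "AS_MEDIAN", "AS_LAST"];

// Hypothetical helper: list rule violations for one metadata row.
function checkMissingValueSettings(row) {
  var problems = [];
  var treatment = row.missingValueTreatment;
  var hasTreatment = treatment !== undefined && treatment !== null;
  var hasReplacement = row.missingValueReplacement !== undefined &&
    row.missingValueReplacement !== null;
  var hasIndicators = Array.isArray(row.missingValueIndicators) &&
    row.missingValueIndicators.length > 0;

  if (hasTreatment && TREATMENTS.indexOf(treatment) === -1) {
    problems.push("unknown missingValueTreatment: " + treatment);
  }
  // A replacement value is required for AS_VALUE and not used otherwise.
  if (treatment === "AS_VALUE" && !hasReplacement) {
    problems.push("missingValueReplacement is required for AS_VALUE");
  }
  if (treatment !== "AS_VALUE" && hasReplacement) {
    problems.push("missingValueReplacement applies only to AS_VALUE");
  }
  // Indicators can be used only when a treatment is specified.
  if (hasIndicators && !hasTreatment) {
    problems.push("missingValueIndicators requires a missingValueTreatment");
  }
  return problems;
}
```

For example, a row with missingValueTreatment set to AS_VALUE and no missingValueReplacement produces one problem, while a row with none of the three fields set produces none.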
