Prepare Data and Metadata

Preparing data for ThingWorx Analytics involves two things: a CSV file containing the raw data, and metadata that defines the structure of that data. The metadata can either be provided as a JSON file or auto-detected from the data file.

Data

Keep in mind the following when preparing data for upload:

• CSV files must include headers. Make sure there are no spaces before or after the column header names.

• For time series data, temporal and entity ID columns are optional. Stream the data so that the timestamps increase in regular increments; any gaps should be incidental and small. To view a sample of time series data, download the Sample Time Series CSV File from the Reference Documents section of the PTC eSupport Portal.

• ThingWorx Analytics does not have a limit on the number of rows or columns in a dataset. However, the amount of memory and processing power available to the application might constrain the number of rows or columns.

• Field names that include square brackets or backticks (`) can cause problems for distribution queries and should be avoided.

• For ThingWorx Analytics 9.2 and later, there are no restrictions on using numbers in field names. In releases earlier than 9.2, a field name whose first character is a number can cause problems. To avoid these problems, observe the rules for Data Field Names that Start with a Number.
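The header and timestamp guidelines above can be checked before upload. The following Python sketch is not part of ThingWorx Analytics; the function names check_headers and irregular_steps, and the timestamp format, are assumptions made for illustration. It flags header names with surrounding spaces, square brackets, backticks, or a leading digit, and lists timestamp steps that differ from the most common step.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime


def check_headers(csv_text):
    """Return (column, problem) pairs for the first row of csv_text.

    Hypothetical helper: it assumes the first row is the header row and
    cannot tell whether a header row is actually present.
    """
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        return [("", "file is empty, so it has no header row")]
    problems = []
    for name in header:
        if name != name.strip():
            problems.append((name, "leading or trailing space"))
        if re.search(r"[\[\]`]", name):
            problems.append((name, "contains a square bracket or backtick"))
        # Only a problem in releases earlier than 9.2.
        if name.strip()[:1].isdigit():
            problems.append((name, "starts with a digit"))
    return problems


def irregular_steps(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (row_index, step_seconds) for steps unlike the most common one.

    The timestamp format is an assumption; adjust fmt to match the data.
    """
    times = [datetime.strptime(t, fmt) for t in timestamps]
    steps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if not steps:
        return []
    usual, _ = Counter(steps).most_common(1)[0]
    return [(i + 1, s) for i, s in enumerate(steps) if s != usual]
```

An empty result from either function means no issue was found by these particular checks; it does not guarantee that the upload will succeed.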

Metadata

Metadata can be provided in a pre-created JSON file, or the DetectMetadata service can infer it from the CSV data. When metadata detection is enabled, you can specify a data URI during data upload and an AnalyticsDatasetMetadata infotable is returned automatically. You can review and modify the infotable before using it.

If you upload the metadata in a JSON file, the file must be formatted as outlined in the following table. Links to metadata samples follow the table.

Optional parameters can be set to null or omitted; both have the same effect.

| Parameter | Description | Required/Optional |
| --- | --- | --- |
| fieldName | The exact name of the field as it appears in the dataset. | Required |
| values | A list of the acceptable values for the field. For Ordinal opTypes, the values must be presented in the correct order. | Required if the opType is Ordinal. Optional for Categorical opType. Do not use for Boolean, Continuous, and Text. |
| range | For a Continuous field, defines the minimum and maximum values the field can accept. For informational purposes only. Should be specified as: "range": {"min": <value>, "max": <value>}. When querying metadata in ThingWorx, the infotable is flattened and the Min and Max values are returned without the range field. | Optional |
| dataType | Describes what type of data the field contains. Options include: STRING, BOOLEAN, DOUBLE, INTEGER, LONG, DATETIME. These options must be entered as all uppercase values. Lowercase values will lead to errors. Select the most accurate data type. For information about which data types can be used with which op type, see OpType DataType Combinations. | Required |
| opType | Describes how the data in the field can be used. Options include: CONTINUOUS, CATEGORICAL, BOOLEAN, ORDINAL, ENTITY_ID, TEMPORAL, INFORMATIONAL, TEXT. These options must be entered as all uppercase values. Lowercase values will lead to errors. For information about which opTypes can be used with which dataTypes, see OpType DataType Combinations. | Required |
| timeSamplingInterval | An integer representing the time between observations in a temporal field. | Optional if the opType is Temporal. Do not use for other opTypes. |
| isStatic | A flag indicating whether or not the value in a temporal field can change over time. Marking a field as static reduces training time by removing redundant data points for fields that do not change. | Optional |
| missingValueTreatment | Indicates how a missing dataset value should be handled for a specific field. Options include: AS_IS, AS_VALUE, AS_MEAN, AS_MEDIAN, AS_LAST. For more information about the options available for imputing missing values, see Handling Missing Dataset Values. | Optional but required in order to impute missing values. |
| missingValueReplacement | Indicates what value should replace all missing values for a specific field. Can only be used when the value for missingValueTreatment is AS_VALUE. | Required when AS_VALUE is selected for the missing value treatment. Do not use for other missing value treatments. |
| missingValueIndicators | Indicates a list of values that should be interpreted as missing for a specific field. By default, null entries are considered missing. Indicators must be of the same dataType as the field. | Optional but can only be used when a missing value treatment is specified in missingValueTreatment. |
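The parameter rules above can be checked before a metadata file is uploaded. The following Python sketch is not a ThingWorx Analytics API. The function names validate_field and validate_metadata are hypothetical, and the sketch assumes the metadata file is a JSON array with one object per field. It covers only the rules stated in this topic and does not check OpType and DataType combinations.

```python
import json

DATA_TYPES = {"STRING", "BOOLEAN", "DOUBLE", "INTEGER", "LONG", "DATETIME"}
OP_TYPES = {"CONTINUOUS", "CATEGORICAL", "BOOLEAN", "ORDINAL",
            "ENTITY_ID", "TEMPORAL", "INFORMATIONAL", "TEXT"}
TREATMENTS = {"AS_IS", "AS_VALUE", "AS_MEAN", "AS_MEDIAN", "AS_LAST"}


def validate_field(field):
    """Return a list of rule violations for one field definition (a dict)."""
    errors = []
    # .get() returns None for both omitted and null parameters,
    # which this topic says have the same effect.
    op_type = field.get("opType")
    values = field.get("values")
    treatment = field.get("missingValueTreatment")
    replacement = field.get("missingValueReplacement")

    if not field.get("fieldName"):
        errors.append("fieldName is required")
    if field.get("dataType") not in DATA_TYPES:
        errors.append("dataType must be one of the uppercase options")
    if op_type not in OP_TYPES:
        errors.append("opType must be one of the uppercase options")
    if op_type == "ORDINAL" and not values:
        errors.append("values is required when opType is ORDINAL")
    if op_type in {"BOOLEAN", "CONTINUOUS", "TEXT"} and values is not None:
        errors.append("values must not be used for this opType")
    if field.get("timeSamplingInterval") is not None and op_type != "TEMPORAL":
        errors.append("timeSamplingInterval applies only to TEMPORAL")
    if treatment is not None and treatment not in TREATMENTS:
        errors.append("missingValueTreatment is not a listed option")
    if treatment == "AS_VALUE" and replacement is None:
        errors.append("missingValueReplacement is required for AS_VALUE")
    if treatment != "AS_VALUE" and replacement is not None:
        errors.append("missingValueReplacement requires AS_VALUE")
    if field.get("missingValueIndicators") is not None and treatment is None:
        errors.append("missingValueIndicators requires missingValueTreatment")
    return errors


def validate_metadata(json_text):
    """Map each offending fieldName to its violations.

    Assumes the metadata JSON is an array of field objects.
    """
    report = {}
    for field in json.loads(json_text):
        errors = validate_field(field)
        if errors:
            report[field.get("fieldName") or "<unnamed>"] = errors
    return report
```

For example, a field defined with "dataType": "double" is reported because the value is not uppercase, and an ORDINAL field without a values list is reported because the list is required for that opType.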

To view metadata samples, see the following:

• Non-time series metadata sample

• Time series metadata sample
