Prepare Data and Metadata
Preparing data for ThingWorx Analytics involves two parts: a CSV file containing the raw data, and metadata that defines the structure of that data. The metadata can either be provided as a JSON file or auto-detected from the data file.
Keep in mind the following when preparing data for upload:
CSV files must include headers. Make sure there are no spaces before or after the column header names.
For time series data, temporal and entity ID columns must be included. In addition, ensure that the data is streamed so that the timestamps increase in regular increments and any gaps that exist are generally incidental and small. To view a sample of time series data, download the Sample Time Series CSV File from the Reference Documents section of the PTC eSupport Portal.
ThingWorx Analytics does not have a limit on the number of rows or columns in a dataset. However, the amount of memory and processing power available to the application might constrain the number of rows or columns.
To avoid problems that can be caused when the first character of a field name is a number, observe the rules for Data Field Names that Start with a Number.
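The checks in the list above can be run locally before upload. The following is a minimal sketch, not part of ThingWorx Analytics: the function name check_csv and its parameters are hypothetical, timestamps are assumed to be integers, and a "regular increment" is approximated as the most common step between consecutive rows of the same entity.

```python
import csv
import io
from collections import Counter, defaultdict


def check_csv(text, entity_col=None, time_col=None):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        return ["file is empty, so it has no header row"]

    for name in header:
        # Header names must not have spaces before or after them.
        if name != name.strip():
            problems.append(f"header {name!r} has leading or trailing spaces")
        # Names that start with a number need special handling.
        if name.strip()[:1].isdigit():
            problems.append(f"header {name!r} starts with a number")

    if entity_col is None or time_col is None:
        return problems  # not treated as time series data; nothing more to check

    if entity_col not in header or time_col not in header:
        problems.append("time series data needs entity ID and temporal columns")
        return problems

    e_idx, t_idx = header.index(entity_col), header.index(time_col)
    stamps = defaultdict(list)
    for row in reader:
        # Assumption: timestamps are integers and rows are complete.
        stamps[row[e_idx]].append(int(row[t_idx]))

    for entity, values in stamps.items():
        steps = [b - a for a, b in zip(values, values[1:])]
        if any(step <= 0 for step in steps):
            problems.append(f"entity {entity!r}: timestamps do not increase")
        elif steps:
            usual = Counter(steps).most_common(1)[0][0]
            gaps = sum(1 for step in steps if step != usual)
            if gaps:
                problems.append(
                    f"entity {entity!r}: {gaps} step(s) differ from the usual {usual}"
                )
    return problems
```

An empty result means none of these particular checks failed. The sketch only reports gaps; whether a gap is small enough to be acceptable remains a judgment call.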
Metadata can either be provided in a pre-created JSON file or inferred from the CSV data by a DetectMetadata service. When metadata detection is enabled, you can specify a data URI during data upload, and an AnalyticsDatasetMetadata infotable is automatically returned. You can review and modify the infotable before using it.
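To give a rough idea of what inferring metadata from CSV data involves, the sketch below guesses a dataType and opType for each column. It is a naive local illustration only: it is not the DetectMetadata service and makes no claim about how that service works. The function names, the output key names (fieldName, dataType, opType), and the mapping from dataType to opType are all assumptions.

```python
import csv
import io


def _all_parse(values, caster):
    """Return True if every value can be converted with caster."""
    try:
        for value in values:
            caster(value)
        return True
    except ValueError:
        return False


def guess_field(name, values):
    """Guess dataType and opType for one column from its string values."""
    lowered = {value.strip().lower() for value in values}
    if lowered and lowered <= {"true", "false"}:
        data_type, op_type = "BOOLEAN", "BOOLEAN"
    elif values and _all_parse(values, int):
        data_type, op_type = "INTEGER", "CONTINUOUS"
    elif values and _all_parse(values, float):
        data_type, op_type = "DOUBLE", "CONTINUOUS"
    else:
        # Assumption: anything non-numeric is treated as a categorical string.
        data_type, op_type = "STRING", "CATEGORICAL"
    # Key names here are assumptions, not confirmed by this page.
    return {"fieldName": name, "dataType": data_type, "opType": op_type}


def guess_metadata(csv_text):
    """Guess one metadata entry per column of a CSV string with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = list(zip(*reader))  # transpose data rows into columns
    return [guess_field(name, values) for name, values in zip(header, columns)]
```

A guess like this cannot identify Ordinal, Temporal, or Entity ID fields, which is one reason to review and adjust detected metadata before using it.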
If uploading the metadata in a JSON file, the file must be formatted as outlined in the following chart. Links to metadata samples follow the chart.
Optional parameters can be set to null or omitted (both will have the same effect).
fieldName: The exact name of the field as it appears in the dataset.
values: A list of the acceptable values for the field.
For Ordinal opTypes, the values must be presented in the correct order.
Required if the opType is Ordinal.
Optional for the Categorical opType.
Do not use for Boolean and Continuous opTypes.
range: For a Continuous field, defines the minimum and maximum values the field can accept. For informational purposes only. Should be specified as:
"range": {"min": <value>, "max": <value>}
When querying metadata in ThingWorx, the infotable is flattened and the Min and Max values are returned without the range field.
dataType: Describes what type of data the field contains. Options include: STRING, DOUBLE, BOOLEAN, INTEGER.
These options must be entered as all uppercase values. Lowercase values will lead to errors.
Select the most accurate dataType. Selecting the String dataType for numeric data can lead to undesirable results.
Selecting the Integer dataType for a Continuous goal does not mean that the scores output during Training will also be integers. Because the validation process cannot accept integers, the dataType for Continuous goals is converted internally from Integer to Double. In the resulting PMML output, scores are reported as more accurate floating-point numbers.
opType: Describes how the data in the field can be used. Options include: CONTINUOUS, CATEGORICAL, ORDINAL, BOOLEAN, TEMPORAL, ENTITY_ID.
These options must be entered as all uppercase values. Lowercase values will lead to errors.
For information about which opTypes can be used with which dataTypes, see OpType DataType Combinations.
timeSamplingInterval: An integer representing the time between observations in a temporal field.
Required if the opType is Temporal.
Do not use for other opTypes.
isStatic: A flag indicating whether or not the value in a temporal field can change over time. Marking a field as static reduces training time by removing redundant data points for fields that do not change.
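The rules in the chart above can be checked before a metadata file is uploaded. The sketch below is an illustration under stated assumptions rather than an official validator: the function name validate_field and the sample entries are hypothetical, the JSON key names (fieldName, values, range, dataType, opType, timeSamplingInterval) are assumptions to confirm against the metadata samples, and the dataType and opType pairings in the samples are not checked against OpType DataType Combinations.

```python
import json

DATA_TYPES = {"STRING", "DOUBLE", "BOOLEAN", "INTEGER"}
OP_TYPES = {"CONTINUOUS", "CATEGORICAL", "ORDINAL", "BOOLEAN", "TEMPORAL", "ENTITY_ID"}


def validate_field(field):
    """Return a list of rule violations for one metadata entry (a dict)."""
    problems = []
    # Key names below are assumptions; confirm them against the metadata samples.
    data_type = field.get("dataType")
    op_type = field.get("opType")
    values = field.get("values")  # None and omitted are treated alike
    interval = field.get("timeSamplingInterval")

    # Matching is case-sensitive on purpose: lowercase values lead to errors.
    if data_type not in DATA_TYPES:
        problems.append(f"dataType {data_type!r} is not one of {sorted(DATA_TYPES)}")
    if op_type not in OP_TYPES:
        problems.append(f"opType {op_type!r} is not one of {sorted(OP_TYPES)}")

    if op_type == "ORDINAL" and not values:
        problems.append("ORDINAL fields need an ordered list of values")
    if op_type in {"BOOLEAN", "CONTINUOUS"} and values is not None:
        problems.append(f"{op_type} fields must not list values")

    if op_type == "TEMPORAL":
        if not isinstance(interval, int) or isinstance(interval, bool):
            problems.append("TEMPORAL fields need an integer sampling interval")
    elif interval is not None:
        problems.append("only TEMPORAL fields may set a sampling interval")
    return problems


# Hypothetical entries, invented for illustration only.
SAMPLE_FIELDS = [
    {"fieldName": "machine_id", "dataType": "STRING", "opType": "ENTITY_ID"},
    {"fieldName": "timestamp", "dataType": "INTEGER", "opType": "TEMPORAL",
     "timeSamplingInterval": 10},
    {"fieldName": "pressure", "dataType": "DOUBLE", "opType": "CONTINUOUS",
     "range": {"min": 0, "max": 250}},
    {"fieldName": "grade", "dataType": "STRING", "opType": "ORDINAL",
     "values": ["low", "medium", "high"]},
]

if __name__ == "__main__":
    for entry in SAMPLE_FIELDS:
        print(entry["fieldName"], validate_field(entry))
    print(json.dumps(SAMPLE_FIELDS, indent=2))
```

Optional parameters are simply left out of the sample entries, which has the same effect as setting them to null. The range is carried through unchecked because it is informational only.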
To view metadata samples, see the following: