ThingWorx Analytics Data
Data Loading and Storage
ThingWorx Analytics data is not stored in a database. Instead, it is persisted directly to a file system that is optimized for ThingWorx Analytics. When data is uploaded, it is converted to an optimized Parquet format and stored directly in the file system. There are no limitations on the number of data columns the system can handle.
This approach streamlines dataset creation. CSV data and JSON metadata can be uploaded in a single job. The metadata can either be uploaded as a pre-created JSON file or be inferred automatically from the CSV data itself. The dataset is optimized automatically when it is created. When new data is appended to an existing dataset, a new partition is added and reoptimization is optional.
Data access is URI-driven. Data can be accessed from the following locations:
• AnalyticsUploadStorage – a ThingWorx file repository where data can be uploaded and stored (thingworx://AnalyticsUploadStorage/)
• ThingWorx Analytics Server file system – data can be loaded directly from a location that is readable by the Analytics Server and accessed in place, useful for small datasets with rapidly changing data (file:/)
• API request body – data can be supplied as part of an API call, useful for scoring or model evaluation jobs or for appending small amounts of data to an existing dataset (body:/)
• ThingWorx Analytics Server dataset – a dataset created in ThingWorx Analytics (dataset:/<jobID>)
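As a rough illustration of how the URI prefix identifies where data lives, the following Python sketch maps the four prefixes listed above to their locations. The prefixes come from the list; the dictionary, the function name, and the example paths are hypothetical and are not part of any ThingWorx API.

```python
# Illustrative only: maps the URI prefixes listed above to a description of
# the location they refer to. Names here are hypothetical, not a ThingWorx API.
URI_PREFIXES = {
    "thingworx://": "ThingWorx file repository",
    "file:/": "ThingWorx Analytics Server file system",
    "body:/": "API request body",
    "dataset:/": "ThingWorx Analytics Server dataset",
}


def describe_location(uri: str) -> str:
    """Return the location type for a dataset URI, based on its prefix."""
    for prefix, location in URI_PREFIXES.items():
        if uri.startswith(prefix):
            return location
    raise ValueError(f"Unrecognized dataset URI prefix: {uri!r}")


if __name__ == "__main__":
    # The file path below is a made-up example.
    for uri in ("thingworx://AnalyticsUploadStorage/", "file:/data/sample.csv", "body:/"):
        print(uri, "->", describe_location(uri))
```

A lookup table keeps the sketch easy to extend, but it only inspects the prefix; it does not check that the referenced data exists or is readable.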
Data Shapes
In ThingWorx, a Data Shape is a way of defining a set of data so that it can be more easily consumed. ThingWorx supports a number of Data Shape base types, but the infotable is the type ThingWorx Analytics Server uses to handle both data inputs and outputs. Instead of referencing a dataset in the URL of an API call, the dataset information is provided as an input parameter in the request body. Two key Data Shapes for referencing datasets in ThingWorx Analytics are the following:
• AnalyticsDatasetRef – An infotable that references a specific dataset. It includes a URI and the data format (both will vary depending on whether the data is a stored dataset or in-place data provided with a request).
• AnalyticsDatasetMetadata – An infotable that contains the machine learning characteristics that describe a dataset. It includes fieldName, dataType, opType, and more.
For more detailed information about each of these infotables, see Key Analytics Infotables.
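To make the two infotables described above more concrete, the following Python sketch models them as plain records. It is an assumption-laden illustration, not the actual Data Shape definitions: the metadata record carries only the three fields named above (fieldName, dataType, opType), the reference record uses hypothetical key names uri and format, and all values are placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass
class FieldMetadata:
    """One metadata row, limited to the three fields named in the text."""
    fieldName: str
    dataType: str
    opType: str


@dataclass
class DatasetRef:
    """Hypothetical stand-in for a dataset reference: a URI plus a data format."""
    uri: str
    format: str


# Placeholder values for illustration; valid values are defined by the product.
metadata_rows = [
    asdict(FieldMetadata("temperature", "DOUBLE", "CONTINUOUS")),
    asdict(FieldMetadata("machine_state", "STRING", "CATEGORICAL")),
]

# The reference points at the upload repository named earlier on this page.
dataset_ref = asdict(DatasetRef(uri="thingworx://AnalyticsUploadStorage/", format="csv"))

if __name__ == "__main__":
    print(dataset_ref)
    for row in metadata_rows:
        print(row)
```

Each metadata row describes one column of the dataset, while the reference record tells a job where the data is and how it is encoded.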