Asynchronous Prediction Custom Processor
Overview
The Asynchronous Prediction custom processor can be built into a DataFlowML pipeline and used to run batch scoring jobs against ThingWorx Analytics models. The data evaluated by the scoring job must be in CSV format and must be available from an HDFS location, which is specified with the CSV_PATH parameter.
When a DataFlowML pipeline that includes the Asynchronous Prediction processor is launched, the processor runs the scoring job and returns a Job ID to the pipeline. To view the actual prediction results, enter this Job ID as input to the Scoring Result custom processor.
Uploading and Configuring the Processor
To use the Asynchronous Prediction processor, add it to a pipeline in DataFlowML and configure it with parameters as described below.
1. In DataFlowML, select Data Pipeline from the left navigation panel. The Pipeline Definition page opens.
2. In the panel on the right, ensure that the Auto Inspection option is enabled. It is enabled by default.
3. Upload the JAR file containing the custom processors as follows:
Click the Upload option. A file selection dialog box opens.
Select the JAR file that contains the custom processors.
Click OK. The JAR file is uploaded.
Note: This JAR file must be uploaded once for each pipeline that you create.
4. Navigate to the Processors tab and select the Custom processor. Drag it to the pipeline page on the left to add it to the pipeline.
5. When the Custom processor has been added to the pipeline, right-click the processor icon. The Configuration Settings – Custom dialog box opens.
6. On the Configuration tab, enter the following Implementation Class value to identify the Custom processor as the Asynchronous Prediction processor:
com.thingworx.analytics.rockwell.processor.AsyncPredictionProcessor
7. Click Add Configuration. A parameter row with two columns is added.
8. In the left column, enter a key, and in the right column, enter the corresponding value. For a list of the required and optional configuration parameters, see Required Configuration Parameters and Optional Configuration Parameters below.
9. Repeat steps 7 and 8 until all necessary parameters are added.
10. After all parameters for the processor are added, click Next. The Add Notes tab is displayed.
11. Add any notes about the configuration and click Save. The processor configuration is saved and the dialog box closes.
Required Configuration Parameters
Implementation Class: com.thingworx.analytics.rockwell.processor.AsyncPredictionProcessor
TWA_PREDICTION_IP: The IP address of your ThingWorx Analytics Prediction microserver.
TWA_PREDICTION_PORT: The port on which the ThingWorx Analytics Prediction microserver is running. To locate the port number, navigate to your ThingWorx Analytics Server installation directory and open the config/system-environment-variables.properties file, which lists the port number for each microservice.
TWA_USE_PROXY: Whether ThingWorx Analytics is installed behind a reverse proxy. true = installed behind a reverse proxy, false = not installed behind a reverse proxy.
GOAL_FIELD: The name of the goal data field on which scoring runs.
MODEL_ID*: The Model Result ID that is output when a model is trained in ThingWorx Analytics. This parameter cannot be configured until a training job has been run manually in Analytics Builder. To find the ID, navigate to the Models page in Analytics Builder, select the model, and click View to open the Model Results page.
CSV_PATH*: The HDFS location and file name of the CSV file to be scored.
METADATA_PATH: The location and name of a JSON metadata file. The metadata file can be accessed from any location supported by ThingWorx Analytics Server. Example: hdfs://metadata.json
Note: METADATA_PATH is not needed if the optional HAS_HEADER parameter (see Optional Configuration Parameters below) is set to true.
* Parameters marked with an asterisk (MODEL_ID and CSV_PATH) can be provided either as configuration parameters or in a message sent to the processor from another component. If a value is provided both ways, the value from the message takes precedence.
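The precedence rule for the starred parameters and the dependency between METADATA_PATH and HAS_HEADER can be summarized in a short check. The following Python sketch is illustrative only: effective_params and missing_required are hypothetical helpers for checking a parameter set before you enter it, they are not part of the processor, and treating an empty value as missing is an assumption.

```python
from typing import Dict, List

# Keys from the Required Configuration Parameters table. Implementation Class
# is entered in its own field on the Configuration tab, so it is not checked here.
REQUIRED_KEYS = (
    "TWA_PREDICTION_IP",
    "TWA_PREDICTION_PORT",
    "TWA_USE_PROXY",
    "GOAL_FIELD",
    "MODEL_ID",
    "CSV_PATH",
    "METADATA_PATH",
)

# Starred parameters that may also arrive in a message from another component.
MESSAGE_KEYS = ("MODEL_ID", "CSV_PATH")


def effective_params(config: Dict[str, str], message: Dict[str, str]) -> Dict[str, str]:
    """Combine configured values with message values; message values win."""
    merged = dict(config)
    for key in MESSAGE_KEYS:
        if message.get(key):
            merged[key] = message[key]
    return merged


def missing_required(params: Dict[str, str]) -> List[str]:
    """List required keys that have no value, honoring the HAS_HEADER rule."""
    has_header = params.get("HAS_HEADER", "false").strip().lower() == "true"
    missing = []
    for key in REQUIRED_KEYS:
        if key == "METADATA_PATH" and has_header:
            continue  # a header row makes the metadata file unnecessary
        if not params.get(key, "").strip():
            missing.append(key)
    return missing
```

Only MODEL_ID and CSV_PATH are overridden from the message; any other key in the message is ignored, mirroring the asterisk note above.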
Optional Configuration Parameters
CATEGORICAL_LIMIT: Limits the number of values returned when scoring with a categorical goal field. Scoring with a categorical goal produces a prediction for every value category, so a goal with many categories (such as postal code) can significantly increase scoring time. This parameter limits a categorical goal to the top N values returned.
CAUSAL_TECHNIQUE: The technique used to measure the influence of each field on the goal in a scored record. Options include:
Full Range – Searches for the fields that, when changed, show the largest overall variation in the prediction values. Measures the distance across the range of values from the minimum to the maximum.
Distance from Max – Searches for the fields that, when changed, increase the value of the prediction the most. Measures the distance from the current value to the maximum value.
Distance from Min – Searches for the fields that, when changed, decrease the value of the prediction the most. Measures the distance from the current value to the minimum value.
HAS_HEADER: true = the CSV data file includes a header row, false = the CSV data file does not include a header row (default).
Note: When this parameter is set to true, the required METADATA_PATH parameter (see Required Configuration Parameters above) is not needed.
IDENTIFIER_FIELDS: Additional fields to include in the scoring job to help identify which row of data each score applies to. Separate multiple fields with commas.
IMPORTANT_FIELD_COUNT: The number of important fields to return in the results. The most influential fields for each record, up to this number, are returned with the scoring job results, along with a weight of influence for each field.
PREFERRED_CATEGORICAL_VALUES: Limits the number of values returned when scoring with a categorical goal field. Scoring with a categorical goal produces a prediction for every value category, so a goal with many categories (such as postal code) can significantly increase scoring time. This parameter lets you specify a limited set of categorical goal values to score. Separate multiple values with commas.
TAGS: String values that can be used for search and filter purposes. Separate multiple tags with commas.
TWA_PREDICTION_PROXY_PATH: The URL path for the reverse proxy, if one is in use. The default path is /prediction.
TWA_PREDICTION_USE_SSL: true = the Prediction microserver is running on HTTPS, false = it is running on HTTP.
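The three CAUSAL_TECHNIQUE options and IMPORTANT_FIELD_COUNT can be illustrated with a small toy calculation. The following Python sketch is a conceptual model only, not ThingWorx Analytics' internal algorithm: it assumes a model is any function that maps a record to a numeric prediction, that each field is varied over a caller-supplied list of candidate values, and that the raw influence score stands in for the weight of influence. The function names are hypothetical.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

# A "model" here is any function that maps a record to a numeric prediction.
Model = Callable[[Mapping[str, float]], float]


def field_influence(model: Model, record: Mapping[str, float], field: str,
                    candidates: Sequence[float], technique: str) -> float:
    """Score how strongly changing `field` moves the prediction for `record`."""
    current = model(record)
    # Include the current prediction so the distances below are never negative.
    predictions = [current]
    for value in candidates:
        variant = dict(record)
        variant[field] = value  # change only this field
        predictions.append(model(variant))
    highest, lowest = max(predictions), min(predictions)
    if technique == "Full Range":
        return highest - lowest       # minimum-to-maximum spread
    if technique == "Distance from Max":
        return highest - current      # how far the field can raise the prediction
    if technique == "Distance from Min":
        return current - lowest       # how far the field can lower the prediction
    raise ValueError(f"unknown technique: {technique}")


def important_fields(model: Model, record: Mapping[str, float],
                     candidates_by_field: Mapping[str, Sequence[float]],
                     technique: str, count: int) -> List[Tuple[str, float]]:
    """Return up to `count` (field, influence) pairs, most influential first."""
    scores = {
        field: field_influence(model, record, field, values, technique)
        for field, values in candidates_by_field.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    # Toy linear model and record, invented for illustration.
    def toy_model(r: Mapping[str, float]) -> float:
        return 2 * r["a"] + 0.5 * r["b"]

    record = {"a": 1.0, "b": 1.0}
    candidates = {"a": [0.0, 2.0], "b": [0.0, 4.0]}
    print(important_fields(toy_model, record, candidates, "Full Range", 2))
    # [('a', 4.0), ('b', 2.0)]
```

With count set to 1, only the single most influential field would be returned, which is the role IMPORTANT_FIELD_COUNT plays in the scoring results.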
Processor Input
There are no inputs required for the Asynchronous Prediction processor.
Processor Output
The Asynchronous Prediction processor outputs a Job ID and a Status URI in the form shown below. This output is returned to your pipeline for use by other processors. To look up the actual prediction scores, enter the Job ID as input to the Scoring Result custom processor.
{
"jobId": "",
"statusUri": ""
}
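If you handle this output in your own code, extracting the Job ID is a matter of parsing the JSON. The following Python sketch assumes the output arrives as the JSON text shown above; parse_async_output is a hypothetical helper, and how the Job ID is then delivered to the Scoring Result processor depends on your pipeline and is not covered here.

```python
import json
from typing import Tuple


def parse_async_output(payload: str) -> Tuple[str, str]:
    """Return (jobId, statusUri) from the processor's JSON output message."""
    data = json.loads(payload)
    job_id = data.get("jobId", "")
    status_uri = data.get("statusUri", "")
    if not job_id:
        # Without a Job ID there is nothing to pass to the Scoring Result processor.
        raise ValueError("output message does not contain a jobId")
    return job_id, status_uri
```

Raising an error on an empty jobId, rather than returning it, is a design choice that surfaces a failed submission early instead of passing an empty ID downstream.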