Configure HDFS
For DataFlowML and ThingWorx Analytics to communicate, both need access to the same HDFS location. Through the HDFS connection, the custom processors in DataFlowML pipelines can interact with ThingWorx Analytics to send data, launch processes, and receive results. Some configuration is required before ThingWorx Analytics can communicate with HDFS.
Note: The HDFS location cannot be accessed from ThingWorx APIs.
HDFS supports working in either an unauthenticated environment or in a Kerberos-authenticated environment. The configuration required for each varies, so choose the appropriate section below and follow the listed steps.
Note: If you choose to work unauthenticated, anyone who knows your HDFS URI can access your system.
Before Configuration
Before you begin configuring ThingWorx Analytics microservices to support HDFS, ensure that the following requirements have been met:
Apache Hadoop is installed and configured according to the Analytics DataFlowML Installation Guide.
Your Hadoop directory is configured with global read permissions.
Kerberos properties are configured according to the Analytics DataFlowML Installation Guide. This requirement applies only if you work in a Kerberos-authenticated environment.
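The global read permission requirement above can be checked before you start. The following is a minimal Python sketch, not part of the product: it assumes that "global read" means the read bit for "other" users in the POSIX-style permission string, and that the standard `hdfs` command-line client is available on the PATH. The function names `has_global_read` and `hdfs_permissions` are hypothetical helpers introduced for illustration.

```python
import subprocess


def has_global_read(permissions: str) -> bool:
    """Return True if a POSIX-style permission string grants read to 'other'.

    Expects the form shown in file listings, for example 'drwxr-xr-x':
    one type character, then three owner, three group, and three
    other permission characters.
    """
    if len(permissions) < 10:
        raise ValueError(f"unexpected permission string: {permissions!r}")
    # Index 7 is the read flag for 'other' users (assumed meaning of "global read").
    return permissions[7] == "r"


def hdfs_permissions(path: str) -> str:
    """Return the permission string that `hdfs dfs -ls -d` reports for path.

    Assumption: the `hdfs` client is installed and configured for your cluster.
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-d", path],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        fields = line.split()
        # Skip any header line; a listing line starts with 'd' or '-'.
        if fields and fields[0][0] in "d-" and len(fields[0]) >= 10:
            return fields[0]
    raise RuntimeError(f"no listing line found for {path!r}")
```

For example, `has_global_read(hdfs_permissions("/your/hadoop/directory"))` would report whether the directory is readable by all users. The path shown is a placeholder to replace with your own value.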
Note: As you work through the configuration process, keep the following in mind about the examples provided in the sections below:
All file paths and commands are shown in Linux format, except where specified. If you work in a Windows operating system, adjust your file paths and commands accordingly.
The HDFS configuration is flexible, and the examples below are intended for reference only. For clarity, file paths and file names that you need to replace with your own installation-specific values are presented in bold typeface.
Services to Configure
In the configuration procedures, you are asked to modify each ThingWorx Analytics service that requires access to HDFS. The list of services that require access includes the following:
twas-analytics-worker-1
twas-analytics-worker-2
twas-analytics-worker-3
twas-clustering-ms
twas-data-ms
twas-edge-ms
twas-predictive-ms
twas-prescriptive-ms
twas-profiles-ms
twas-results-ms
twas-signals-ms
twas-training-ms
twas-validation-ms
Note: By default, ThingWorx Analytics is installed with three analytics workers. However, your installation might include fewer or more workers.
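Because the number of analytics workers can vary, it can help to generate the list of services to modify rather than maintain it by hand. The sketch below is illustrative only: the microservice names are copied from the list above, while the function name `services_to_configure` is hypothetical, and the assumption that additional workers follow the same sequential `twas-analytics-worker-N` naming should be verified against your own installation.

```python
from typing import List

# Microservice names copied from the list in this section.
MICROSERVICES = [
    "twas-clustering-ms",
    "twas-data-ms",
    "twas-edge-ms",
    "twas-predictive-ms",
    "twas-prescriptive-ms",
    "twas-profiles-ms",
    "twas-results-ms",
    "twas-signals-ms",
    "twas-training-ms",
    "twas-validation-ms",
]


def services_to_configure(worker_count: int = 3) -> List[str]:
    """Return the service names that need HDFS access.

    worker_count defaults to 3, the default number of analytics workers.
    Assumption: workers are numbered sequentially starting at 1.
    """
    if worker_count < 0:
        raise ValueError("worker_count must not be negative")
    workers = [f"twas-analytics-worker-{n}" for n in range(1, worker_count + 1)]
    return workers + MICROSERVICES


if __name__ == "__main__":
    # Print one service name per line for the default installation.
    for name in services_to_configure():
        print(name)
```

Pass the actual worker count for your installation, for example `services_to_configure(5)`, to cover installations with more workers than the default.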