Data Cleansing: About the Tools Available
A time-consuming part of the data loading process is reviewing and cleansing the data and ensuring that it meets the data format requirements specified by the appropriate RBInfo files and DTDs.
* 
Data files must be in XML format and must comply with the DTD supplied with the Windchill installation (available from the standardX20 directory). For information about creating and validating XML data files, see Validating the XML Format.
This section provides the pros and cons of different tool options for cleansing the data.
Text Editors: Pros and Cons
Text editors, such as WordPad, Notepad, and TextPad, are simple to use and readily available. You might choose to use one of these if additional data conversion is not necessary and the size of the data file is small. They provide a quick view of what is about to be loaded.
Pros
Simple to use
Available on most operating systems
Provide a quick review of data
Minimal learning curve
Can be used for simple search and replace
No additional coding or preparation is required to view data in this tool
Works for XML, CSV, and other text files
Cons
Data validation not available
Additional data transformation is not available
Not viable for large amounts of data
Microsoft Excel: Pros and Cons
Customers may have routines to produce data in flat-file format separated by a delimiter, such as a comma. Microsoft Excel is a good tool to use to view this data in neatly organized rows and columns.
Pros
Simple to use
Available on most Windows operating systems
Provides a quick review of data
Can be used for simple find-and-replace operations
Can develop formulas to validate data
Minimal learning curve
Provides the ability to manually move columns, which limits additional data transformation
Provides the ability to filter and sort data. This provides an easy approach to identifying duplicate parts and documents as well as producing a list of items to add to resource bundles.
Data consistency
Cons
Data must be parsed into columns manually (using the “Text To Columns” functionality)
Limited to 65,536 rows of data per worksheet
Not available when working on non-Windows platforms
Requires that data be in row and column format
Custom Code: Pros and Cons
Custom developed code can greatly enhance the load process. Developing XML transforms, Java programs, or both allows the customer to continually deliver data in the same format regardless of the desired load file format. With the introduction of contexts, custom-built code can be used to separate data into groups of load files as is now required.
Custom code is typically used in conjunction with another tool, such as a text editor.
This approach is most likely to be used when the load file format is different from the format of the data that was provided.
Pros
Can be developed with special rules for data validation
Transforms data from neutral (customer-supplied) to load-ready format
Allows data segregation
Consistent and explainable outcomes, unlike manual manipulation involved with the use of a tool such as Excel where there is the potential for error
Transportable and shareable
Runs on multiple operating systems
Cons
Tightly coupled to input and output. A change in the input or output requires a change to the custom code.
No user interface to review the data
Usually requires the use of an additional tool for reviewing data
Multiple skill sets are required
Combination of manual and programmatic efforts, after code is developed
¿Fue esto útil?