Example: NaN Detection and Removal
Use the markNaN function to mark outliers as NaN (Not a Number) in data sets. Use the IsNaN, matchNaN, and filterNaN functions to test for, locate, and remove NaNs.
1. Read a file containing the number of sunspots recorded over the last three centuries, and plot the data.
2. Use the ThreeSigma function to find the indices of the outliers.
The outliers are the sunspot counts recorded during the following years:
3. Use the markNaN function to mark as NaN the outliers in column 1 of the data.
Replacing an outlier with NaN preserves the fact that a measurement was made, and the rows containing NaNs can be filtered out before processing.
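The detect-and-mark idea in steps 2 and 3 can be sketched outside the worksheet. The following Python analogue is an illustration only, not the worksheet's own definitions: it assumes that ThreeSigma flags values lying more than three standard deviations from the mean, the helper names three_sigma_indices and mark_nan are hypothetical, and the readings are made-up values rather than the sunspot data. Indices here are zero-based.

```python
import math
import statistics


def three_sigma_indices(values):
    """Return indices of values more than three standard deviations from the mean.

    Assumption: this is what a three-sigma outlier test does; the worksheet's
    ThreeSigma function may be defined differently.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


def mark_nan(values, indices):
    """Return a copy of values with the given positions replaced by NaN."""
    marked = list(values)
    for i in indices:
        marked[i] = math.nan
    return marked


# Made-up readings (not the sunspot data): twenty typical values and one spike.
data = [10.0] * 10 + [12.0] * 10 + [100.0]

outliers = three_sigma_indices(data)
marked = mark_nan(data, outliers)

print(outliers)            # index of the spike
print(len(marked))         # same length as the original: nothing was removed
```

Marking returns a copy, so the original list stays available for comparison, much as the example keeps both Data and MarkedData.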
4. Use the matchNaN function to find the indices of the elements that have been marked as NaN.
The data in rows 257 and 278 has been replaced by the built-in constant NaN:
5. Use the IsNaN function to check if the year 1957 has been marked as NaN in the Data and in the MarkedData sets.
6. Plot the new data set, and compare it with the old set.
The outliers from the original data set are not highlighted in blue, since the plot skips the NaNs recorded in the MarkedData set.
7. Use the filterNaN function to remove the rows containing NaNs from the MarkedData matrix.
8. Use the rows function to calculate the number of rows in the Data and FilteredData sets.
The number of rows in FilteredData has decreased by two.
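Locating NaNs, filtering rows, and counting rows can be illustrated in the same way. This is a rough Python analogue under stated assumptions: nan_row_indices and filter_nan_rows are hypothetical helpers, not the built-in matchNaN and filterNaN functions, and the matrix holds made-up values with two entries already marked as NaN.

```python
import math


def nan_row_indices(matrix):
    """Return the zero-based indices of rows that contain at least one NaN."""
    return [i for i, row in enumerate(matrix)
            if any(math.isnan(x) for x in row)]


def filter_nan_rows(matrix):
    """Return a new matrix without the rows that contain a NaN."""
    return [row for row in matrix
            if not any(math.isnan(x) for x in row)]


# Made-up (label, value) rows; two values have been marked as NaN.
marked_data = [
    [1.0, 5.0],
    [2.0, math.nan],
    [3.0, 7.0],
    [4.0, math.nan],
    [5.0, 6.0],
]

nan_rows = nan_row_indices(marked_data)
filtered_data = filter_nan_rows(marked_data)

print(nan_rows)                                # rows holding a NaN
print(len(marked_data), len(filtered_data))    # row counts before and after
```

Each row is dropped whole, so the labels in the first column stay aligned with their values after filtering.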
9. Use the mean function to calculate the mean of the MarkedData and the FilteredData sets.
Statistics can be collected for the FilteredData set, but not for the MarkedData set.
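One way to see why filtering has to come before the statistics is that, in IEEE 754 floating-point arithmetic, any sum involving a NaN is itself NaN. The Python sketch below shows this with made-up values. It illustrates the arithmetic only; how the worksheet's mean function reports the problem is not assumed here.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


# Made-up values; one has been marked as NaN.
marked = [4.0, math.nan, 6.0, 8.0]
filtered = [v for v in marked if not math.isnan(v)]

mean_marked = mean(marked)      # the NaN propagates through the sum
mean_filtered = mean(filtered)  # computed from the three remaining values

print(math.isnan(mean_marked))  # True
print(mean_filtered)            # 6.0
```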