Example: NaN Detection and Removal
Use the markNaN function to mark outliers in a data set as NaN (Not a Number). Use the IsNaN, matchNaN, and filterNaN functions to manage the NaNs.
1. Read a file containing the number of sunspots recorded over the last three centuries, and plot the data.
2. Use the ThreeSigma function to find the indices of the outliers.
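As a rough illustration of the idea behind step 2, the following Python sketch flags values that lie more than three standard deviations from the mean. It is not the ThreeSigma implementation: the helper name three_sigma_indices, the use of the population standard deviation, and the sample values are all assumptions made for this example.

```python
import statistics

def three_sigma_indices(values):
    """Return the indices of values more than 3 standard deviations from the mean.

    Assumption: the population standard deviation is used; the built-in
    function may define the threshold differently.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]

# Made-up readings (not the sunspot record): 30 typical values and one extreme value.
counts = [10.0] * 15 + [12.0] * 15 + [200.0]
outlier_indices = three_sigma_indices(counts)
print(outlier_indices)  # [30]
```

The function returns indices rather than values, so the result can be passed on to a marking step.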
The outliers are the number of sunspots recorded during the following years:
3. Use the markNaN function to mark the outliers in column 1 of the data as NaN.
Replacing data with NaNs indicates that a measurement was made, but the rows containing NaNs can be filtered out before processing.
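The marking step can be sketched in Python as follows. The helper mark_nan is hypothetical and only mimics the behavior described above: it copies the data and overwrites one column with NaN at the listed row indices. The sample rows are invented for the example and are not taken from the sunspot file.

```python
import math

def mark_nan(rows, indices, column):
    """Return a copy of rows with `column` set to NaN at each listed row index."""
    marked = [list(row) for row in rows]  # copy so the original data is untouched
    for i in indices:
        marked[i][column] = math.nan
    return marked

# Made-up (year, count) rows; column 0 is the year, column 1 is the count.
data = [[1955, 40.0], [1956, 140.0], [1957, 190.0], [1958, 185.0]]
marked_data = mark_nan(data, [2], 1)

print(marked_data[2])  # [1957, nan]
print(data[2])         # [1957, 190.0]
```

Working on a copy keeps both sets available, which matches the later steps that compare the original data with the marked data.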
4. Use the matchNaN function to find the indices of the elements that have been marked as NaN.
The data in rows 257 and 278 has been replaced by the built-in constant NaN:
5. Use the IsNaN function to check whether the value for the year 1957 has been marked as NaN in the Data and MarkedData sets.
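Steps 4 and 5 rely on a dedicated NaN test because, under IEEE 754 floating-point rules, NaN does not compare equal to anything, including itself. The Python sketch below shows the same pattern. The helper match_nan and the sample values are assumptions for illustration and are not the built-in functions.

```python
import math

def match_nan(values):
    """Return the indices of all NaN entries in values."""
    return [i for i, v in enumerate(values) if math.isnan(v)]

# Made-up counts with one entry already marked as NaN.
marked_counts = [40.0, 140.0, math.nan, 185.0]

print(match_nan(marked_counts))       # [2]
print(math.nan == math.nan)           # False: an equality test cannot find NaNs
print(math.isnan(marked_counts[2]))   # True
print(math.isnan(marked_counts[0]))   # False
```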
6. Plot the new data set, and compare it with the old set.
The outliers from the original data set are not highlighted in blue, because the plot skips the NaNs recorded in the MarkedData set.
7. Use the filterNaN function to remove the rows containing NaNs from the MarkedData matrix.
8. Use the rows function to calculate the number of rows in the Data and FilteredData sets.
The number of rows in FilteredData has decreased by two.
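The filtering and row-count steps can be sketched in Python as follows. The helper filter_nan is hypothetical and only approximates the behavior described above: it drops every row that contains at least one NaN. The sample rows are invented, with two entries marked as NaN to mirror the two outliers in the example.

```python
import math

def filter_nan(rows):
    """Return only the rows that contain no NaN in any column."""
    return [row for row in rows if not any(math.isnan(x) for x in row)]

# Made-up (year, count) rows with two counts marked as NaN.
marked_data = [[1955, 40.0], [1956, math.nan], [1957, math.nan], [1958, 185.0]]
filtered_data = filter_nan(marked_data)

print(len(marked_data), len(filtered_data))  # 4 2
```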
9. Use the mean function to calculate the mean of the MarkedData and the FilteredData sets.
Statistics can be collected for the FilteredData set, but not for the MarkedData set.
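One way to see why the marked set does not yield usable statistics is that any floating-point sum involving NaN is itself NaN. The Python sketch below demonstrates this with a hand-written mean over made-up values. It illustrates the arithmetic only and makes no claim about how the built-in mean function is implemented.

```python
import math

def mean(values):
    """Arithmetic mean; the result is NaN if any input is NaN."""
    return sum(values) / len(values)

# Made-up counts: one entry marked as NaN, then the same data with it filtered out.
marked_counts = [40.0, math.nan, 60.0]
filtered_counts = [v for v in marked_counts if not math.isnan(v)]

print(mean(marked_counts))    # nan
print(mean(filtered_counts))  # 50.0
```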