NegativeArraySizeException Error Message Generated by CORRECT_SPELLING Function

Problem 

When using the CORRECT_SPELLING function, the following exception is generated:

!message:ComputationException: clean_final: =CORRECT_SPELLING(#MySheet.dictionary;#MySheet.words_to_correct) failed with NegativeArraySizeException:

The exception can appear in the Datameer GUI, or in the Job Trace logs, which show more detail. Here is an example of the errors as they appear in the Job Trace:

!message:ComputationException: clean_final: =CORRECT_SPELLING(#MySheet.dictionary;#MySheet.words_to_correct) failed with NegativeArraySizeException:
!Error Repeated:7 times
...
!stack:datameer.dap.common.exception.ComputationException: clean_final: =CORRECT_SPELLING(#MySheet.dictionary;#MySheet.words_to_correct) failed with NegativeArraySizeException:
        at datameer.dap.common.formula.RecordContext.createComputationException(RecordContext.java:128)
        at datameer.dap.common.formula.lazy.RecordEvalSequence.toComputationException(RecordEvalSequence.java:123)
        at datameer.dap.common.formula.lazy.RecordEvalSequence.moveToNext(RecordEvalSequence.java:135)
        at datameer.dap.common.formula.lazy.ExpressionEvaluator2$2.computeNext(ExpressionEvaluator2.java:114)
        at datameer.dap.common.formula.lazy.ExpressionEvaluator2$2.computeNext(ExpressionEvaluator2.java:111)
        at datameer.dap.sdk.sequence.Sequence$Simple.moveToNext(Sequence.java:157)
        at datameer.dap.sdk.sequence.Sequence$13.moveToNext(Sequence.java:602)
        at datameer.dap.common.graphv2.hadoop.MrJobKeyValueMapper.run(MrJobKeyValueMapper.java:76)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.lang.NegativeArraySizeException
        at org.apache.lucene.search.spell.SpellChecker.formGrams(SpellChecker.java:519)
        at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:425)
        at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:366)
        at org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:249)
        at datameer.das.plugin.textmining.preprocess.CorrectSpellingFunction$SpellingCorrector.correctSpelling(CorrectSpellingFunction.java:80)
        at datameer.das.plugin.textmining.preprocess.CorrectSpellingFunction$SpellingCorrector.compute(CorrectSpellingFunction.java:68)
        at datameer.dap.common.formula.lazy.EvalSequence$5$1.computeValue(EvalSequence.java:111)
        at datameer.dap.common.formula.lazy.SingleEvalSequence.currentValue(SingleEvalSequence.java:31)
        at datameer.dap.common.formula.lazy.ArgumentsEvalSequence.currentValue(ArgumentsEvalSequence.java:118)
        at datameer.dap.common.formula.lazy.EvalSequence.currentIsError(EvalSequence.java:47)
        at datameer.dap.common.formula.lazy.RecordEvalSequence.moveToNext(RecordEvalSequence.java:134)
        ... 12 more

Cause

This is a known defect with the CORRECT_SPELLING function. Specifically, the CORRECT_SPELLING function generates this error when the "Tokenized Text" LIST object contains an empty string as a list element.

Datameer tracks this defect internally as DAP-23452.
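
The stack trace above ends in n-gram formation inside Lucene's SpellChecker (formGrams). The following is a minimal, self-contained sketch of how an empty string could produce a negative array size there, assuming the gram array is sized as word length minus gram size plus one. It does not call Lucene or Datameer; the class EmptyTokenSketch, both methods, and the gram size of 2 are hypothetical names and values chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyTokenSketch {

    // Hypothetical stand-in for n-gram formation (not the Lucene source):
    // a word of length len is assumed to yield len - n + 1 grams of size n.
    static String[] formGrams(String text, int n) {
        int len = text.length();
        // For an empty string and n = 2 this evaluates to -1,
        // so the allocation throws NegativeArraySizeException.
        String[] grams = new String[len - n + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = text.substring(i, i + n);
        }
        return grams;
    }

    // Same idea as wrapping the list in REMOVE(<CurrentLIST>;""):
    // keep every element except empty strings.
    static List<String> removeEmpty(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!"".equals(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("helo", "", "wrld");

        // The unfiltered list contains an empty element, which fails.
        try {
            formGrams(tokens.get(1), 2);
        } catch (NegativeArraySizeException e) {
            System.out.println("empty token failed: " + e);
        }

        // After filtering, every remaining token can be processed.
        for (String token : removeEmpty(tokens)) {
            System.out.println(token + " -> " + Arrays.toString(formGrams(token, 2)));
        }
    }
}
```

Under these assumptions, only the empty element triggers the exception, which is consistent with the behavior described in this section.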

Resolution

To work around this issue, ensure that the "Tokenized Text" LIST passed to the CORRECT_SPELLING function does not contain any empty strings. For example, wrap the "Tokenized Text" LIST in the following function to remove any empty string elements:

=REMOVE(<CurrentLIST>;"")

In a future release of Datameer, the CORRECT_SPELLING function will no longer generate this exception for empty strings.

For more information, please contact Datameer Support.