I'm computing a Levenshtein distance (in C#) between two words, weighted by a set of predetermined phonological correspondences. The cost of substituting one character for another is a value on a 0-1 scale. This works quite well for substitutions.
The problem arises with insertions and deletions. Under normal circumstances, adding or deleting a character would give a score of 1, but some characters in this specific language (e.g. the 'h') can be added or deleted without causing any problems, and therefore they should get a lower score (of, say, 0.1).
Is it possible to set the cost of insertions/deletions independently from substitutions (because substituting this 'h' should still carry a high cost), and to set a cost for the insertion/deletion of one specific character? Currently, the algorithm seems to ignore any value I set and always adds a cost of 1 for each insertion/deletion of 'h'.
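To make the goal concrete, here is a minimal sketch of the behaviour I'm after, not my actual code. The names `WeightedLevenshtein`, `IndelCost`, `SubstitutionCost` and the `IndelCosts` table are hypothetical, and `SubstitutionCost` is only a 0/1 placeholder standing in for the phonological correspondence table. The sketch assumes the per-character cost has to be used in three places: the first column, the first row, and the recurrence itself. If any of these still uses a hard-coded 1 (or `i` / `j` for the borders), that could explain the value being overridden.

```csharp
using System;
using System.Collections.Generic;

public static class WeightedLevenshtein
{
    // Hypothetical table of insertion/deletion costs; characters not listed cost 1.0.
    private static readonly Dictionary<char, double> IndelCosts = new Dictionary<char, double>
    {
        { 'h', 0.1 }
    };

    public static double IndelCost(char c)
    {
        return IndelCosts.TryGetValue(c, out double cost) ? cost : 1.0;
    }

    // Placeholder: 0 for identical characters, 1 otherwise.
    // A real version would look up the 0-1 phonological correspondence here.
    public static double SubstitutionCost(char a, char b)
    {
        return a == b ? 0.0 : 1.0;
    }

    public static double Distance(string source, string target)
    {
        var d = new double[source.Length + 1, target.Length + 1];

        // First column: delete every character of the source prefix.
        for (int i = 1; i <= source.Length; i++)
            d[i, 0] = d[i - 1, 0] + IndelCost(source[i - 1]);

        // First row: insert every character of the target prefix.
        for (int j = 1; j <= target.Length; j++)
            d[0, j] = d[0, j - 1] + IndelCost(target[j - 1]);

        for (int i = 1; i <= source.Length; i++)
        {
            for (int j = 1; j <= target.Length; j++)
            {
                double deletion = d[i - 1, j] + IndelCost(source[i - 1]);
                double insertion = d[i, j - 1] + IndelCost(target[j - 1]);
                double substitution = d[i - 1, j - 1]
                    + SubstitutionCost(source[i - 1], target[j - 1]);

                d[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
        }

        return d[source.Length, target.Length];
    }
}
```

With this sketch, `WeightedLevenshtein.Distance("hand", "and")` charges only the cheap deletion of 'h', while `WeightedLevenshtein.Distance("hat", "cat")` still pays the full substitution cost. One side effect to be aware of: a substitution can never cost more than deleting one character and inserting the other, so a cheap 'h' deletion also caps the effective cost of substituting 'h' at 0.1 plus the insertion cost of the replacement character.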
question from:
https://stackoverflow.com/questions/65891368/how-to-preset-a-score-for-a-specific-type-of-insertion-deletion-in-levenshtein