I've been trying to implement my own solution to recover the alignment of two strings with edit distance. For instance, given the two strings "hitter" and "hotel", I'd like to recover the "h-otel" = "hitter" alignment, where "-" indicates an insertion, that underlies the calculation of the distance. My code so far however only gets to the cost matrix part of it, as shown below:
leven <- function(str1, str2){
str1_len <- nchar(str1)
str2_len <- nchar(str2)
d <- matrix(nrow = nchar(str1)+1,
ncol = nchar(str2)+1)
d[,1] <- seq(from = 0, to = nchar(str1), by = 1)
d[1,] <- seq(from = 0, to = nchar(str2), by = 1)
for (j in 1:str2_len){
for (i in 1:str1_len){
if(substr(str1,i,i) == substr(str2,j,j)){
cost <- 0
}
else{
cost <- 1
}
d[i+1,j+1] <- min(d[i,j+1]+1,
d[i+1,j]+1,
d[i,j]+cost)
}
}
d
}
leven("hotel", "hitter")
From what I understand, the key is to somehow retrace one's steps back to the beginning starting from the bottom-right corner of the matrix, but I'm a bit lost as to how in specific this should be done. Any help would be greatly appreciated; thanks in advance!
Edit 1: I've come to know that this might be called the Needleman-Wunsch algorithm, which is used to align DNA sequences. For instance, given the strings "GCATGCU" and "GATTACA", we can expect the output of "GCATG-CU" and "G-ATTACA", again with "-" as insertions, where the two strings are now aligned.
question from:https://stackoverflow.com/questions/66059512/how-to-recover-alignment-from-edit-distance-in-r