Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I want to be able to fuzzy match one column and exact match another column.

Say I df1 looks like this:

enter image description here

And df2 looks like this:

enter image description here

I want to fuzzy match the "Name" but exact match the "Year." So "Ashley" and "Ashlee" would be a match. This is what I have so far:

res <- fuzzy_left_join(
  df,
  df2,
  by=c("Year","Name"),
  list(`==`, function(x,y) stringdist(tolower(x), tolower(y), method="lv") <= 3)
)
res %>% 
  select(Year = Year.x, everything(), - Year.y)

It appears to be over-matching, though. Not sure what's going on.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
847 views
Welcome To Ask or Share your Answers For Others

1 Answer

It seems you are on the right track (hard to tell without your data or you showing us your result!)

The fuzzyjoin will provide all answers with string distance <=3, which may be the "overmatching" you describe.

You can use %>% group_by(Year,Name) %>% slice_min(dist) to get the best answer according to distance.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...