Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am fetching the distinct words in a string column of a DataTable (dt) and then replacing each one with another value, essentially mapping words to other words. Both approaches listed below work, but for 90k records neither is very fast. Is there a way to speed up either approach?

The first approach is as follows:

   'fldNo is the column number in dt
   For Each Word As String In DistinctWordsList
      'double any single quotes so the filter expression stays valid
      Dim myRow() As DataRow = dt.Select(myColumnName & "='" & Word.Replace("'", "''") & "'")
      For Each row As DataRow In myRow
         row(fldNo) = dicNewWords(Word)
      Next
   Next

The second, LINQ-based approach is not very fast either:

   Dim flds As New List(Of String)
   flds.Add(myColumnName)
   For Each Word As String In DistinctWordsList
     'rows where every column in flds is non-null and equals Word
     Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
     Dim foundrecs(rowData.Length - 1) As Integer
     Dim Cnt As Integer = 0
     For Each row As DataRow In rowData
       foundrecs(Cnt) = dt.Rows.IndexOf(row) '0-based row index
       Cnt += 1
     Next
     For i = 0 To Cnt - 1
       dt.Rows(foundrecs(i))(fldNo) = dicNewWords(Word)
     Next
   Next

1 Answer

So you have your dictionary of replacements:

Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"

You can apply them to your table's ReplaceMe column:

Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
  If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep 
Next r

On my machine it takes 340 ms for 1 million replacements. I can cut that down to 260 ms by using the column number rather than the column name:

If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
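
One thing to watch with the loop above: a Dictionary does not accept Nothing as a key, so a DBNull cell would make TryGetValue throw. Below is a minimal sketch of the same single-pass idea with a null check, resolving the column once up front. The names ReplaceSketch and ReplaceWords are made up for illustration, and it assumes the column holds strings and that columnName exists in the table.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data

Module ReplaceSketch
    ' Hypothetical helper (not from the original answer): walks the table once
    ' and swaps any cell whose value is a key in the replacements dictionary.
    ' Returns the number of cells that were changed.
    Function ReplaceWords(dt As DataTable, columnName As String,
                          replacements As Dictionary(Of String, String)) As Integer
        ' Resolve the column a single time instead of looking it up by name per row.
        ' Assumption: columnName exists; dt.Columns(...) returns Nothing otherwise.
        Dim col As DataColumn = dt.Columns(columnName)
        Dim changed As Integer = 0
        Dim rep As String = Nothing
        For Each r As DataRow In dt.Rows
            ' Skip DBNull cells, since Nothing is not a valid dictionary key.
            If r.IsNull(col) Then Continue For
            If replacements.TryGetValue(CStr(r(col)), rep) Then
                r(col) = rep
                changed += 1
            End If
        Next
        Return changed
    End Function
End Module
```

You would call it as ReplaceWords(dt, "ReplaceMe", d). I haven't timed this variant, so treat the numbers above as applying only to the original loop.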

Timing:

    'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
    Dim d As New Dictionary(Of String, String)
    For i = 0 To 9
        d(i.ToString()) = (i + 10).ToString()
    Next

    'put a million rows in a datatable, randomly assign dictionary keys as row values
    Dim dt As New DataTable
    dt.Columns.Add("ReplaceMe")
    Dim r As New Random()
    Dim k = d.Keys.ToArray()
    For i = 1 To 1000000
        dt.Rows.Add(k(r.Next(k.Length)))
    Next

    'what range of values do we have in our dt?
    Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))

    'it's a crappy way to time, but it'll prove the point
    Dim start = DateTime.Now

    Dim rep As String = Nothing
    For Each ro As DataRow In dt.Rows
        If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
    Next

    Dim ennd = DateTime.Now

    'what range of values do we have now
    Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))


    MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
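
If you want a steadier measurement than subtracting two DateTime.Now values, System.Diagnostics.Stopwatch is the usual tool. Here is a small sketch; TimingSketch and TimeMs are hypothetical names, not part of the code above.

```vb
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper: runs the supplied work once and returns
    ' the elapsed wall-clock time in milliseconds.
    Function TimeMs(work As Action) As Double
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.Elapsed.TotalMilliseconds
    End Function
End Module
```

Wrapping the replacement loop in a Sub() ... End Sub lambda and passing it to TimeMs would take the place of the start/ennd pair. A single run still includes JIT warm-up, so repeat it a few times if the exact figure matters.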
