Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I've been trying to write down a list intersection algorithm in python that takes care of repetitions. I'm a newbie to python and programming so forgive me if this sounds inefficient, but I couldn't come up with anything else. Here, L1 and L2 are the two lists in question, and L is the intersection set.

  1. Iterate through L1
  2. Iterate through L2
  3. If element is in L1 and in L2
  4. add it to L
  5. remove it from L1 and L2
  6. iterate through L
  7. add elements back to L1 and L2

I'm 100% sure this is not the algorithm used within Mathematica to evaluate list intersection, but I can't really come up with anything more efficient. I don't want to modify L1 and L2 in the process, hence me adding back the intersection to both lists. Any ideas? I don't want to make use of any built in functions/data types other than lists, so no import sets or anything like that. This is an algorithmic and implementation exercise, not a programming one, as far as I'm concerned.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
203 views
Welcome To Ask or Share your Answers For Others

1 Answer

Here is a faster solution:

def intersect_sorted(a1, a2):
  """Yields the intersection of sorted lists a1 and a2, without deduplication.

  Execution time is O(min(lo + hi, lo * log(hi))), where lo == min(len(a1),
  len(a2)) and hi == max(len(a1), len(a2)). It can be faster depending on
  the data.
  """
  import bisect, math
  s1, s2 = len(a1), len(a2)
  i1 = i2 = 0
  if s1 and s1 + s2 > min(s1, s2) * math.log(max(s1, s2)) * 1.4426950408889634:
    bi = bisect.bisect_left
    while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1
      elif v1 < v2:
        i1 = bi(a1, v2, i1)
      else:
        i2 = bi(a2, v1, i2)
  else:  # The linear solution is faster.
    while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1
      elif v1 < v2:
        i1 += 1
      else:
        i2 += 1

It runs in O(min(n + m, n * log(m))) time where n is the minimum of the lengths and m is the maximum. It iterates over both lists at the same time, skipping as many elements in the beginning as possible.

An analysis is available here: http://ptspts.blogspot.ch/2015/11/how-to-compute-intersection-of-two.html


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...