Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have two matrices, say one with 2 x 3 dimensions and the other one with 3 x 2.

a = [[1, 0, 1],   
     [1, 0, 1]]

b = [[1, 0],   
     [1, 0],
     [1, 0]]

I would like to return a 2x2 matrix c that's a sum of element-wise logical or operation between a and b.

So the result would be

c = [[3,2]
     [3,2]]

Are there any packages out there to do these operations efficiently? With very large matrices with hundreds of thousands of dimensions, looping through the elements/vectors is really slow.

This is relatively easy to return a 2x2 matrix d that is a result of addition of element-wise logical and operation between a and b.

d = np.dot(a,b) would accomplish this. I'm wondering if there are any packages that are counterparts of np.dot with logic or instead.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
126 views
Welcome To Ask or Share your Answers For Others

1 Answer

Is an object oriented approach acceptable for you ?

#!/usr/bin/env python

from __future__ import absolute_import
from __future__ import print_function
import numpy


class PseudoBinary(object):
    def __init__(self,i):
        self.i = i

    def __mul__(self,rhs):
        return PseudoBinary(self.i or rhs.i)

    __rmul__ = __mul__
    __imul__ = __mul__

    def __add__(self,rhs):
        return PseudoBinary(self.i + rhs.i)

    __radd__ = __add__
    __iadd__ = __add__

    def __str__(self):
        return str(self.i)

    __repr__ = __str__



a = [[PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)],
     [PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)]]

b = [[PseudoBinary(1), PseudoBinary(0)],
     [PseudoBinary(1), PseudoBinary(0)],
     [PseudoBinary(1), PseudoBinary(0)]]

c = numpy.dot(a,b)
print(c)

Prints

[[3 2]
 [3 2]]

I have spent some time with measuring, understanding the performance of this approach. Long story short: this numpy.dot with custom objects is several orders of magnitude slower than regular matrix multiplication of integers.

I am not 100% sure of the root cause of this difference. I have asked a specific question about the reasons of slowness.

The performance graph is like the following: enter image description here

In this plot red curve (base) is the base measurement of numpy.dot(..,..) call with integer matrixes. Blue curve (setOr) is the approach suggested in @vortex's answer. The green curve is numpy.dot() performance using matrixes of custom objects. As you can see the numpy.dot with custom objects is very very slow. I got these numbers in a MacBook Air (13-inch, Early 2014), 1.7 GHz Intel Core i7, 8 GB 1600 MHz DDR3

The code that executes the performance measurement and prints the plot is this: (Tested in python 2.7.10)

#!/usr/bin/env python

# A possible answer and performance analysis for a stackoverflow
# question. https://stackoverflow.com/q/45682641/5771861

from __future__ import absolute_import
from __future__ import print_function
import numpy
import time
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as tick
import random
import datetime
import timeit

class PseudoBinary(object):
    def __init__(self,i):
        self.i = i

    def __mul__(self,rhs):
        return PseudoBinary(self.i or rhs.i)

    __rmul__ = __mul__
    __imul__ = __mul__

    def __add__(self,rhs):
    return PseudoBinary(self.i + rhs.i)

    __radd__ = __add__
    __iadd__ = __add__

    def __str__(self):
        return "P"+str(self.i)

    __repr__ = __str__

class TestCase(object):
    def __init__(self,n):
        self.n = n

        # Only use square matrixes
        rows = self.n
        cols = self.n

    self.base = numpy.array([[random.getrandbits(1) for x in range(cols)] 
        for y in range(rows)])
    self.pseudo = numpy.array(
        [[PseudoBinary(v) for v in row] for row in self.base])

    @staticmethod
    def printMatrix(m):
        for row in m:
            for v in row:
                print(v,end=" ")
            print("")

    def print(self):
        print("base")
        TestCase.printMatrix(self.base)
        print("pseudo")
        TestCase.printMatrix(self.pseudo)

class TestRes(object):

   def __init__(self):
      self.res = []

   def append(self,v):
      self.res.append(v)

   def mean(self):
      return sum(self.res)/float(len(self.res))

def runWithTime(f,count,msg):
    start = time.time()
    for i in xrange(count):
       f()
    end = time.time()
    elapsed = end-start
    print(msg,"took",str(datetime.timedelta(seconds=end-start)),"seconds")
    return elapsed

def measureAndPrint(execCount):
   random.seed(1)

   print("Start to initialize test data")
   start = time.time()
   sizes = [1, 4, 8, 16, 32]
   testCases = [TestCase(n) for n in sizes]
   end = time.time()
   print("Test data initialization complete in ",
      str(datetime.timedelta(seconds=end-start)))

   measCount = 4

   baseResults = {}
   pseudoResults = {}
   setOrResults = {}

   for tc in testCases:
       print("Test case for",tc.n)

       def base():
       rv = numpy.dot(tc.base,tc.base)
       return rv

       res = TestRes()
       for i in xrange(measCount):
      t = runWithTime(base,execCount,"base")
      res.append(t)
       baseResults[tc.n] = res

       def pseudo():
      rv = numpy.dot(tc.pseudo,tc.pseudo)
      return rv

       res = TestRes()
       for i in xrange(measCount):
      t = runWithTime(pseudo,execCount,"pseudo")
      res.append(t)
       pseudoResults[tc.n] = res

       ones = numpy.ones(tc.n)
       dotInput = ones-tc.base
       def setOr():
      rv = ones*tc.n-numpy.dot(dotInput,dotInput)
      return rv

       res = TestRes()
       for i in xrange(measCount):
      t = runWithTime(setOr,execCount,"setOr")
      res.append(t)
       setOrResults[tc.n] = res

   return baseResults,pseudoResults,setOrResults

def isClose(a, b, rel_tol=1e-09, abs_tol=0.0):
    # https://stackoverflow.com/a/33024979/5771861
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

def formatSeconds(s):
    # A concise printer for a time duration in millisecond accuracy.
    # For example 3 d 12 h 4 m 5 s 234 mi
    def maybeStr(fmt,x):
        # If x is non-zero return the formatted string with x
        if isClose(x,0):
            return ""
        else:
            return fmt.format(x)

    seconds, fraction = divmod(s, 1)
    days, seconds = divmod(seconds, 86400)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    milli = int(fraction * 1000)

    rv = maybeStr("{} d ",days) 
       + maybeStr("{} h ",hours) 
       + maybeStr("{} m ",minutes) 
       + maybeStr("{} s ",seconds) 
       + maybeStr("{} milliS ",milli) 

    if rv=="":
        return "0"
    else:
        return rv

def plotResults(results,color,label):
   # Get the key and values in the same order.
   res = sorted(results.items())
   xx = [x for (x,y) in res]
   yy = [y.mean() for (x,y) in res]
   plt.semilogy(xx,yy,color,label=label)
   plt.scatter(xx,yy,c=color)

   # Add an annotation to each measurement data point.
   for x,y in res:
      yValue = y.mean()
      plt.annotate(str(formatSeconds(yValue)),(x,yValue))

multiplicationCount = 1000
baseResults,pseudoResults,setOrResults = measureAndPrint(multiplicationCount)

plotResults(baseResults,"r","base")
plotResults(pseudoResults,"g","pseudo")
plotResults(setOrResults,"b","setOr")
plt.legend(loc="upper left")
plt.title("numpy.dot() performance measurements")
plt.ylabel("Mean seconds taken by {} multiplications".format(multiplicationCount))
plt.xlabel("Dimension of square matrix")

def yFmt(val,pos):
   return formatSeconds(val)

axes = plt.gca()
yaxis = axes.get_yaxis()
yaxis.set_major_formatter(tick.FuncFormatter(yFmt))

plt.show()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...