Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am looking to optimize the small code below:

def update_users_genre_lang_score(cursor):
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01'order by playDate desc")
    db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()

    genre_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
    lang_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
    #initialization part
    user_genre_score = {}
    user_lang_score = {}

    for track in tracks_played:
        user_genre_score[track['user_id']]={}
        user_lang_score[track['user_id']]={}
        for genre in genre_list:
            user_genre_score[track['user_id']][genre]=0
        for lang in lang_list:
            user_lang_score[track['user_id']][lang]=0

    #initialization part end
    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']]=int(user_genre_score[track['user_id']][track['genre']]) + 1
        user_lang_score[track['user_id']][track['lang']]=int(user_lang_score[track['user_id']][track['lang']]) + 1

Is there any way I can optimize the initialization step?


1 Answer

You may get some speedup by building one template dict per category and copying it into each user's record. Here is sample code with some comments:

def update_users_genre_lang_score(cursor):
    # you are asking for a lot of stuff but only using a little. Is this
    # stuff consumed in this function?
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01'order by playDate desc")
    # what is the commit for?
    # db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()
    #print tracks_played

    genre_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
    genre_default = {genre:0 for genre in genre_list}
    lang_list=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
    lang_default = {lang:0 for lang in lang_list}

    #initialization part

    user_genre_score = {}
    user_lang_score = {}
    for track in tracks_played:
        user_id = track['user_id']
        user_genre_score[user_id] = genre_default.copy()
        user_lang_score[user_id] = lang_default.copy()

    #initialization part end

    # this seems like an expensive way to initialize to 1 instead of 0...
    # am i missing something?!
    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']] += 1
        user_lang_score[track['user_id']][track['lang']] += 1
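
The .copy() call matters here: each user needs an independent dict, otherwise every user would update the same counts. A minimal sketch of the difference, using made-up user ids (101 and 102 are assumptions for illustration, not values from the question):

```python
# Template with every genre id from the question (1..23) set to 0.
genre_default = {genre: 0 for genre in range(1, 24)}

# With .copy(): each user gets an independent dict.
scores = {}
for user_id in (101, 102):  # hypothetical user ids
    scores[user_id] = genre_default.copy()
scores[101][5] += 1

# Without .copy(): both users point at the same dict object,
# so an update for one user shows up for the other as well.
shared_template = dict(genre_default)
shared = {}
for user_id in (101, 102):
    shared[user_id] = shared_template
shared[101][5] += 1

print(scores[102][5])  # untouched by the update to user 101
print(shared[102][5])  # changed, because the dict is shared
```

A shallow copy is enough in this case because the values are plain integers.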

UPDATE

You could initialize with collections.defaultdict so that items are generated dynamically as you touch them. This saves you from re-initializing a user's dicts every time that user_id appears in the rows.

import collections

def update_users_genre_lang_score(cursor):
    cursor.execute("select user_id,playDuration,lang,genre from sd_archive_track_clicks where playDuration > 15 and user_id!=0 and genre!=0 and lang!=0 and lang <21 and genre <24 and playDate > '2016-10-01'order by playDate desc")
    # what is the commit for?
    # db.commit()
    numrows = int(cursor.rowcount)
    tracks_played= cursor.fetchall()
    #print tracks_played

    #initialization part

    # this creates a two level nested dict ending in an integer count 
    # that generates items dynamically
    user_genre_score = collections.defaultdict(lambda: collections.defaultdict(int))
    user_lang_score = collections.defaultdict(lambda: collections.defaultdict(int))

    #initialization part end

    for track in tracks_played:
        user_genre_score[track['user_id']][track['genre']] += 1
        user_lang_score[track['user_id']][track['lang']] += 1
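
One difference from the original code is worth noting: with defaultdict, a genre or language a user never played has no key at all, whereas the original pre-filled every key with 0. If later code expects every key to be present, the result can be expanded afterwards. A rough sketch, where fill_missing and the user ids are hypothetical names introduced only for this illustration:

```python
import collections

GENRES = range(1, 24)  # genre ids 1..23, as in the question


def fill_missing(user_scores, keys):
    """Return plain dicts that contain every key, defaulting to 0."""
    return {
        user_id: {key: counts.get(key, 0) for key in keys}
        for user_id, counts in user_scores.items()
    }


user_genre_score = collections.defaultdict(lambda: collections.defaultdict(int))
user_genre_score[101][5] += 1  # hypothetical user 101 played genre 5 twice
user_genre_score[101][5] += 1
user_genre_score[102][7] += 1  # hypothetical user 102 played genre 7 once

full = fill_missing(user_genre_score, GENRES)
print(full[101][5], full[101][1], len(full[102]))
```

counts.get(key, 0) is used instead of counts[key] so that the lookup does not add keys to the inner defaultdict.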

How It Works

defaultdict can make your brain explode - fair warning. With dict, accessing a non-existent key raises KeyError. With defaultdict, accessing a missing key calls the factory function you supplied and stores its return value under that key. When you call int() you get a 0.

>>> int() 
0

So if we make int the factory, you get 0 the first time you access a new key:

>>> d1 = collections.defaultdict(int)
>>> d1
defaultdict(<class 'int'>, {})
>>> d1['user1']
0
>>> d1
defaultdict(<class 'int'>, {'user1': 0})

If you increment a new key, Python first gets the item, which triggers the initialization, and then stores the incremented value:

>>> d1['user2'] += 1
>>> d1
defaultdict(<class 'int'>, {'user1': 0, 'user2': 1})

But you need two levels of dicts, so have the outer one create an inner defaultdict(int) for each new key:

>>> d2 = collections.defaultdict(lambda:collections.defaultdict(int))
>>> d2['user1']
defaultdict(<class 'int'>, {})
>>> d2['user1']['genre1']
0
>>> d2
defaultdict(<function <lambda> at 0x7efedf493bf8>, {'user1': defaultdict(<class 'int'>, {'genre1': 0})})
>>> d2['user1']['genre2'] += 1
>>> d2
defaultdict(<function <lambda> at 0x7efedf493bf8>, {'user1': defaultdict(<class 'int'>, {'genre1': 0, 'genre2': 1})})
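
As the sessions above show, simply reading a missing key inserts it. If you only want to check whether a key exists without creating it, membership tests and .get() leave the defaultdict unchanged. A small sketch (the key names are arbitrary):

```python
import collections

d = collections.defaultdict(int)
d['user1'] += 1

# Membership tests and .get() do not call the factory,
# so they never add keys.
has_user2 = 'user2' in d
maybe_user2 = d.get('user2')
keys_after_safe_reads = sorted(d)

# Plain indexing does call the factory and stores the result.
value = d['user3']
keys_after_index = sorted(d)

print(has_user2, maybe_user2, keys_after_safe_reads)
print(value, keys_after_index)
```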
