Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am trying to write a simple script that will take csv as an input and write it in a single spreadsheet document. Now I have it working however the script is slow. It takes around 10 minutes to write cca 350 lines in two worksheets.

Here is the script I have:

#!/usr/bin/python
import json, sys
import gspread
from oauth2client.client import SignedJwtAssertionCredentials

json_key = json.load(open('client_secrets.json'))
scope = ['https://spreadsheets.google.com/feeds']

# change to True to see Debug messages
DEBUG = False

def updateSheet(csv,sheet):
    linelen = 0
    counter1 = 1 # starting column in spreadsheet: A
    counter2 = 1 # starting row in spreadsheet: 1
    counter3 = 0 # helper for iterating through line entries
    credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], scope)

    gc = gspread.authorize(credentials)

    wks = gc.open("Test Spreadsheet")
    worksheet = wks.get_worksheet(sheet)
    if worksheet is None:
        if sheet == 0:
            worksheet = wks.add_worksheet("First Sheet",1,8)
        elif sheet == 1:
            worksheet = wks.add_worksheet("Second Sheet",1,8)
        else:
            print "Error: spreadsheet does not exist"
            sys.exit(1)

    worksheet.resize(1,8)

    for i in csv:
        line = i.split(",")
        linelen = len(line)-1
        if (counter3 > linelen):
            counter3 = 0
        if (counter1 > linelen):
            counter1 = 1

        if (DEBUG):
            print "entry length (starting from 0): ", linelen
            print "line: ",  line
            print "counter1: ", counter1
            print "counter3: ", counter3
        while (counter3<=linelen):
            if (DEBUG):
                print "writing line: ", line[counter3]
            worksheet.update_cell(counter2, counter1, line[counter3].rstrip('
'))
            counter3 += 1
            counter1 += 1

        counter2 += 1
        worksheet.resize(counter2,8)

I am sysadmin so I apologize in advance for shitty code.

Anyway, the script will take line by line from csv, split by comma and write cell by cell, hence it takes time to write it. The idea is to have cron execute this once a day and it will remove older entries and write new ones -- that's why I use resize().

Now, I am wondering if there is a better way to take whole csv line and write it in the sheet with each value in it's own cell, avoiding writing cell by cell like I have now? This would significantly reduce time it takes to execute it.

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
614 views
Welcome To Ask or Share your Answers For Others

1 Answer

Yes, this can be done. I upload in chunks of 100 lines by 12 rows and it handles it fine - I'm not sure how well this scales though for something like a whole csv in one go. Also be aware that the default length of a sheet is 1000 rows and you will get an error if you try to reference a row outside of this range (so use add_rows beforehand to ensure there is space). Simplified example:

data_to_upload = [[1, 2], [3, 4]]

column_names = ['','A','B','C','D','E','F','G','H', 'I','J','K','L','M','N',
                'O','P','Q','R','S','T','U','V','W','X','Y','Z', 'AA']

# To make it dynamic, assuming that all rows contain same number of elements
cell_range = 'A1:' + str(column_names[len(data_to_upload[0])]) + str(len(data_to_upload)) 

cells = worksheet.range(cell_range)

# Flatten the nested list. 'Cells' will not by default accept xy indexing.
flattened_data = flatten(data_to_upload)

# Go based on the length of flattened_data, not cells. 
# This is because if you chunk large data into blocks, all excess cells will take an empty value
# Doing the other way around will get an index out of range
for x in range(len(flattened_data)):
    cells[x].value = flattened_data[x].decode('utf-8')

worksheet.update_cells(cells)

If your rows are of different lengths then clearly you would need to insert the appropriate number of empty strings into cells to ensure that the two lists don't get out of sync. I use decode for convenience because I kept crashing with special characters so seems best to just have it in.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...