Look at the following:
/home/kinka/workspace/py/tutorial/tutorial/pipelines.py:33: Warning: Incorrect string
value: '\xF0\x9F\x91\x8A\xF0\x9F...' for column 't_content' at row 1
n = self.cursor.execute(self.sql, (item['topic'], item['url'], item['content']))
The string '\xF0\x9F\x91\x8A' is actually the UTF-8 encoding of a 4-byte Unicode character (an emoji like u'\U0001f62a'). The MySQL character set is utf-8, but when a 4-byte Unicode character is inserted, MySQL truncates the inserted string.
I googled the problem and found that MySQL versions below 5.5.3 don't support 4-byte Unicode, and unfortunately mine is 5.5.224.
I don't want to upgrade the MySQL server, so I'd like to filter out the 4-byte Unicode characters in Python. I tried using a regular expression but failed.
So, any help?
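
One possible approach, given only as a sketch: characters that need 4 bytes in UTF-8 are exactly the code points above U+FFFF, so a regular expression over that range can strip them before the insert. The names `_NON_BMP` and `strip_4byte_chars` are made up for illustration, and the sketch assumes the value passed in is already a unicode string rather than a raw byte string. The fallback branch covers narrow Python 2 builds, where such characters are stored as surrogate pairs and the wide range fails to compile.

```python
# -*- coding: utf-8 -*-
import re

try:
    # Wide Python 2 builds and Python 3.3+: one code point per character.
    _NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')
except re.error:
    # Narrow Python 2 builds: characters above U+FFFF are surrogate pairs.
    _NON_BMP = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def strip_4byte_chars(text, replacement=u''):
    """Return text with every character above U+FFFF replaced.

    Hypothetical helper: `text` is assumed to be a unicode string.
    """
    return _NON_BMP.sub(replacement, text)


# Characters that fit in 3 UTF-8 bytes or fewer are left untouched.
cleaned = strip_4byte_chars(u'hi \U0001f62a!')  # -> u'hi !'
```

In the pipeline from the question, this would presumably be applied to each field before the insert, for example `strip_4byte_chars(item['content'])` in the arguments to `self.cursor.execute`. Passing something like `replacement=u'\ufffd'` instead of the empty string keeps a visible marker where a character was removed.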