'utf-8' codec can't decode byte: invalid start byte in Python 3.6?

Question

asked Feb 17, 2021 in Technique by 深蓝 (71.8m points)

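In Python 3.6, parsing the content of a web page fails. I've tried many encodings, and the page reports its encoding as utf-8.
Error message: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Also, the first print shows the content in the terminal as b'\x1f\x8b\x08\x00\x...' rather than as text. I've run into something similar before: when one print call prints two variables, the output shows b'\x..., but when I print them on two separate lines they turn back into Chinese characters. What causes this?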
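
A bytes object and a str print differently: print() on bytes shows its literal form, with \x.. escapes for every non-ASCII byte, while a decoded str shows the characters themselves. A minimal illustration, using the UTF-8 bytes of 上海 (the city that is percent-encoded in the URL in the code below) as hypothetical sample data; it does not reproduce the exact two-variable case described above, which depends on code that isn't shown.

```python
# Hypothetical sample data: the UTF-8 encoding of "上海",
# not the actual server response.
raw = "上海".encode("utf-8")

print(type(raw))   # <class 'bytes'>
print(raw)         # b'\xe4\xb8\x8a\xe6\xb5\xb7'  (bytes literal, escaped)

text = raw.decode("utf-8")
print(type(text))  # <class 'str'>
print(text)        # 上海  (decoded text)

# print() with several arguments calls str() on each one, so a bytes value
# still appears in its escaped b'...' form next to already-decoded text.
print(raw, text)   # b'\xe4\xb8\x8a\xe6\xb5\xb7' 上海
```

The same bytes, written as %XX escapes, are exactly the city=%E4%B8%8A%E6%B5%B7 parameter in the URL.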

# -*- coding: utf-8 -*-
import urllib.request

url = 'http://wthrcdn.etouch.cn/weather_mini?city=%E4%B8%8A%E6%B5%B7'
resp = urllib.request.urlopen(url)
content = resp.read()  # raw response body, as bytes

print(content)
print(type(content))
print(content.decode('utf-8'))  # fails here with the UnicodeDecodeError quoted above

1 Answer

深蓝 · Answer 1 · 2021-02-16T17:44:58+0000

I checked the response: the site returns gzip-compressed data, so it has to be decompressed before it can be decoded as UTF-8.
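
This also explains the exact error message. A minimal offline reproduction, using gzip.compress() on a hypothetical UTF-8 string as a stand-in for the server's response: every gzip stream starts with the bytes 0x1f 0x8b. 0x1f is a valid one-byte UTF-8 character, but 0x8b is a continuation byte that can never begin a character, so decoding fails at position 1.

```python
import gzip

# Hypothetical stand-in for the server's response body: UTF-8 text,
# gzip-compressed the way the server sends it.
body = "上海".encode("utf-8")
compressed = gzip.compress(body)

# Every gzip stream begins with the magic bytes 0x1f 0x8b (RFC 1952).
print(compressed[:2])  # b'\x1f\x8b'

message = ""
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x1f at position 0 is a valid one-byte character; 0x8b at position 1
    # is a continuation byte, which can never start a UTF-8 character.
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

# Decompressing first and then decoding recovers the original text.
print(gzip.decompress(compressed).decode("utf-8"))  # 上海
```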

# coding=utf-8
from io import BytesIO
import gzip
import urllib.request

url = 'http://wthrcdn.etouch.cn/weather_mini?city=%E4%B8%8A%E6%B5%B7'
resp = urllib.request.urlopen(url)
content = resp.read()  # content is gzip-compressed data

buff = BytesIO(content)  # wrap content in a file-like object
f = gzip.GzipFile(fileobj=buff)  # read it through a gzip decompressor
res = f.read().decode('utf-8')  # decompressed bytes -> str
print(res)
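
The BytesIO + GzipFile approach works; the standard library also offers gzip.decompress() (Python 3.2+), which does the same in one call. Since a server is not guaranteed to compress every response, a slightly more defensive sketch checks the Content-Encoding response header, falling back to the gzip magic bytes, before decompressing. The helper name read_text is hypothetical, the sketch assumes the body is UTF-8 as the page declares, and the demo runs offline on sample bytes instead of contacting the server.

```python
import gzip
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_text(raw: bytes, content_encoding: Optional[str] = None,
              charset: str = "utf-8") -> str:
    """Hypothetical helper: decompress a gzip body if needed, then decode it."""
    is_gzip = (content_encoding or "").lower() == "gzip" or raw.startswith(GZIP_MAGIC)
    if is_gzip:
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# With a live response you might call it like this (not run here):
#   resp = urllib.request.urlopen(url)
#   text = read_text(resp.read(), resp.headers.get("Content-Encoding"))

# Offline demo with sample bytes:
plain = "上海".encode("utf-8")
print(read_text(plain))                         # 上海 (not compressed)
print(read_text(gzip.compress(plain), "gzip"))  # 上海 (header says gzip)
print(read_text(gzip.compress(plain)))          # 上海 (detected by magic bytes)
```

Checking the header first follows what the server declares; the magic-byte fallback covers servers that compress without saying so.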
