Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm using the Request module (a simplified HTTP request client) to scrape a webpage that contains accented characters such as á, é, ó, ú, ê, etc.

I've already tried setting encoding: 'utf-8', with no success. I'm still getting these ??? replacement characters in the result.

request.get({
    uri: url,
    encoding: 'utf-8'
    // ...

Is there any configuration to fix it?

I don't know whether it is a bug, but I filed an issue for this module. No answers yet. :/

1 Answer

Since the 'binary' encoding is deprecated, it seems like a better idea to request the raw bytes and decode them explicitly with iconv-lite:

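var request = require("request");
var iconv = require("iconv-lite");

// encoding: null tells request to return the body as a raw Buffer
var requestOptions = { encoding: null, method: "GET", uri: "http://something.com" };

request(requestOptions, function (error, response, body) {
    if (error) { return console.error(error); }
    // body is already a Buffer here; decode it using the page's actual charset
    var utf8String = iconv.decode(body, "ISO-8859-1");
    console.log(utf8String);
});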

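The important part is to set encoding: null in the request options. With that setting, request hands you the body as a raw Buffer instead of decoding it as UTF-8, so you can decode it with the charset the page actually uses.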
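
To see why the raw bytes matter, here is a minimal sketch that uses only Node's built-in Buffer, with no network call. The sample word "café" and the variable names are made up for illustration. It assumes the page is served as ISO-8859-1, which Node exposes under the name 'latin1'; for other charsets you would still need something like iconv-lite.

```javascript
// Bytes of "café" as an ISO-8859-1 server would send them:
// the character é is the single byte 0xE9.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding those bytes as UTF-8 fails: a lone 0xE9 is not a valid
// UTF-8 sequence, so it becomes the replacement character U+FFFD.
const wrong = latin1Bytes.toString("utf8");

// Decoding with the charset the bytes were written in recovers the text.
const right = latin1Bytes.toString("latin1");

console.log(wrong); // prints "caf\uFFFD" (shown as caf� in most terminals)
console.log(right); // prints "café"
```

Once the body has been decoded with the wrong charset, the original bytes are gone and the text cannot be repaired afterwards, which is why the decoding has to happen on the Buffer itself.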

