Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I run a Stanford CoreNLP Server with the following command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

I am trying to parse the sentence "Who was Darth Vader’s son?". Note that the apostrophe after "Vader" is not an ASCII character (it is U+2019, RIGHT SINGLE QUOTATION MARK).

The online demo parses the sentence successfully:

screenshot of the online webserver of CoreNLP

The server I run on localhost fails:

screenshot of the localhost webserver of CoreNLP

I also tried to perform the query using Python.

import requests

url = 'http://localhost:9000/'
sentence = 'Who was Darth Vader’s son?'
properties = '{"annotators": "tokenize,ssplit,pos,ner", "outputFormat": "json"}'
# Send the sentence as UTF-8 bytes; the annotator settings go in the query string.
r = requests.post(url, params={'properties': properties}, data=sentence.encode('utf8'))
tree = r.json()

The last command raises an exception:

ValueError: Invalid control character at: line 1 column 1172 (char 1171)

However, I noticed occurrences of the character \x00 in the response text (i.e. r.text). If I remove them, the JSON parsing succeeds:

import json
tree = json.loads(r.text.replace('\x00', ''))
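The failure and the workaround can be reproduced without a server. The sketch below is only an illustration: the `raw` string is a made-up stand-in for the server response, and `strip_nul` is a hypothetical helper name. It shows that Python's `json` module rejects a raw NUL inside a JSON string by default (`json.JSONDecodeError` is a subclass of `ValueError`), and that either stripping the character or passing `strict=False` lets the parse go through.

```python
import json

# Made-up response body with a raw NUL character inside a JSON string.
raw = '{"word": "Vader\x00s"}'

# Default (strict) parsing rejects control characters inside strings.
try:
    json.loads(raw)
    message = None
except json.JSONDecodeError as error:
    message = str(error)

def strip_nul(text):
    """Hypothetical helper: drop NUL characters before parsing."""
    return text.replace("\x00", "")

# Workaround 1: remove the NUL characters, as in the question.
cleaned = json.loads(strip_nul(raw))

# Workaround 2: keep the text as-is and relax the parser instead.
lenient = json.loads(raw, strict=False)
```

Both workarounds only hide the symptom; the NUL characters are still produced by the server.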

Finally, r.encoding is ISO-8859-1, even though I did not start the server with the -strict option. Manually setting it to UTF-8 changes nothing.

If I run the same code with url = 'http://corenlp.run/' instead of url = 'http://localhost:9000/', everything succeeds: r.json() returns a dict, r.encoding is UTF-8, and the text contains no \x00 character.
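To see why the declared encoding matters for this sentence, here is a small sketch using only the standard library. It shows what happens to the apostrophe when its UTF-8 bytes are read as ISO-8859-1. It is a general illustration of the encoding mismatch, not a trace of what the server does internally, and it does not by itself explain where the \x00 characters come from.

```python
import unicodedata

apostrophe = "’"

# The character is outside ASCII: U+2019.
name = unicodedata.name(apostrophe)
code_point = f"U+{ord(apostrophe):04X}"

# In UTF-8 it takes three bytes.
utf8_bytes = apostrophe.encode("utf-8")

# Read as ISO-8859-1, each byte becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")

# The damage is reversible only while the original bytes are intact.
roundtrip = misread.encode("iso-8859-1").decode("utf-8")
```

If the bytes have already been altered on the server side, changing r.encoding on the client cannot restore them, which would be consistent with the observation above.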

What is wrong with the CoreNLP server I run?


1 Answer

This is a known bug with the 3.6.0 release. If you build the server from GitHub, it should work properly with UTF-8 characters. Setting the appropriate Content-Type header in the request will also fix this issue (see https://github.com/stanfordnlp/CoreNLP/issues/125).
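As a rough sketch of the header-based fix, the request below declares its body as UTF-8. The header value `text/plain; charset=utf-8` is an assumption on my part (check the linked issue for what the server actually expects), and `build_corenlp_request` is a hypothetical helper name. The example uses the standard library so that it runs without a server; with `requests`, the same header can be passed through the `headers=` argument of `requests.post`.

```python
import json
import urllib.parse
import urllib.request

def build_corenlp_request(sentence, url="http://localhost:9000/"):
    """Build (but do not send) a POST request whose body is declared as UTF-8."""
    properties = {"annotators": "tokenize,ssplit,pos,ner", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        url + "?" + query,
        data=sentence.encode("utf-8"),
        # Assumed header value; the server may accept other forms.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )

request = build_corenlp_request("Who was Darth Vader’s son?")

# To send it (requires a running CoreNLP server):
# with urllib.request.urlopen(request) as response:
#     tree = json.loads(response.read().decode("utf-8"))
```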

