I have a .txt file with thousands of lines containing metadata about research articles. For every paper it records, among other fields, the publication type (PT), title (TI), DOI (DI), publication year (PY) and abstract (AB). The file holds information on almost 300 papers. The records for the first two articles look like this:
PT J
AU Filieri, Raffaele
Acikgoz, Fulya
Ndou, Valentina
Dwivedi, Yogesh
TI Is TripAdvisor still relevant? The influence of review credibility,
review usefulness, and ease of use on consumers' continuance intention
SO INTERNATIONAL JOURNAL OF CONTEMPORARY HOSPITALITY MANAGEMENT
DI 10.1108/IJCHM-05-2020-0402
EA NOV 2020
PY 2020
AB Purpose - Recent figures show that users are discontinuing their usage
of TripAdvisor, the leading user-generated content (UGC) platform in the
tourism sector. Hence, it is relevant to study the factors that
influence travelers' continued use of TripAdvisor.
Design/methodology/approach - The authors have integrated constructs
from the technology acceptance model, information systems (IS)
continuance model and electronic word of mouth literature. They used
PLS-SEM (smartPLS V.3.2.8) to test the hypotheses using data from 297
users of TripAdvisor recruited through Prolific.
Findings - Findings reveal that perceived ease of use, online consumer
review (OCR) credibility and OCR usefulness have a positive impact on
customer satisfaction, which ultimately leads to continuance intention
of UGC platforms. Customer satisfaction mediates the effect of the
independent variables on continuance intention.
Practical implications - Managers of UGC platforms (i.e. TripAdvisor)
can benefit from the findings of this study. Specifically, they should
improve the ease of use of their platforms by facilitating travelers'
information searches. Moreover, they should use signals to make credible
and helpful content stand out from the crowd of reviews.
Originality/value - This is the first study that adopts the IS
continuance model in the travel and tourism literature to research the
factors influencing consumers' continued use of travel-based UGC
platforms. Moreover, the authors have extended this model by including
new constructs that are particularly relevant to UGC platforms, such as
performance heuristics and OCR credibility.
ZR 0
ZA 0
Z8 0
ZS 0
TC 0
ZB 0
Z9 0
SN 0959-6119
EI 1757-1049
UT WOS:000592516500001
ER
PT J
AU Li, Yelin
Bu, Hui
Li, Jiahong
Wu, Junjie
TI The role of text-extracted investor sentiment in Chinese stock price
prediction with the enhancement of deep learning
SO INTERNATIONAL JOURNAL OF FORECASTING
VL 36
IS 4
BP 1541
EP 1562
DI 10.1016/j.ijforecast.2020.05.001
PD OCT-DEC 2020
PY 2020
AB Whether investor sentiment affects stock prices is an issue of
long-standing interest for economists. We conduct a comprehensive study
of the predictability of investor sentiment, which is measured directly
by extracting expectations from online user-generated content (UGC) on
the stock message board of Eastmoney.com in the Chinese stock market. We
consider the influential factors in prediction, including the selections
of different text classification algorithms, price forecasting models,
time horizons, and information update schemes. Using comparisons of the
long short-term memory (LSTM) model, logistic regression, support vector
machine, and Naive Bayes model, the results show that daily investor
sentiment contains predictive information only for open prices, while
the hourly sentiment has two hours of leading predictability for closing
prices. Investors do update their expectations during trading hours.
Moreover, our results reveal that advanced models, such as LSTM, can
provide more predictive power with investor sentiment only if the inputs
of a model contain predictive information. (C) 2020 International
Institute of Forecasters. Published by Elsevier B.V. All rights
reserved.
CT 14th International Conference on Services Systems and Services
Management (ICSSSM)
CY JUN 16-18, 2017
CL Dongbei Univ Finance & Econ, Sch Management Sci & Engn, Dalian, PEOPLES
R CHINA
HO Dongbei Univ Finance & Econ, Sch Management Sci & Engn
SP Tsinghua Univ; Chinese Univ Hong Kong; IEEE Syst Man & Cybernet Soc
ZA 0
TC 0
ZB 0
ZS 0
Z8 0
ZR 0
Z9 0
SN 0169-2070
EI 1872-8200
UT WOS:000570797300025
ER
Now I want to extract the abstract of each article and store it in a data frame. To extract the abstract I have the following code, which returns only the first match:
library(NLP)  # provides as.String()
f <- readLines("sample.txt")
# extract the first match only
pattern <- "AB\\s*(.*?)\\s*ZR"  # backslashes must be doubled inside an R string
result <- regmatches(as.String(f), regexec(pattern, as.String(f)))
result[[1]][2]
[1] "Purpose - Recent figures show that users are discontinuing their usage
of TripAdvisor, the leading user-generated content (UGC) platform in the
tourism sector. Hence, it is relevant to study the factors that
influence travelers' continued use of TripAdvisor.
Design/methodology/approach - The authors have integrated constructs
from the technology acceptance model, information systems (IS)
continuance model and electronic word of mouth literature. They used
PLS-SEM (smartPLS V.3.2.8) to test the hypotheses using data from 297
users of TripAdvisor recruited through Prolific.
Findings - Findings reveal that perceived ease of use, online consumer
review (OCR) credibility and OCR usefulness have a positive impact on
customer satisfaction, which ultimately leads to continuance intention
of UGC platforms. Customer satisfaction mediates the effect of the
independent variables on continuance intention.
Practical implications - Managers of UGC platforms (i.e. TripAdvisor)
can benefit from the findings of this study. Specifically, they should
improve the ease of use of their platforms by facilitating travelers'
information searches. Moreover, they should use signals to make credible
and helpful content stand out from the crowd of reviews.
Originality/value - This is the first study that adopts the IS
continuance model in the travel and tourism literature to research the
factors influencing consumers' continued use of travel-based UGC
platforms. Moreover, the authors have extended this model by including
new constructs that are particularly relevant to UGC platforms, such as
performance heuristics and OCR credibility."
The problem is that I want to extract all the abstracts, but the field that follows the abstract differs from record to record (ZR in the first record, CT in the second), so a fixed end marker does not work. The rule that holds for every abstract is: take the text starting at AB plus every following line that begins with a space. Can anybody help me with this?
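To make the rule concrete, here is a minimal base R sketch of it. It assumes that every new field starts with its tag in the first column and that continuation lines keep their leading whitespace in the actual file (the sample pasted above does not show it). The function name `extract_abstracts` and the toy input `sample_lines` are hypothetical, made up for illustration only.

```r
# Hypothetical helper: collect the AB field of every record.
# Assumption: a line starting with "AB " opens an abstract, and every
# following line that starts with whitespace continues it.
extract_abstracts <- function(lines) {
  is_ab   <- grepl("^AB ", lines)   # first line of an abstract
  is_cont <- grepl("^\\s", lines)   # indented continuation line
  abstracts <- character(0)
  current <- NULL                   # NULL means "not inside an abstract"
  for (i in seq_along(lines)) {
    if (is_ab[i]) {
      # A new abstract starts; flush any abstract still being collected.
      if (!is.null(current)) abstracts <- c(abstracts, current)
      current <- sub("^AB\\s+", "", lines[i])
    } else if (!is.null(current) && is_cont[i]) {
      # Continuation line: append it, separated by a single space.
      current <- paste(current, trimws(lines[i]))
    } else if (!is.null(current)) {
      # Any other line (next tag, blank line) ends the abstract.
      abstracts <- c(abstracts, current)
      current <- NULL
    }
  }
  if (!is.null(current)) abstracts <- c(abstracts, current)
  data.frame(abstract = abstracts, stringsAsFactors = FALSE)
}

# Toy input with two records; the field after AB differs between them.
sample_lines <- c(
  "PT J",
  "TI First title",
  "AB First abstract, line one",
  "   line two.",
  "ZR 0",
  "ER",
  "PT J",
  "AB Second abstract.",
  "CT Some conference",
  "ER"
)
df <- extract_abstracts(sample_lines)
```

On the real file this would be called as `extract_abstracts(readLines("sample.txt"))`. Working line by line avoids having to guess which tag follows the abstract, which is where the single regular expression breaks down.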