I have a large, weighted and stratified dataset containing among other things co2 emissions per household. The variable total_co2
equals the sum elec_co2 + gas_co2 + oil_co2
. Before moving to more complex relationships I tried to construct a seemingly obvious model.
svy <- svydesign(id=~i_psu,
strata=~i_strata,
weights=~i_hhdenui_xw,
data=df1)
model <- svyglm(total_co2 ~ elec_co2, svy)
summ(model)
MODEL INFO:
Observations: 6826
Dependent Variable: total_co2
Type: Survey-weighted linear regression
MODEL FIT:
R2 = 0.31
Adj. R2 = -1.74
Standard errors: Robust
---------------------------------------------------
Est. S.E. t val. p
----------------- --------- ------- -------- ------
(Intercept) 1962.48 83.68 23.45 0.00
elec_co2 1.27 0.05 23.98 0.00
---------------------------------------------------
Estimated dispersion parameter = 4390445
While the R^2
indicates some explanatory power, the Adj. R^2
is negative, which is usually interpreted as indicating the opposite. How is this possible in such a simple relationship? Where does the negative value come from and how should I interpret it?
Here is a simple plot of the data.
question from:https://stackoverflow.com/questions/65832794/negative-adjusted-r-squared-in-svyglm