I'm practising my importing and cleaning skills and have hit a snag. I've been importing from here. The import works and I have been able to drop the NaNs. The issue is that certain observations are written with a year in parentheses after the value (for example, 13.7 (2016)). Because of how they're written they're read in as strings, and even if they were parsed as numbers they would contain false information.
I want to get rid of the year observations which are in the parentheses but preserve the data observation itself.
Here is my code so far:
import numpy as np
import pandas as pd

# Declare the values to treat as missing
missing_values = ['?', np.nan]
# Read every table on the page into a list of DataFrames
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_firearm-related_death_rate', na_values=missing_values)
# Pick the table of interest and drop the columns I don't need
df = dfs[3]
df = df.drop(columns=['Year', 'Undetermined', 'Sources and notes'])
print(df)
# pd.to_numeric(df['Guns per 100 inhabitants'])
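To make the goal concrete, here is a rough sketch of the kind of cleanup I'm after, run on a small made-up Series rather than the real table. The sample values and the names raw, cleaned and numeric are only illustrative, and I'm assuming the unwanted part is always a four-digit year in parentheses:

```python
import pandas as pd

# Hypothetical sample values imitating the problem cells (not taken from the real table)
raw = pd.Series(['13.7 (2016)', '4.2', '120.5 (2017)', None])

# Strip a parenthesised four-digit year such as " (2016)", keeping the value itself
cleaned = raw.str.replace(r'\s*\(\d{4}\)', '', regex=True)

# Convert what is left to numbers; anything unparseable becomes NaN
numeric = pd.to_numeric(cleaned, errors='coerce')
print(numeric)
```

If something like this is reasonable, I would presumably apply the same two steps to df['Guns per 100 inhabitants'] and the other affected columns.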
Any help appreciated!