I have a data set with medicine names in a single column. I am trying to extract the name, strength, and unit of each medicine from it. The terms MG and ML are the units that qualify the strength. For example, consider the following medicine names:
Medicine name
----------------------
FALCAN 150 MG tab
AUGMENTIN 500MG tab
PRE-13 0.5 ML PFS inj
NS.9%w/v 250 ML, Glass Bottle
I want to extract the following columns from this data set:
Name      | Strength | Unit
----------|----------|-----
FALCAN    | 150      | MG
AUGMENTIN | 500      | MG
PRE-13    | 0.5      | ML
NS.9%w/v  | 250      | ML
I have tried grepl and similar commands but could not find a good solution. I have more than 12,000 records to process. The data does not follow a fixed pattern, and in a few places there is no space between the strength and the unit, as in 300MG.
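To make the goal concrete, here is a minimal sketch of the kind of rule I have in mind. It is written in Python rather than R purely for illustration, and it only covers the four sample rows above. It assumes the strength is the first number (integer or decimal) that is directly followed by MG or ML, with or without a space, and that everything before that number is the name. The names `PATTERN` and `parse_medicine` are hypothetical, and rows that break these assumptions would need extra handling.

```python
import re

# Assumption: the strength is the first number immediately followed by
# MG or ML (optional whitespace in between); everything before it is the name.
PATTERN = re.compile(
    r"^\s*(?P<name>.*?)\s*(?P<strength>\d+(?:\.\d+)?)\s*(?P<unit>MG|ML)\b",
    re.IGNORECASE,
)


def parse_medicine(text):
    """Return (name, strength, unit) for one row, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("name"), match.group("strength"), match.group("unit").upper()


samples = [
    "FALCAN 150 MG tab",
    "AUGMENTIN 500MG tab",
    "PRE-13 0.5 ML PFS inj",
    "NS.9%w/v 250 ML, Glass Bottle",
]

for row in samples:
    print(parse_medicine(row))
# ('FALCAN', '150', 'MG')
# ('AUGMENTIN', '500', 'MG')
# ('PRE-13', '0.5', 'ML')
# ('NS.9%w/v', '250', 'ML')
```

The lazy `.*?` for the name is what keeps digits inside names such as PRE-13 or NS.9%w/v from being mistaken for the strength, because a number only counts as the strength when MG or ML follows it. Rows with no MG or ML return `None`, so they can be listed separately and checked by hand.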