Short version
For those who don't want to read through my "case", this is the essence:
- What is the recommended way of minimizing the chances of new packages breaking existing code, i.e. of making the code you write as robust as possible?
- What is the recommended way of making the best use of the namespace mechanism
  a) when just using contributed packages (say, in some R analysis project)?
  b) when developing your own packages?
- How can conflicts with respect to formal classes (mostly Reference Classes in my case) best be avoided, given that (AFAIU) there isn't even a namespace mechanism comparable to :: for classes?
The way the R universe works
This is something that's been nagging in the back of my mind for about two years now, yet I don't feel as if I have come to a satisfying solution. Plus I feel it's getting worse.
We see an ever increasing number of packages on CRAN, github, R-Forge and the like, which is simply terrific.
In such a decentralized environment, it is natural that the code base that makes up R (let's say that's base R and contributed R, for simplicity) will deviate from an ideal state with respect to robustness: people follow different conventions, there's S3, S4, S4 Reference Classes, etc. Things can't be as "aligned" as they would be if there were a "central clearing instance" that enforced conventions. That's okay.
The problem
Given the above, it can be very hard to use R to write robust code. Not everything you need will be in base R. For certain projects you will end up loading quite a few contributed packages.
IMHO, the biggest issue in that respect is the way the namespace concept is put to use in R: R allows you to simply write the name of a function/method without explicitly stating its namespace (i.e. foo vs. namespace::foo).
So for the sake of simplicity, that's what everyone is doing. But that way, name clashes, broken code and the need to rewrite/refactor your code are just a matter of time (or of the number of different packages loaded).
At best, you will know which existing functions are masked by a newly added package. At worst, you will have no clue until your code breaks.
A couple of examples:
- try loading RMySQL and RSQLite at the same time; they don't get along very well
- also, RMongo will overwrite certain functions of RMySQL
- forecast masks a lot of stuff with respect to ARIMA-related functions
- R.utils even masks the base::parse routine
(I can't recall which functions in particular were causing the problems, but am willing to look it up again if there's interest)
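The masking behaviour behind these examples can be reproduced without any contributed packages. The sketch below attaches two hypothetical environments, pkgA and pkgB (stand-ins for real packages, not actual packages), each defining a function foo. Attaching the second one should print a masking notice; an unqualified call then resolves to whichever was attached last, base R's conflicts() lists names defined more than once on the search path, and get() with an explicit environment plays the role of pkgA::foo.

```r
# Two hypothetical "packages", simulated with attached environments
attach(list(foo = function() "foo from pkgA"), name = "pkgA")
attach(list(foo = function() "foo from pkgB"), name = "pkgB")  # prints a masking notice

# Unqualified call: the search path is scanned from the front,
# so the most recently attached definition wins
unqualified <- foo()

# Explicit lookup in one particular environment (the analogue of pkgA::foo)
explicit_a <- get("foo", envir = as.environment("pkgA"))()

# Names that are defined in more than one place on the search path
clashes <- conflicts(detail = FALSE)

print(unqualified)         # "foo from pkgB"
print(explicit_a)          # "foo from pkgA"
print("foo" %in% clashes)  # TRUE

# Remove the simulated packages from the search path again
detach("pkgB")
detach("pkgA")
```

Running conflicts() right after loading a set of packages is a cheap way to see what has been masked before any code relies on an unqualified name.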
Surprisingly, this doesn't seem to bother a lot of programmers out there. I tried to raise interest a couple of times at r-devel, to no significant avail.
Downsides of using the :: operator
- Using the :: operator might significantly hurt efficiency in certain contexts, as Dominick Samperi pointed out.
- When developing your own package, you can't even use the :: operator throughout your own code, as your code is not a real package yet and thus there's no namespace yet either. So I would have to initially stick to the foo way, build, test, and then go back and change everything to namespace::foo. Not really an option.
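Regarding the efficiency point: in R, :: is itself an ordinary function, so pkg::foo performs a lookup every time it is evaluated, including on every iteration of a loop. A common mitigation, sketched here with stats::median as an arbitrary example, is to resolve the function once into a local variable. No timings are claimed; they depend on machine and R version, so compare them yourself.

```r
x <- lapply(1:1000, function(i) rnorm(10))

# Variant 1: qualified lookup inside the loop body (repeated on each call)
with_colons <- function(xs) {
  vapply(xs, function(v) stats::median(v), numeric(1))
}

# Variant 2: resolve the function once, then call the local binding
with_local_binding <- function(xs) {
  med <- stats::median  # single lookup; med is the same function object
  vapply(xs, function(v) med(v), numeric(1))
}

# Both variants compute exactly the same thing
stopifnot(identical(with_colons(x), with_local_binding(x)))

# Compare the timings on your own machine
system.time(for (k in 1:20) with_colons(x))
system.time(for (k in 1:20) with_local_binding(x))
```

Inside a package, the conventional equivalent is an importFrom(stats, median) directive in the NAMESPACE file, after which median is found in the package's imports without ::.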
Possible solutions to avoid these problems
- Reassign each function from each package to a variable that follows a certain naming convention, e.g. namespace..foo, in order to avoid the inefficiencies associated with namespace::foo (I outlined it once here). Pros: it works. Cons: it's clumsy and you double the memory used.
- Simulate a namespace when developing your package. AFAIU, this is not really possible; at least that's what I was told back then.
- Make it mandatory to use namespace::foo. IMHO, that would be the best thing to do. Sure, we would lose some degree of simplicity, but then again the R universe just isn't simple anymore (at least it's not as simple as in the early 2000s).
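The first two ideas in this list can be approximated in base R. This is only a sketch under stated assumptions, not a real package namespace. Part (1) creates prefixed aliases such as stats..median using a hypothetical helper make_aliases(). Part (2) evaluates development code in a private environment dev_ns (also a hypothetical name) whose parent is the base environment, so functions defined there find each other before anything in the global environment. In practice the code would probably be loaded with sys.source(file, envir = dev_ns); here it is inlined with evalq() to keep the example self-contained. With baseenv() as the parent, code in dev_ns does not see attached packages at all, which is stricter than a real package namespace.

```r
# (1) Prefixed aliases <package>..<function> (hypothetical helper and convention)
make_aliases <- function(pkg, fns, envir = globalenv()) {
  for (fn in fns) {
    assign(paste0(pkg, "..", fn), getExportedValue(pkg, fn), envir = envir)
  }
  invisible(paste0(pkg, "..", fns))
}
make_aliases("stats", c("median", "sd"))
stats..median(c(1, 5, 3))  # 3, same function as stats::median

# (2) A makeshift "namespace" for code under development
dev_ns <- new.env(parent = baseenv())
evalq({
  foo <- function() "foo from dev_ns"
  bar <- function() foo()  # found in dev_ns first, by lexical scoping
}, envir = dev_ns)

# A conflicting global definition does not affect dev_ns$bar()
foo <- function() "foo from the global environment"
dev_ns$bar()  # "foo from dev_ns"
```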
And what about (formal) classes?
Apart from the aspects described above, the :: approach works quite well for functions/methods. But what about class definitions?
Take package timeDate with its class timeDate. Say another package comes along which also has a class timeDate. I don't see how I could explicitly state that I would like a new instance of class timeDate from one specific package of the two.
Something like this will not work:
new(timeDate::timeDate)
new("timeDate::timeDate")
new("timeDate", ns="timeDate")
That can be a huge problem as more and more people switch to an OOP style for their R packages, leading to lots of class definitions. If there is a way to explicitly address the namespace of a class definition, I would very much appreciate a pointer!
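One possible workaround, assuming the package author cooperates, is to avoid string-based class lookup altogether. Both setClass() (since R 3.0.0) and setRefClass() return a generator object; if a package exports it, callers could write something like pkg::Generator$new(...) or pkg::generator(...). Also, new() accepts a class definition object instead of a name, so new(getClass("timeDate", where = asNamespace("timeDate"))) might pin the defining package; this is untested against two genuinely clashing packages. The class names Interval and Account below are hypothetical, and everything lives in the global environment rather than in a package.

```r
library(methods)

# S4: setClass() returns a generator function
Interval <- setClass("Interval", slots = c(start = "numeric", end = "numeric"))
iv <- Interval(start = 1, end = 3)

# Reference class: setRefClass() returns a generator object with $new()
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x
      invisible(.self)
    }
  )
)
acc <- Account$new(balance = 10)
acc$deposit(5)

# new() with a class definition object looked up in an explicit environment
iv2 <- new(getClass("Interval", where = globalenv()), start = 1, end = 3)

print(iv@end - iv@start)  # 2
print(acc$balance)        # 15
```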
Conclusion
Even though this was a bit lengthy, I hope I was able to point out the core problem/question and that I can raise more awareness here.
I think devtools and mvbutils do have some approaches that might be worth spreading, but I'm sure there's more to say.