Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I am planning out some work to introduce Dependency Injection into what is currently a large monolithic library in an attempt to make the library easier to unit-test, easier to understand, and possibly more flexible as a bonus.
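As an aside for readers unfamiliar with the pattern, the unit-testing benefit comes from classes depending on abstractions that the caller supplies, so tests can substitute fakes. A minimal, hypothetical sketch (in Java here rather than C#; the names are illustrative and not from the actual library):

```java
// Hypothetical constructor injection: the dependency is passed in,
// not constructed internally, so a test can supply a recording fake.
interface Logger {
    void log(String message);
}

class ConsoleLogger implements Logger {
    public void log(String message) {
        System.out.println(message);
    }
}

class FileTransfer {
    private final Logger logger;

    // The dependency arrives through the constructor.
    FileTransfer(Logger logger) {
        this.logger = logger;
    }

    boolean send(String path) {
        logger.log("sending " + path);
        return true; // transfer logic elided
    }
}
```

A DI container such as Ninject then automates wiring these constructors together, but the testability gain comes from the constructor-injection shape itself.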

I have decided to use NInject, and I really like Nate's motto of 'do one thing, do it well' (paraphrased), and it seems to go particularly well within the context of DI.

What I have been wondering now, is whether I should split what is currently a single large assembly into multiple smaller assemblies with disjoint feature sets. Some of these smaller assemblies will have inter-dependencies, but far from all of them, because the architecture of the code is pretty loosely coupled already.

Note that these feature sets are not trivial and small unto themselves either... it encompasses things like client/server communications, serialisation, custom collection types, file-IO abstractions, common routine libraries, threading libraries, standard logging, etc.

I see that a previous question, What is better, many small assemblies, or one big assembly?, kind of addresses this issue, but at what seems to be even finer granularity than this, which makes me wonder whether the answers there still apply in this case.

Also, in the various questions that skirt close to this topic a common answer is that having 'too many' assemblies has caused unspecified 'pain' and 'problems'. I would really like to know concretely what the possible down-sides of this approach could be.

I agree that adding 8 assemblies where before only 1 was needed is 'a bit of a pain', but having to include one big monolithic library in every application is not exactly ideal either... plus adding the 8 assemblies is something you do only once, so I have very little sympathy for that argument (even though I would probably complain along with everyone else at first).

Addendum:
So far I have seen no convincing reasons against smaller assemblies, so I think I will proceed for now as if this is a non-issue. If anyone can think of good, solid reasons backed by verifiable facts, I would still be very interested to hear about them. (I'll add a bounty as soon as I can to increase visibility.)

EDIT: Moved the performance analysis and results into a separate answer (see below).



1 Answer

Since the performance analysis has become a little lengthier than expected, I've put it into its own separate answer. I will be accepting Peter's answer as the official one: even though it lacked measurements, it was the most instrumental in motivating me to perform the measurements myself, and it gave me the most inspiration for what might be worth measuring.

Analysis:
The concrete downsides mentioned so far all seem to focus on performance of one kind or another, but actual quantitative data was missing, so I have measured the following:

  • Time to load solution in the IDE
  • Time to compile in the IDE
  • Assembly load time (time it takes the application to load)
  • Lost code optimisations (time it takes an algorithm to run)

This analysis completely ignores the 'quality of the design', which some people have mentioned in their answers, since I do not consider the quality a variable in this trade-off. I am assuming that the developer will first and foremost let their implementation be guided by the desire to get the best possible design. The trade-off here is whether it is worthwhile aggregating functionality into larger assemblies than the design strictly calls for, for the sake of (some measure of) performance.

Application structure:
The application I built is somewhat abstract because I needed a large number of solutions and projects to test with, so I wrote some code to generate them all for me.

The application contains 1000 classes, grouped into 200 sets of 5 classes that inherit from each other. Classes are named Axxx, Bxxx, Cxxx, Dxxx and Exxx. Class A is completely abstract; B-D are partially abstract, each overriding one of the methods of A; and E is concrete. The methods are implemented so that calling one method on an instance of E performs multiple calls up the hierarchy chain. All method bodies are simple enough that they should theoretically all inline.
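The shape of one such group might look like the following sketch (in Java rather than C#; the three-method count and the A000..E000 names are assumptions for illustration, following the Axxx naming scheme):

```java
// One of the 200 generated groups. A is completely abstract,
// B-D each override one of A's methods, E is concrete.
abstract class A000 {
    abstract int m1();
    abstract int m2();
    abstract int m3();
}

abstract class B000 extends A000 {
    int m1() { return 1; }
}

abstract class C000 extends B000 {
    int m2() { return m1() + 1; } // calls back up the chain
}

abstract class D000 extends C000 {
    int m3() { return m2() + 1; }
}

class E000 extends D000 {
    // One call on the concrete class walks the whole hierarchy;
    // every body is trivial enough to inline in principle.
    int invoke() { return m1() + m2() + m3(); }
}
```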

These classes were distributed across assemblies in 8 different configurations along 2 dimensions:

  • Number of assemblies: 10, 20, 50, 100
  • Cutting direction: across the inheritance hierarchy (none of A-E are ever in the same assembly together), and along the inheritance hierarchy
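The two cutting directions can be sketched as assignment functions from (group, hierarchy level) to an assembly index (a hypothetical reconstruction, since the post does not show the generator's actual scheme; it assumes the assembly count is a multiple of 5, which holds for 10, 20, 50 and 100):

```java
// Maps a class (group 0..199, level 0..4 for A..E) to an assembly index.
class Partition {
    // "along" the hierarchy: the whole A..E chain of a group
    // lands in one assembly.
    static int along(int group, int level, int assemblies) {
        return group % assemblies;
    }

    // "across" the hierarchy: each level of a chain lands in a
    // different assembly, so no two of A..E are ever co-located.
    static int across(int group, int level, int assemblies) {
        int perLevel = assemblies / 5; // assemblies reserved per level
        return level * perLevel + group % perLevel;
    }
}
```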

Not all measurements are exact; some were done by stopwatch and have a larger margin of error. The measurements taken are:

  • Opening the solution in VS2008 (stopwatch)
  • Compiling the solution (stopwatch)
  • In IDE: Time between start and first executed line of code (stopwatch)
  • In IDE: Time to instantiate one Exxx for each of the 200 groups (in code)
  • In IDE: Time to execute 100,000 invocations on each Exxx (in code)
  • The last three 'In IDE' measurements, but from the prompt using the 'Release' build
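The two 'in code' measurements can be approximated with a simple elapsed-time harness like the sketch below (Java and a single stand-in class are assumptions here; the original was C# across all 200 groups):

```java
// Stand-in for one concrete Exxx class.
class E000 {
    int invoke() { return 42; }
}

public class Bench {
    public static void main(String[] args) {
        // Time instantiation.
        long t0 = System.nanoTime();
        E000 e = new E000();
        long newNs = System.nanoTime() - t0;

        // Time 100,000 invocations, keeping the result live so the
        // JIT cannot discard the loop as dead code.
        t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += e.invoke();
        }
        long execNs = System.nanoTime() - t0;

        System.out.println("new(): " + newNs + " ns, execute: "
                + execNs + " ns, sum=" + sum);
    }
}
```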

Results:

                               ----- in the IDE ------   ----- from prompt -----
Cut    Asm#   Open   Compile   Start   new()   Execute   Start   new()   Execute
Across   10    ~1s     ~2-3s       -   0.150    17.022       -   0.139    13.909
         20    ~1s       ~6s       -   0.152    17.753       -   0.132    13.997
         50    ~3s       15s   ~0.3s   0.153    17.119    0.2s   0.131    14.481
        100    ~6s       37s   ~0.5s   0.150    18.041    0.3s   0.132    14.478

Along    10    ~1s     ~2-3s       -   0.155    17.967       -   0.067    13.297
         20    ~1s       ~4s       -   0.145    17.318       -   0.065    13.268
         50    ~3s       12s   ~0.2s   0.146    17.888    0.2s   0.067    13.391
        100    ~6s       29s   ~0.5s   0.149    17.990    0.3s   0.067    13.415

Observations:

  • The number of assemblies (but not the cutting direction) seems to have a roughly linear impact on the time it takes to open the solution. This does not strike me as surprising.
  • At about 6 seconds, the time it takes to open the solution does not seem to me an argument to limit the number of assemblies. (I did not measure whether associating source control had a major impact on this time).
  • Compile time increases a little more than linearly in this measurement. I imagine most of this is due to the per-assembly overhead of compilation, and not inter-assembly symbol resolutions. I would expect less trivial assemblies to scale better along this axis. Even so, I personally don't find 30s of compile time an argument against splitting, especially when noting that most of the time only some assemblies will need re-compilation.
  • There appears to be a barely measurable, but noticeable increase in start-up time. The first thing the application does is output a line to the console, the 'Start' time is how long this line took to appear from start of execution (note these are estimates because it was too quick to measure accurately even in worst-case).
  • Interestingly, it appears that outside the IDE assembly loading is (very slightly) more efficient than inside the IDE. This probably has something to do with the effort of attaching the debugger, or some such.
  • Also note that re-start of the application outside the IDE reduced the start-up time a little further still in the worst-case. There may be scenarios where 0.3s for start-up is unacceptable, but I cannot imagine this will matter in many places.
  • Initialisation and execution time inside the IDE are solid regardless of the assembly split-up; this may be because the need to support debugging gives the run-time an equally easy time resolving symbols across assemblies in every configuration.
  • Outside the IDE, this stability continues, with one caveat: the number of assemblies does not matter for execution, but when cutting across the inheritance hierarchy, the execution time is a fraction worse than when cutting along it. Note that the difference appears too small to me to be systemic; it is probably just extra one-off time the run-time needs to figure out how to apply the same optimisations. Frankly, although I could investigate this further, the differences are so small that I am not inclined to worry much.

So, from all this it appears that the burden of more assemblies is predominantly borne by the developer, and then mostly in the form of compilation time. As I already stated, these projects were so simple that each took far less than a second to compile, causing the per-assembly compilation overhead to dominate. I would imagine that sub-second assembly compilation across a large number of assemblies is a strong indication that these assemblies have been split further than is reasonable. Also, when using pre-compiled assemblies, the major developer argument against splitting (compilation time) would disappear as well.

In these measurements I can see very little if any evidence against splitting into smaller assemblies for the sake of run-time performance. The only thing to watch out for (to some extent) is to avoid cutting across inheritance whenever possible; I would imagine that most sane designs would limit this anyway because inheritance would typically only occur within a functional area, which would normally end up within a single assembly.

