4.1.2  The Fundamental Premise 
It is useful to recall the fundamental premise that is always made when adding 
redundancy to a system to make it more reliable. This premise consists of two 
parts: The first is that when redundant copies of a component are added, there 
are no significant common-mode failures that affect the redundant copies. The 
second is that the complexity of the control mechanism needed to resolve the 
operation of the redundant copies is small enough that it does not have a material 
negative impact on system reliability. 
The first part of the premise has important implications for hardware and 
software. For hardware, the primary implication is that physical separation and 
loose coupling of redundant components generally result in a more reliable 
system because there are fewer common-mode faults. For software, the primary 
implication is that identical components exposed to the same inputs will crash 
identically and therefore have no value in improving a system’s reliability. 
Redundant software components help only if the components are implemented 
differently, or if they are exposed to independent inputs, making it unlikely 
that they will crash at the same time. 
The second part of the premise implies that complex control schemes for 
coordinating redundancy are not worthwhile. In fact, unless the state space of the 
control mechanism can be fully characterized and exhaustively tested, it is likely 
that the net effect of the redundancy will be to make the system less reliable. 
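Both parts of the premise can be illustrated with a small calculation. The 
sketch below is not taken from the product documentation; it assumes a simple 
model in which independent unit failures, common-mode faults, and failures of 
the control mechanism are statistically independent events, and the function 
name and probabilities are hypothetical. 

```python
def redundant_pair_failure(p_unit, p_common, p_control):
    """Probability that a redundant pair fails to deliver service.

    Illustrative model only (an assumption, not a measured result):
      p_unit    - probability that one copy fails on its own
      p_common  - probability that a common-mode fault takes out both copies
      p_control - probability that the mechanism coordinating the copies fails
    The three kinds of failure are assumed to be independent.
    """
    # Both copies failing independently at the same time.
    p_both_independent = p_unit * p_unit
    # Service survives only if the control mechanism works, no common-mode
    # fault occurs, and at least one copy is still running.
    p_survive = (1 - p_control) * (1 - p_common) * (1 - p_both_independent)
    return 1 - p_survive


if __name__ == "__main__":
    single = 0.01  # hypothetical failure probability of one unit
    print("single unit:            ", single)
    print("ideal redundant pair:   ", redundant_pair_failure(0.01, 0.0, 0.0))
    print("pair, fragile control:  ", redundant_pair_failure(0.01, 0.0, 0.02))
    print("pair, all common-mode:  ", redundant_pair_failure(0.0, 0.01, 0.0))
```

Under these assumed numbers, the ideal pair fails with probability of about 
0.0001, while a control mechanism that itself fails 2% of the time raises the 
figure to about 0.0201, which is worse than the single unit. When every failure 
is common-mode, as with identical software fed identical inputs, the pair is no 
better than one copy. 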
4.1.3  The Juniper Approach 
The Mxxx were architected, designed, and implemented with a single overriding 
goal in mind: to build no-compromise routers to run the Internet backbone. Every 
choice, whether of technology, hardware components, architectural tradeoffs, 
technology partners, operating system, algorithms, management infrastructure, or 
user interface, was made with the goal of building the best possible machine 
given the state of the art. 
Simplicity, speed, high integration, and modular design form the basis for the 
reliability of a single M20 or M40 within the network. Replication of M20s and 
M40s such that primary and secondary routers do not see the same traffic is the 
basis for network-level reliability. 
4.1.4  Operator Errors 
The structure and user interface of the management software aid significantly in 
the reliable operation of the Juniper Networks routers. The system has specific 
features to minimize disruptions due to operator errors that in the past have been 
known to cause failures, and provides assistance in recovering from failures due 
to unpredictable errors.  
For example, configuration changes are made using an interactive editor that 
allows the state transition due to each change to be deferred until all changes 
have been entered. The system then checks the set of changes for correct 
semantics and either performs the changes or notifies the operator, as 
appropriate. In any event, the set of changes is performed in an all-or-nothing 
manner such that the system is never left in an inconsistent state. Operators may 
also play non-destructive "what-if" games with some of the more complex 
portions of system configuration. For example, a new routing policy can be tried 
out to determine what the operational effect will be before actually activating the 
policy. 
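The deferred, all-or-nothing behavior described above can be sketched as a 
candidate configuration that is edited freely, checked as a whole, and only then 
swapped in. This is a minimal illustration, not the router's actual 
implementation; the class names, the configuration layout, and the single 
semantic rule are all hypothetical. 

```python
import copy


class CommitError(Exception):
    """Raised when the candidate configuration fails its semantic check."""


def check(config):
    """Return a list of problems; an empty list means the config is consistent.

    Hypothetical rule: every static route must use a configured interface.
    """
    return [
        f"route {prefix} uses unknown interface {ifname}"
        for prefix, ifname in config["static_routes"].items()
        if ifname not in config["interfaces"]
    ]


class ConfigSession:
    def __init__(self, active):
        self.active = active                    # configuration in effect
        self.candidate = copy.deepcopy(active)  # private edit buffer

    def what_if(self):
        # Non-destructive trial: report problems without changing anything.
        return check(self.candidate)

    def commit(self):
        # The whole set of changes is checked first; nothing is applied
        # unless every change passes, so the active state stays consistent.
        problems = check(self.candidate)
        if problems:
            raise CommitError("; ".join(problems))
        self.active = copy.deepcopy(self.candidate)
        return self.active


if __name__ == "__main__":
    session = ConfigSession({"interfaces": ["if-a"], "static_routes": {}})
    session.candidate["static_routes"]["10.0.0.0/8"] = "if-b"
    print(session.what_if())   # reports the unknown interface
    session.candidate["interfaces"].append("if-b")
    print(session.commit())    # both changes take effect together
```

Because edits touch only the candidate copy, a rejected commit leaves the active 
configuration exactly as it was. 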
Finally, the system provides mechanisms to authenticate and manage change 
control and to help in problem diagnosis and recovery when things go wrong. 
Each operator may be assigned a different set of privileges that give permission 
to perform some classes of operations but not others. For example, an operator 
tasked with interface installation may be prohibited from modifying routing 
configuration. A sophisticated revision control mechanism enables the 
operations staff to revert as well as audit problematic configuration changes. 
Operational staff can determine exactly who made a particular change, what the 
change was, and when it was activated, thereby allowing preventive measures to 
be taken to avoid recurrence. 
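As a rough illustration of how per-operator privileges and revision control fit 
together, the sketch below records who changed what and when, refuses changes 
outside an operator's privilege set, and treats a revert as a new, audited 
revision. It is an assumption-laden model, not the product's mechanism; the 
class names, section names, and operator names are hypothetical. 

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone


class PermissionDenied(Exception):
    """Raised when an operator attempts a change outside their privileges."""


@dataclass
class Revision:
    number: int
    operator: str     # who made the change
    when: datetime    # when it was activated
    change: str       # what the change was
    config: dict      # full configuration after the change


class ConfigHistory:
    def __init__(self, initial, permissions):
        # permissions maps an operator name to the set of configuration
        # sections that operator may modify (hypothetical privilege model).
        self.permissions = permissions
        self.revisions = []
        self._record("system", "initial", initial)

    def _record(self, operator, change, config):
        self.revisions.append(Revision(len(self.revisions), operator,
                                       datetime.now(timezone.utc), change,
                                       copy.deepcopy(config)))

    @property
    def current(self):
        return self.revisions[-1].config

    def apply(self, operator, section, value):
        if section not in self.permissions.get(operator, set()):
            raise PermissionDenied(f"{operator} may not modify {section}")
        new_config = copy.deepcopy(self.current)
        new_config[section] = value
        self._record(operator, f"set {section}", new_config)

    def rollback(self, operator, number):
        # A revert is stored as a new revision so the audit trail is complete.
        self._record(operator, f"rollback to {number}",
                     self.revisions[number].config)


if __name__ == "__main__":
    history = ConfigHistory(
        {"interfaces": [], "routing": "policy-a"},
        {"alice": {"interfaces"}, "bob": {"interfaces", "routing"}},
    )
    history.apply("alice", "interfaces", ["if-a"])
    history.apply("bob", "routing", "policy-b")
    history.rollback("bob", 1)
    for rev in history.revisions:
        print(rev.number, rev.operator, rev.when.isoformat(), rev.change)
```

Keeping the full configuration in each revision makes reverting a simple copy, 
at the cost of storage; a real system might store differences instead. 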
4.1.5  Software Errors 
Two strategies are used to avoid software errors and limit their damage when 
they do occur. The first is to partition the system into a number of modular 










