A humble suggestion for robust system - Linuxers

6 Dec 2002


Greetings!
I have been an IT professional for the last 18 years, with varied
experience in hardware, applications and networks on various platforms.
I currently oversee the IT operations of a leading construction company
in Mumbai.
After following Linux in general and open source software in
particular, I believe one question must have crossed people's
minds at least once:
Can a single-image deployable distribution ever be made?
In other words, can a single distribution fit all?
While following various developments in OSS, and after a study of
REAL and PRACTICAL computing requirements, I found that there is very
little of an integrated approach to open source computing architecture.
I have tried my best to define one, purely from an Indian context.
I hereby submit it for review by the Linux user group.
I hope you all find it meaningful, practical and easy.
I welcome all questions, comments and criticisms.
Many of the technologies are already available.
I am ready to work with others to make it a reality.
Rajagopal S. Iyer
---------- begin Architecture definition -----------------
yajur aNOTHER jOINT uNDERTAKING rESULT - veda eXTENSIBLE dISTRIBUTED aRCHITECTURE
Proposed Computing Environment
By Rajagopal S. Iyer, Thane, Mumbai. Phone: (9122) 547 3884
(Sr. Co-ordinator, Facilities, ISD, Lok Group)
e-mail :  rajsand@hindunet.com
Copyright (C) 16th February 2002, Rajagopal S. Iyer. All rights reserved by
the author.
Organisations, and the people who run them, don't require computers:
        They require their familiar, private,
         powerful, secure computing environment
         anywhere, anytime.
This architecture keeps the human at the centre of the design. Care has
been taken to ensure that most of the ideas mentioned here can be
implemented with off-the-shelf technology, with real-time performance
that is acceptable to the user.
A vertically integrated machine with this architecture does not yet
exist, but one can be built by modifying various underlying technologies.
vEDA eXTENSIBLE dISTRIBUTED aRCHITECTURE (veda) is proposed for very high
availability, location freedom and absolute privacy.
The proposed architecture
A Unified Machine (AUM) architecture is proposed.
Overall architecture:
The layered architecture is described below.
The selection is based on the ready availability of the necessary hardware
and software. Many of the proposed channels are already functional and
available under the GNU GPL or another open source licence.
Storage Pyramid: (all serving as network data reservoir)
   Registers (0)
   Cache (1)
   RAM (2) - Storage for programs
   RAM (3) - as a Cache for devices below
   HDD (4)
   Optical R/W & R/O disks(5)
   Magnetic tape (7)
CPU/Kernel Pyramid (Only two states: Running and housekeeping)
It is a 128-bit word architecture, of which 120 bits are for data and
8 bits for ECC.
The error checking and correction algorithm is based on a simple binary
tree representation and a small recursive function which terminates when
two immediate neighbours are found.
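A minimal sketch of how a 128-bit word could carry 120 data bits and 8
check bits. The check byte here is a plain XOR of the fifteen data bytes,
used only as a stand-in: it detects some errors but corrects none, and it
is not the binary-tree algorithm mentioned above. The function names and
the placement of the check bits in the low byte are assumptions made for
illustration.

```python
from typing import Tuple

DATA_BITS = 120
ECC_BITS = 8
DATA_MASK = (1 << DATA_BITS) - 1


def check_byte(data: int) -> int:
    """Stand-in check value: XOR of the fifteen data bytes (detection only)."""
    value = 0
    for shift in range(0, DATA_BITS, 8):
        value ^= (data >> shift) & 0xFF
    return value


def pack_word(data: int) -> int:
    """Place 120 data bits above an 8-bit check byte (assumed layout)."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data must fit in 120 bits")
    return (data << ECC_BITS) | check_byte(data)


def unpack_word(word: int) -> Tuple[int, bool]:
    """Return the data bits and whether the stored check byte still matches."""
    data = word >> ECC_BITS
    return data, (word & 0xFF) == check_byte(data)
```

Flipping any single data bit of a packed word makes unpack_word report a
mismatch; a real design would replace check_byte with a code that can
also correct the error.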
Running implies that the CPU continuously switches between four states
preemptively in real time.
Run Level 0: (Kernel) (Ring 0)
IPC Media : registers
    Line frequency synchroniser
    Preemptive Task Switcher
    Memory Manager
    Device Drivers
    Heartbeat Generator / Responder
    Network communicator local/remote
    Real-time calendar (cosmological)
Run Level 1: (Two Tasks only)
IPC Media : Cache Memory
Each task will listen to and process exactly one input channel and give
output to exactly two channels. Those two channels could be any.
Virtual Machine 0 & 1 (Takes one input and gives two outputs - Local & network)
Input Threads: (two only)
         Check & Listen to Local Input Device (core dump errors)
         Check & Listen to Network Input Devices (core dump errors)
Process Threads (heartbeat core dumps to local storage)
         Take data from Input device (core, log dump after checking)
         Prepare transfer of data (Presentation & core, log dump)
         Output Data to output devices (after checking, tee to n/w and local storage)
Output threads
         Check & Talk to Local Output Devices (core dump errors)
         Check & Talk to Network Output Devices (core dump errors)
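The one-input, two-output rule for a Run Level 1 task can be sketched as
follows. This is only an illustration: run_task and its parameters are
hypothetical names, and in-memory text buffers stand in for the local
and network output devices.

```python
import io
from typing import Callable, Iterable, TextIO


def run_task(source: Iterable[str], process: Callable[[str], str],
             local_out: TextIO, network_out: TextIO) -> int:
    """Read one input channel and tee every result to two output channels."""
    count = 0
    for item in source:
        result = process(item)
        local_out.write(result)     # first output: local device
        network_out.write(result)   # second output: network device
        count += 1
    return count


local, network = io.StringIO(), io.StringIO()
handled = run_task(["ping\n", "pong\n"], str.upper, local, network)
```

Both buffers end up with the same content, which is the "tee to n/w and
local storage" behaviour listed under the process threads.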
Run Level 2 (Two Tasks only - Graphics processing)
IPC Media : RAM
Typical tasks (high bandwidth requirement)
     Input       = Images / Sound
     Process     = Image / High bandwidth signals
     Output task = Display Update
Run Level 3 (Two Tasks only - Structured Text / Voice processing)
IPC Media : RAM
Typical Process (Voice processing, Structured Text processing)
     Input       = Sound Device
     Processing  = Uses ITRANS / other algorithms
     Output      = Sound Device
Run Level 4 (User Interface)
Typical Applications: User authentication
                      Configuration Selection
IPC Media   : HDD
     Input       = Composite
     Processing  = Transaction
     Output      = Composite
Run Level 5 (long term processing - User Applications)
IPC Media : Replicated file system
     Input   : Messages from user in composite media
     Process : Composite processing using all lower run level facilities
     Output  : Messages to User in composite media
Primitive Data types :
Numbers:
  Only whole numbers (unsigned 128-bit integer)
  664613997892457936451903530140172288 = 2^119
  1329227995784915872903807060280344576 = 2^120 (Max)
  No floating point representations
  No representation of negative numbers in the bare machine
  Zero represented by all zero bits
  Only closed infinity representation (all 1s) (affine or closure?)
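The two powers of two quoted above can be checked directly, and the
"all 1s means infinity" rule can be sketched as saturating arithmetic.
Treating infinity as a value that absorbs any addition is an assumption
on my part; the outline leaves the exact behaviour open.

```python
DATA_BITS = 120
ZERO = 0                            # all zero bits
INFINITY = (1 << DATA_BITS) - 1     # all 120 data bits set to 1


def add(a: int, b: int) -> int:
    """Unsigned addition that saturates at the all-ones infinity value."""
    if a == INFINITY or b == INFINITY:
        return INFINITY
    return min(a + b, INFINITY)


print(2 ** 119)   # the value listed as 2^119
print(2 ** 120)   # the value listed as 2^120
```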
Text:
   Plain ASCII
Graphic:
For the sake of simplicity in this architecture outline, a ring of the
primary colours Red, Green and Blue, plus White, is assumed as four
points. The ratio is somewhat similar to the television standard colour
ratio for white, twisted a bit to suit our prime requirement of
simplicity and practicality.
Only two colour spaces, RGBW and CMYB, are considered.
Red, Green, Blue & White for display devices
Cyan, Magenta, Yellow, Black for print devices
The reason for adding white to RGB is the white temperature of 6500K.
As the largest primitive definition (square limits are assumed, as
computationally it is easier on my brain):
Raster: Each pixel has 120 bits allocated to it, shared among the
            colours. Bit allocations for the RGBW system are:
                    12 for Red
                    75 for Green
                    13 for Blue
                    20 for White
                     8 bits for ECC.
            (To be used as Data in Device Space)
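The raster allocation above adds up to 120 data bits (12 + 75 + 13 + 20).
A small sketch of packing and unpacking such a pixel follows. The field
order (red in the highest bits, white in the lowest) and the function
names are assumptions; the outline only gives the widths.

```python
FIELDS = (("red", 12), ("green", 75), ("blue", 13), ("white", 20))


def pack_pixel(**channels: int) -> int:
    """Pack the four channel levels into one 120-bit value."""
    value = 0
    for name, width in FIELDS:
        level = channels[name]
        if not 0 <= level < (1 << width):
            raise ValueError(f"{name} must fit in {width} bits")
        value = (value << width) | level
    return value


def unpack_pixel(value: int) -> dict:
    """Split a 120-bit value back into its four channel levels."""
    channels = {}
    for name, width in reversed(FIELDS):
        channels[name] = value & ((1 << width) - 1)
        value >>= width
    return channels
```

The 8 ECC bits are left out here; they would be added when the pixel is
stored in a 128-bit word.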
Vector: 57 bits : 19 per dimension (x,y,z) for the start point
                      (524288 mm in device space)
            57 bits : 19 per dimension for relative co-ordinates (x,y,z)
                      from the start point
                      (all zero here can be treated as a point)
             6 bits : RGB colour for the starting & ending points.
             8 bits for ECC
            Black or non-existence is all zero, which is anyway not needed.
            All 1s will indicate a white line from begin to end point
            (ECC excluded).
            A White or Black component is not needed in pure colour space,
            as the temperature, as perceived, could be essentially noise
            caused by superimposition of the various waves from the
            invisible electromagnetic spectrum.
            For the sake of clarity, the data for co-ordinates may be
            represented by pure 7-bit ASCII at a higher level, as each point
            may require at most 22 bytes: (7x3) + 1 for colour.
            (known light speed 300,000,000,000 mm/s)
            (To be used as Address in the device space)
( Comment on Practical Dimensions:
    524 metres in X, Y and Z dimensions
    > 1/2 km media?  --
    Devices are still not available! )
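With 19 bits per axis, each co-ordinate ranges over 2^19 = 524288
millimetre steps, which is where the roughly 524 metre limit comes from.
The sketch below packs a start point, a relative end point and a 6-bit
colour into 57 + 57 + 6 = 120 bits. The field order and the function
names are assumptions for illustration only.

```python
AXIS_BITS = 19
AXIS_LIMIT = 1 << AXIS_BITS   # 524288 distinct positions per axis, in mm
COLOUR_BITS = 6


def pack_point(x: int, y: int, z: int) -> int:
    """Pack three 19-bit co-ordinates into 57 bits (x highest, z lowest)."""
    for value in (x, y, z):
        if not 0 <= value < AXIS_LIMIT:
            raise ValueError("co-ordinate must fit in 19 bits")
    return (x << (2 * AXIS_BITS)) | (y << AXIS_BITS) | z


def pack_vector(start, delta, colour: int) -> int:
    """Pack start point, relative end point and colour into 120 bits."""
    if not 0 <= colour < (1 << COLOUR_BITS):
        raise ValueError("colour must fit in 6 bits")
    point_bits = 3 * AXIS_BITS
    return ((pack_point(*start) << (point_bits + COLOUR_BITS))
            | (pack_point(*delta) << COLOUR_BITS)
            | colour)
```

A vector with every field at its maximum packs to all ones, matching the
"white line from begin to end point" case described above.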
Graphic Primitive Processing:
Processing is by reading the co-ordinate and doing a binary recursion as
specified above, for either vectorising or rasterising depending on the
device.
Start at the device origin and recurse along the X axis; at the end of the
device limit, switch axis to Y and then to Z; return to the origin.
Graphic computational speed & predictability: As the binary recursion
algorithm has a known termination condition under all circumstances, it
can safely be said that, when rendering the worst case of two adjacent
pixels P0(0,0,0 - R1G1B1) and P1(0,0,1 - R2G2B2), only the end points need
to be taken and the maximum will be 6 * 13 = 78 iterations.
In the proposed architecture it will take 78 clock cycles to render one
graphic primitive.
The worst case will be a one-bit change in each successive pixel.
The efficiency of the algorithms is yet to be proved mathematically; O(n)
is to be obtained mathematically.
{-- Proof? who? me? No, Sorry! I am mathematically challenged :-) }
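One possible reading of the binary recursion, sketched along a single
axis: keep splitting the span between two co-ordinates at its midpoint
and stop when the two ends are immediate neighbours. This is my
interpretation, not a worked-out version of the algorithm above, and
bisect_span is a hypothetical name. It does not attempt to reproduce the
78-iteration figure.

```python
def bisect_span(lo: int, hi: int, depth: int = 0):
    """Yield (point, depth) for every integer strictly between lo and hi."""
    if hi - lo <= 1:
        return                      # ends are immediate neighbours: stop
    mid = (lo + hi) // 2
    yield mid, depth
    yield from bisect_span(lo, mid, depth + 1)
    yield from bisect_span(mid, hi, depth + 1)


# Every interior point of the span 0..8 is visited exactly once.
points = list(bisect_span(0, 8))
```

For a span of 2^k units the recursion is k levels deep, so the
termination condition is always reached in a bounded number of steps.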
Network Layers (OSI based - Total 7)
Physical (7 total):
  (Criteria: Speedwise and vicinity wise decreasing order)
    CPU-Internal-Registers (CPU/memory Bus(0)),
    CPU-RAM / Display path (1),
    CPU-System Bus (Sound card video capture) (2),
    Disk bus : SCSI, fibre channel (3)
    Ethernet (4),
    IR/Wireless (5),
    Serial Port (6) (allows use of local phone lines for PPP)
Network:
  Ring 0 = Register
  Ring 1 = Cache
  Ring 2 = RAM
  Ring 3 = RAM
  Ring 4 = HDD
Transport:
Presentation
  Translation????
  Application
Scheduling algorithm.
The machine has the following states:
  0. Running
  1. Preparatory
  2. Wait
  3. Review
  4. Reorder Priority
  5. Cycle complete
The recommended time slices for these are 20, 75, 12 and 13 clock cycles
respectively (total 120).
0. Running
   In this state the process is actually run.
1. Preparatory
   In this state the necessary resource allocations for running the
   process are prepared:
   1. read the Process Table
   2. allocate necessary resources
   3. signal to run the process
2. Wait State
   This is the stage in which the machine executes the heartbeat function,
   which consists of:
   1. read any error status from the heartbeat messages of other nodes
   2. prepare the status report
   3. write error / log messages in the respective locations
3. Review
   In this state, the messages produced by running the process are taken,
   along with the necessary steps for transmitting them to the next
   process.
4. Reorder states
   In this stage, the states of the two processes are altered
      (If the state of process 0 is 0123 then it changes to 1032,
       and if that of process 1 is 1032 it changes to 0123)
5. Cycle Complete state
   In this state the necessary logging of machine state and heartbeat, and
   cycle number detail stamping, is to be taken care of (Fill details here)
6. State Increment State
   This is the crucial state where the promotion of state is done.
7. Respawn state
   The process with incremented states will launch itself ??? (Fill in properly)
In between these states is the idle state.
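Two details of the scheduling description can be checked with a few
lines: the four time slices add up to 120 clock cycles, and the reorder
step swaps each adjacent pair of state digits. Mapping the four slices
to states 0 to 3 in order is my assumption, and reorder is a
hypothetical helper name.

```python
# Assumed mapping of the four recommended slices to states 0..3, in order.
TIME_SLICES = {"Running": 20, "Preparatory": 75, "Wait": 12, "Review": 13}


def reorder(order: str) -> str:
    """Swap each adjacent pair of state digits: '0123' <-> '1032'."""
    if len(order) % 2:
        raise ValueError("state order must have an even number of digits")
    return "".join(order[i + 1] + order[i] for i in range(0, len(order), 2))


total_cycles = sum(TIME_SLICES.values())
```

Applying reorder twice returns the original order, which matches the way
the two processes exchange 0123 and 1032 in state 4.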
The suggested scheduling sequence (in real time) is:
  0-->1-->1-->0-->0-->1-->2-->2-->1-->0-->0-->1-->2;
  1-->2-->2-->1-->1-->2-->3-->3-->2-->1-->1-->2-->3. (VM0)
   |
   v
  4
   |
   v
  0-->1-->1-->0-->0-->1-->2-->2-->1-->0-->0-->1-->2;
  1-->2-->2-->1-->1-->2-->3-->3-->2-->1-->1-->2-->3. (VM1)
   |
   v
  5
   |
   v
  6
   |
   v
  7
(This is binary recursion with the termination condition that both the
neighbours of the tree are found.)
Directed graph representation of states
(Root)
           (0)
          /   \
        (1)   (2)
                \
                (3)
If the inversion of the 0 & 1 states is not done, the machine will run
uncontrollably with the binary recursion algorithm that is used for task
scheduling.
The head-inversion causes two breaks. The task is deemed completed only
after the machine enters the Idle State.
In fact this data structure and processing can be replicated in the upper
rings, which will result in predictable performance.
Performance factors are to be worked out mathematically.
Implementation, & Availability Issues
  -------------------------------------------------
The two Ring-1 tasks are two identical clustered virtual machines using
MOSIX or any other similar kernel patches.
The clustering of one box with another is through SCSI or the system bus
(so there can be a single-box high-performance cluster).
Fibre channel / 100 Mbps network media is the second option.
Each machine is identical to the others in function. They communicate
using registers as network media.
The proposed machines will have a replicated, journalled file system.
Where two disks are available, RAID technology should be used.
There is only one login and two directories for each person.
The network is purely a private network (IP address space) with a
network path to the Internet wherever necessary.
It is envisaged that, by carefully planning each node's IP address and
its neighbours, we will never run out of private IP addresses. The only
data flowing in this network (if all participating machines conform to
this architecture) will be 7-bit ASCII text.
The most secure network protocol available, with PGP signatures, is
suggested for the base configuration.
User authentication / configuration management is to be managed with
LDAP. This includes machine-specific optimised utilities for the highest
degree of interoperability.
All the network listen channels should act as an input device (stdin).
All the network talk channels should act as an output device (stdout).
All the network error channels should act as an error device (stderr).
Hardware:
The machine should be able to use the following as network media:
  Register, Cache, System Bus, SCSI, NIC, Serial port, Parallel port,
  Sound (can be picked up through the System Bus), Optical & Wireless
The machine should be able to draw energy from the following sources:
  Electrical, optical, Solar, chemical, Sound (if possible)
The network and energy source path can be the same.
Each machine will have a compute node, a storage node and varied
network paths as described above.
Each node participating in this computer will have exactly one path for
talk and listen. It should be able to talk/listen through any external
network port to any other machine.
All network nodes will have fixed addresses for each network port.
The base machine itself will have one IP address.
The networking protocol supported will be TCP/IP. Optionally, IPX/SPX
and NCP may be supported.
Network Environment
As the machine is always in listen mode for messages and talks only when
required, the network bandwidth requirements are based on three
parameters associated with message processing:
  1. Path
  2. Quanta
  3. Frequency
The aim is to select the path in such a way that the product of quanta
and frequency (which gives the total message size) is transmitted through
the network physical layer within a user-acceptable time frame.
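The path selection rule can be sketched as simple arithmetic: total size
is quanta times frequency, transfer time is total size divided by the
path bandwidth, and a path qualifies if that time is within the
acceptable limit. Picking the slowest path that still qualifies is a
design choice of this sketch, and the bandwidth figures in the usage
lines are illustrative assumptions, not measurements.

```python
from typing import Dict, Optional


def transfer_seconds(quanta_bits: int, frequency: int, bandwidth_bps: int) -> float:
    """Time to send quanta * frequency bits over a path of given bandwidth."""
    return quanta_bits * frequency / bandwidth_bps


def pick_path(quanta_bits: int, frequency: int, limit_seconds: float,
              paths: Dict[str, int]) -> Optional[str]:
    """Return the slowest path that still meets the time limit, if any."""
    fits = [(bps, name) for name, bps in paths.items()
            if transfer_seconds(quanta_bits, frequency, bps) <= limit_seconds]
    return min(fits)[1] if fits else None


# Assumed example bandwidths in bits per second.
paths = {"serial": 56_000, "ethernet": 100_000_000}
small = pick_path(8_000, 10, 2.0, paths)        # 80,000 bits in total
large = pick_path(8_000_000, 10, 2.0, paths)    # 80,000,000 bits in total
```

Small text messages fit the serial line, while the larger transfer needs
Ethernet, which is the kind of trade-off the three parameters are meant
to capture.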
The most common information formats are (in decreasing order of bandwidth
requirement):
1. Video
2. Sound
3. Structured Text
4.
Considering the common applications, a correspondence table can be made
as follows:
Network environment supports
System environment
GNU/Linux is suggested for its immense scalability and flexibility and,
most importantly, its configurability and its case sensitivity to file
names, as this is the heart of the ITRANS encoding scheme for system
filenames.
All the networking code should be optimised into the kernel.
This machine will have, at the primitive level, instructions which will
emulate the processing described above.
The base software suggested is EMACS in its various incarnations, as the
self-recursive nature of self-insert-command is what makes it the most
extensible software.
CVS at a lower level will ensure automatic journalling of all changes,
textual and binary.
It is proposed that this machine will use an ITRANS envelope for
self-insert-command. This will ensure that the machine will be able to
translate voice into ITRANS encoding. As Sanskrit is a phonetic language,
and ITRANS a viable and very practical representation of phonetics, a user
will just have to record the basic letters of Sanskrit, which will be
stored for voice reproduction. The messages can then be displayed, and the
user's interpretation of the message is recorded and stored as a parameter
"mother tongue" of the user.
This too can be embedded in the kernel with a little bit of effort.
ITRANS processing should also be part of the kernel. This will reduce the
bandwidth requirement for data, as only encoded text is required to
produce speech. (ITRANS processing does not require backstepping.) The
phonetic ITRANS atom is at most 3 characters.
The cosmological time recording system should be the base time machine for
all time processing. Open source libraries are already available for this
(solar, lunar and planetary positions, phase of the moon, galaxy
coordinates etc.). Conversion routines for various current time systems
should be there.
The name space (LDAP or PostgreSQL Implementation)
Transaction Engine
A unit transaction is defined as the reliable recording of the changes
related to an entity caused by an internal or external event.
The transaction model is essentially an event-driven model.
A single table - the Event log - is to be maintained, which captures all
the external and internal events.
The Event Master table will enumerate all the possible events at the given
level of operation.
The Name Master will enumerate all the possible names that can be
encountered in the system. Each name will have backward and forward
pointers: backward pointers for the source of the name, forward pointers
for transformed entities. It will also have an alias property referring to
its alias in the same table.
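A Name Master entry with backward and forward pointers and an alias
could look like the sketch below. The class and field names are
hypothetical, a plain dictionary stands in for the table, and the sample
names are invented purely to show how the backward pointers are followed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NameEntry:
    name: str
    source: Optional[str] = None                              # backward pointer
    transformed_to: List[str] = field(default_factory=list)   # forward pointers
    alias: Optional[str] = None                               # another name in the same table


def origin(table: Dict[str, NameEntry], name: str) -> str:
    """Follow backward pointers until a name with no source is reached."""
    while table[name].source is not None:
        name = table[name].source
    return name


# Invented sample data: ore is transformed into steel, steel into a beam.
table = {
    "ore": NameEntry("ore", transformed_to=["steel"]),
    "steel": NameEntry("steel", source="ore", transformed_to=["beam"]),
    "beam": NameEntry("beam", source="steel", alias="girder"),
    "girder": NameEntry("girder", alias="beam"),
}
```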
Data model for transactions
It essentially consists of two self referential entities.
This model lends itself to the self-learning feature of EMACS, which is
intended as the primary front end.
In a parametrised form, the data model can be viewed as given below
(in Hierarchical Model -- Inverted Tree).
An entity can be one of three types:
      Human or System or Physical
Corresponding Human Entities are:
      Person / Organisation Unit / Organisation
System Entities are (all are goals -- Desire):
      Functional /  Organisational Unit  / Organisation
Functional Entities are:
      Person / Organisational Role / Processes
Physical entity can be  Person or Hardware or Message
An entity is located at:
       Physical or Document or Network
A message can be
     Requirement / Status / Event
Messages in the Physical forms
     verbal / Internal / External (Images/Document)
Requirement will arise through expression and, when recognised, will go
through the following four states, in the order specified, on each event
type as shown below:
            (C)       (M)       (T)
     Desire ---> Want ---> Need ---> Necessity
The Status state diagram will be the following:
                    (C)          (M)         (T)
     Initialisation ---> Running ---> Result ---> Logging
The Event state diagram will be:
     Creation ---> Modification ---> Transformation ---> Wait
        ^                                                 |
        |                                                 |
        +-------------------------------------------------+
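The Requirement and Status diagrams can be read as small state machines
in which each arrow fires on one event type. In the sketch below I
assume that (C), (M) and (T) stand for the Creation, Modification and
Transformation events; advance is a hypothetical helper name.

```python
REQUIREMENT = ["Desire", "Want", "Need", "Necessity"]
STATUS = ["Initialisation", "Running", "Result", "Logging"]
EVENTS = ["C", "M", "T"]   # assumed: Creation, Modification, Transformation


def advance(chain, state: str, event: str) -> str:
    """Move to the next state only if the event matches the arrow's label."""
    position = chain.index(state)
    if position < len(EVENTS) and EVENTS[position] == event:
        return chain[position + 1]
    return state            # wrong event, or already in the final state


state = "Desire"
for event in ("C", "M", "T"):
    state = advance(REQUIREMENT, state, event)
```

An event that does not match the current arrow leaves the state
unchanged, so a requirement cannot skip from Desire straight to Need.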
A matrix of the above could be formed to obtain master table and
transaction tables.
It should be an orthogonal matrix to arrive at precise relations.
At the lowest level, each Name has three properties:
1. Space Value
2. Money Value
3. Time Value
Event Model
A message triggers one or more of the following three events, or a
combination thereof:
1. Creation
2. Modification
3. Transformation (into another name)
The events themselves are to be logged separately in the audit trail table
or event log table.
These events cause insertion into the Transaction table with a
time/location stamp.
Any modification of any entity will also cause a Transaction insertion.
Transformation causes the change of name and an entry in the transaction
table. After transformation the original entity ceases to exist for
further transactions.
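A minimal in-memory sketch of the event model: every creation,
modification or transformation appends a time/location stamped row, and
a transformed name is no longer available for further transactions. The
Ledger class, its method names and the sample names in the usage lines
are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ledger:
    live: set = field(default_factory=set)            # names that currently exist
    transactions: list = field(default_factory=list)  # stamped transaction rows

    def _record(self, event, name, location, target=None):
        self.transactions.append({
            "event": event, "name": name, "location": location,
            "target": target, "at": datetime.now(timezone.utc),
        })

    def create(self, name, location):
        self.live.add(name)
        self._record("Creation", name, location)

    def modify(self, name, location):
        if name not in self.live:
            raise KeyError(name)    # transformed or unknown names are rejected
        self._record("Modification", name, location)

    def transform(self, name, new_name, location):
        if name not in self.live:
            raise KeyError(name)
        self.live.remove(name)      # the original entity ceases to exist
        self.live.add(new_name)
        self._record("Transformation", name, location, target=new_name)


ledger = Ledger()
ledger.create("order", "Thane")
ledger.modify("order", "Thane")
ledger.transform("order", "invoice", "Mumbai")
```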
Reports:
Any kind of report can be generated using a universal cross-tabular
reporting function from a normalised table of transactions. EMACS can be
tweaked to be a report writer after EMACS isearch is modified with
suitable atoms. (minimal effort..Lazy way out)
In fact all the data may be stored in pure text, as EMACS is extremely
good at handling pure text.
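A universal cross-tabular report over a normalised transaction table can
be quite small. The sketch below is one way to do it in plain Python
rather than inside EMACS; cross_tab and its parameters are hypothetical
names, and the rows are invented sample data.

```python
from collections import defaultdict


def cross_tab(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination in rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in table.items()}


# Invented sample transactions.
rows = [
    {"name": "cement", "event": "Creation", "value": 100},
    {"name": "cement", "event": "Modification", "value": 20},
    {"name": "steel", "event": "Creation", "value": 300},
    {"name": "cement", "event": "Creation", "value": 50},
]
report = cross_tab(rows, "name", "event", "value")
```

Any pair of columns can serve as the row and column headings, which is
what makes a single function cover many different reports.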
Development tools:
A suitable development tool to define the data model can be designed
for easy access to data.
+++++++++++++++++++++++  REVIEW THOROUGHLY the following +++++++++++++++
(insert, update and delete thoughts only here)
All the entities will have the following common attributes:
1. Name
  2. Location
  3. Location timestamp
  4. Type (Person)
  5. Value Timestamp
  6. Value (in Number)
  7. Creation timestamp
  8. Last Access begin timestamp
  9. Last Access end timestamp
10. Last Modify begin timestamp
11. Last Modify end timestamp
12. Transformed to (target entity)
13. Transformation timestamp
In the above scheme every entity has three phases:
1. It comes into existence at a given location at a given point of time
    with a given value at that point of time.
2. Its value, location and other attributes may get modified over time,
    and these changes are recorded in the database.
3. It finally gets transformed into another Name (after this go to step 1).
A message is recorded along with every event that happens.
A database of the following are to be available always:
1. Name of the entity
   2. Value of the entity
   3. Space of the entity
   (Add remaining database details here)
A transaction engine emulates the kernel in processing transactions.
The Transaction Engine will track Name, Message, Space, Value and State
simultaneously and time / location stamp them.
Each Transaction has four states: Initiation, Running, Sustenance and
Perpetuality.
Each Message/Value transfer has four states: Dire Needs, Actual, Key (for
what???), Maintenance (of what? -- specify).
Every transaction has four states at which money/material exchange occurs:
     1. Initiation
     2. Actual
     3. Result
     4. Maintenance
(Put a good deal of explanation text in here)
(
a good beginning would be a template in C++ taking a token of any data
type as the argument, allocating memory and returning a pointer to that
token. Well, I can't think further .... :-)
template <typename T>
T* CreateData(const T& a) {
      // allocate memory for a copy of the token and return a pointer to it
      return new T(a);
   }
...... or whatever... hope you get the point rather than the syntax
)
+++++++++++++++++++++++  END REVIEW THOROUGHLY ++++++++++++++++++++++++
Hard copy printing Environment.
It is envisaged that four primary drivers must be made available:
    0. Pure Text
    1. Epson 9 and 24 Pin emulation
    2. HP PCL Emulation
    3. PostScript
Language support Environment
It is necessary that a minimum of the following character sets
    0. Plain ASCII character set (7-bit)
    1. Unicode
    2. Devanagari
       (San98 TTF or Xdvng with suitably modified glyphs for ITRANS encoding)
with glyphs in
    0. GUI Mode
    1. Character mode
in either hardware or software form is made available.
User environment
  ----------------
Each user will have exactly one login and one root directory.
The owner of the directory decides the physical location of the data.
The system will have on-demand appearance of files on invocation of a
command (either timed or user decided).
It is recommended that the user have two top-level storage domains:
     Work (Organisation controlled access)
     Personal or Pleasure (User controlled access)
All the rest can be organised below these two domains with relevant
symlinks.
Application visibility is based on the user profile in the authentication
system.
This authentication system can then be suitably integrated with a
certification authority for B2B and other Internet-based secure
application requirements. A private port can be leased.
The machine fingerprint is stored in LDAP. Based on the fingerprint, the
most suitable image from any of the nearby machines will be loaded.
Training Issues
Primary user training will be on touch typing (gtypist) and EMACS in
character mode.
Only incremental training will then be required.
(Except for exceptional cases, a pointer device may not be required for
most of the usual user activities under this architecture, as the mouse is
deemed a time waster / disruptive technology. Hence GUI training takes a
backseat.)
Application training will be specific, depending upon the user profile.
The user can use the Character or Graphical Interface depending upon the
hardware and machine available at that point of time.
The typical applications the Base users will need are:
Document Processor (LaTeX)
     Text Processor (EMACS)
     Spreadsheet (SC or any better as required)
     Internet Browser (W3)
     E-mail (rmail of Emacs/mutt/pine)
Printing to any printer from the above applications
All the above applications will be in character mode
For slightly advanced users:
     All of the above in GUI and Character Mode.
     Bitmap Image Processor
     Vector Image Processor
     Presentation package
     Sound Recorder / Editor / Listener
     Video Recorder / Player
For Developers:
     All the above and CVS Access
     All the necessary development tools
For system end:
      LDAP (Authentication)
      PostgreSQL or SAPdb (or similar RDBMS) for Data Storage (optional)
      A Transaction Engine
      A Time Engine
      A Logger Engine
      A Location Engine (with Lat/long/height & distance calculator)
---------- End Architecture definition -----------------