Greetings!
I have been an IT professional for the last 18 years, with varied experience in hardware, applications and networks on various platforms. I currently oversee the IT operations of a leading construction company in Mumbai.
After following Linux in general and open source software in particular, I believe one question must have crossed people's minds at least once:
Can a single-image, deployable distribution ever be made?
In other words, can a single distribution fit all?
While following various developments in OSS, and after a study of REAL and PRACTICAL computing requirements, I found that there is very little by way of an integrated approach to open source computing architecture.
I have tried my best to define one purely from an Indian context.
I hereby submit it for review by the Linux user group.
I hope you all find it meaningful, practical and easy.
I welcome all questions, comments and criticisms.
Many of the technologies are already available.
I am ready to work with others to make it a reality.
Rajagopal S. Iyer
---------- begin Architecture definition -----------------
yajur aNOTHER jOINT uNDERTAKING rESULT - veda eXTENSIBLE dISTRIBUTED aRCHITECTURE
Proposed Computing Environment By Rajagopal S. Iyer, Thane, Mumbai. Phone: (9122) 547 3884 (Sr. Co-ordinator, Facilities, ISD, Lok Group) e-mail : rajsand@hindunet.com
Copyright (C) 16th February 2002, Rajagopal S. Iyer. All rights Reserved by the author.
Organisations, and the people who run them, don't require computers: they require their familiar, private, powerful, secure computing environment anywhere, anytime.
This architecture keeps the human at the centre of the design. Care has been taken to ensure that most of the ideas mentioned here can be implemented with off-the-shelf technology, with real-time performance acceptable to the user.
A vertically integrated machine with this architecture is not in existence, but one can be built by modifying various underlying technologies.
vEDA eXTENSIBLE dISTRIBUTED aRCHITECTURE (veda) is proposed for very high availability, location freedom and absolute privacy.
The proposed architecture
A Unified Machine (AUM) architecture is proposed.
Overall architecture:
The layered architecture is described below:
The selection is based on ready availability of necessary hardware & Software. Many of the proposed channels are already functional and available under GNU GPL or other open source licence.
Storage Pyramid: (all serving as network data reservoir)
  Registers (0)
  Cache (1)
  RAM (2) - Storage for programs
  RAM (3) - as a Cache for devices below
  HDD (4)
  Optical R/W & R/O disks (5)
  Magnetic tape (7)
CPU/Kernel Pyramid (Only two states: Running and housekeeping)
It is a 128-bit word architecture, of which 120 bits are for data and 8 bits for ECC.
Error Checking and correction algorithm is based on a simple binary tree representation and a small recursive function which terminates when two immediate neighbours are found.
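A minimal sketch of the word layout described above, in Python for readability. It only splits a 128-bit word into its 120 data bits and 8 ECC bits. Placing the ECC byte in the low-order bits is an assumption (the proposal does not fix its position), the function names are hypothetical, and the ECC value is taken as given because the tree-based algorithm is not specified here.

```python
from typing import Tuple

DATA_BITS = 120
ECC_BITS = 8
WORD_BITS = DATA_BITS + ECC_BITS  # 128

DATA_MASK = (1 << DATA_BITS) - 1
ECC_MASK = (1 << ECC_BITS) - 1


def pack_word(data: int, ecc: int) -> int:
    """Combine 120 data bits and 8 ECC bits into one 128-bit word."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data must fit in 120 bits")
    if not 0 <= ecc <= ECC_MASK:
        raise ValueError("ecc must fit in 8 bits")
    # Assumption: data occupies the high 120 bits, ECC the low 8 bits.
    return (data << ECC_BITS) | ecc


def unpack_word(word: int) -> Tuple[int, int]:
    """Split a 128-bit word back into (data, ecc)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 128 bits")
    return word >> ECC_BITS, word & ECC_MASK
```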
Running implies that the CPU continuously switches between four states preemptively in real-time
Run Level 0: (Kernel) (Ring 0)
IPC Media : registers
  Line frequency synchroniser
  Preemptive Task Switcher
  Memory Manager
  Device Drivers
  Heartbeat Generator / Responder
  Network communicator (local/remote)
  Real-time calendar (cosmological)
Run Level 1: (Two Tasks only)
IPC Media : Cache Memory
Each task will listen to and process exactly one input channel and give output to exactly two channels. Those two channels could be any.
Virtual Machine 0 & 1 (Takes one input and gives two outputs - Local & network)
Input Threads: (two only)
  Check & Listen to Local Input Device (core dump errors)
  Check & Listen to Network Input Devices (core dump errors)
Process Threads (heartbeat core dumps to local storage)
  Take data from Input device (core,log dump after Checking)
  Prepare transfer of data (Presentation & core,log dump)
  Output Data to output devices (after Checking Tee to n/w and local storage)
Output threads
  Check & Talk to Local Output Devices (core dump errors)
  Check & Talk to Network Output Devices (core dump errors)
Run level 2 (Two Tasks Only - Graphics processing)
IPC Media : RAM
Typical tasks (high bandwidth requirement)
  Input = Images / Sound
  Process = Image/High bandwidth signals
  Output task = Display Update
Run Level 3 (Two tasks only - Structured Text / Voice processing)
IPC Media : RAM
Typical Process (Voice processing, Structured Text processing)
  Input = Sound Device
  Processing = Uses ITRANS / other algorithms
  Output = Sound Device
Run level 4 (User Interface)
Typical Applications: User authentication, Configuration Selection
IPC Media : HDD
  Input = Composite
  Processing = Transaction
  Output = Composite
Run level 5 (long term processing - User Applications)
IPC Media : Replicated file system
  Input : Messages from user in composite media
  Process : Composite processing (using all lower run level facilities)
  Output : Messages to User in composite media
Primitive Data types :
Numbers:
only whole numbers (unsigned 128 bit integer)
664613997892457936451903530140172288 = 2^119
1329227995784915872903807060280344576 = 2^120 (Max)
  no floating point representations
  no negative number representation in the bare machine
  Zero represented by all zero bits
  Only closed infinity representation (all 1s) (affine or closure?)
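The number rules above can be sketched as follows, using Python integers to stand in for the 120 data bits. Saturating to "infinity" on overflow is an assumption made for illustration (the proposal only says that all 1s represents a closed infinity), and the names are hypothetical.

```python
DATA_BITS = 120
ZERO = 0                          # all data bits clear
INFINITY = (1 << DATA_BITS) - 1   # all 120 data bits set
MAX_FINITE = INFINITY - 1         # largest value that is not "infinity"


def add(a: int, b: int) -> int:
    """Add two machine numbers, saturating to INFINITY on overflow (assumed)."""
    for x in (a, b):
        if not ZERO <= x <= INFINITY:
            raise ValueError("not a valid 120-bit whole number")
    if a == INFINITY or b == INFINITY:
        return INFINITY
    return min(a + b, INFINITY)


print(2 ** 119)  # 664613997892457936451903530140172288
print(2 ** 120)  # 1329227995784915872903807060280344576
```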
Text: Plain Ascii
Graphic:
For the sake of simplicity in this architecture outline, a ring of the primary colours Red, Green and Blue, plus White, is assumed as four points. The ratio is somewhat similar to the television-standard colour ratio for white, twisted a bit to suit our prime requirements of simplicity and practicality.
Only two colour spaces, RGBW and CMYB, are considered.
Red, Green, Blue & White for display devices; Cyan, Magenta, Yellow, Black for print devices.
The reason for adding white to RGB is the white temperature of 6500K.
As the largest
primitive definition (square limits are assumed as computationally it is easier on my brain):
Raster: Each pixel has 120 bits allocated to it, divided among the colours. The bit allocation for the RGBW system is:
  12 for Red
  75 for Green
  13 for Blue
  20 for White
  8 bits for ECC
(To be used as Data in Device Space)
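A minimal sketch of the raster pixel layout, assuming the four colour fields are packed in the order Red, Green, Blue, White from the most significant bit downwards (the ordering is not stated in the proposal). The function names are hypothetical, and the 8 ECC bits are left out.

```python
# (name, width in bits), most significant field first; the ordering is an assumption.
RGBW_FIELDS = (("red", 12), ("green", 75), ("blue", 13), ("white", 20))


def pack_pixel(**channels: int) -> int:
    """Pack the given channel levels into one 120-bit pixel value."""
    value = 0
    for name, width in RGBW_FIELDS:
        level = channels.get(name, 0)
        if not 0 <= level < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | level
    return value


def unpack_pixel(value: int) -> dict:
    """Recover the channel levels from a 120-bit pixel value."""
    channels = {}
    for name, width in reversed(RGBW_FIELDS):
        channels[name] = value & ((1 << width) - 1)
        value >>= width
    return channels
```

The four widths add up to the 120 data bits of one machine word, so a pixel fits exactly one word once the ECC byte is appended.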
Vector:
  57 bits : 19 per dimension (x,y,z) for the start point (2^19 = 524288 mm in device space)
  57 bits : 19 per dimension for relative co-ordinates (x,y,z) from the start point (all zero here can be treated as a point)
  6 bits : RGB colour for the starting & ending points
  8 bits for ECC
Black, or non-existence, is all zero, which is anyway not needed.
All 1s will indicate a white line from begin to end point (ECC excluded).
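A sketch of the vector primitive, assuming the 6 colour bits are split as 3 bits (one each for R, G and B) per end point, which is what the 120-bit total implies. Field order and function names are assumptions made for illustration, and the ECC byte is left out.

```python
COORD_BITS = 19
COORD_MAX = (1 << COORD_BITS) - 1   # 524287, the largest co-ordinate in mm


def pack_vector(start, delta, start_rgb, end_rgb) -> int:
    """Pack an (x, y, z) start, an (x, y, z) relative end and two 3-bit colours."""
    value = 0
    for coord in (*start, *delta):
        if not 0 <= coord <= COORD_MAX:
            raise ValueError("co-ordinate does not fit in 19 bits")
        value = (value << COORD_BITS) | coord
    for rgb in (start_rgb, end_rgb):
        if not 0 <= rgb <= 0b111:
            raise ValueError("colour does not fit in 3 bits")
        value = (value << 3) | rgb
    return value


def is_point(delta) -> bool:
    """An all-zero relative part means the primitive is a single point."""
    return all(c == 0 for c in delta)
```

With every field at its maximum the packed value is all 1s, matching the "white line from begin to end point" case; with every field at zero it is the all-zero "non-existent" value.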
A White or Black component is not needed in pure colour space, as the temperature, as perceived, could essentially be noise caused by superimposition of the various waves from the invisible electromagnetic spectrum.
For clarity's sake the data for co-ordinates may be represented by pure 7-bit ASCII at a higher level, as each point may require at most 22 bytes: (7x3) + 1 for colour.
(known light speed 300,000,000,000 mm/s) (To be used as Address in the device space)
( Comment on Practical Dimensions: 524 meters in X Y and Z dimensions > 1/2 KM media? -- Devices are still not available! )
Graphic Primitive Processing:
Processing is by reading the co-ordinate and doing a binary recursion as specified above for either vectorising or rasterising depending on the device.
Start at device Origin, Recurse along the X axis, at the end of device limit, switch axis to Y and then in Z axis, Return to Origin.
Graphic Computational speed & predictability: As the binary recursion algorithm has a known termination condition under all circumstances, it can safely be said that, when rendering the worst case of two adjacent pixels P0(0,0,0 - R1G1B1) and P1(0,0,1 - R2G2B2), only the end points need to be taken, and the maximum will be 6 * 13 = 78 iterations.
In the proposed architecture it will take 78 clock cycles to render one graphic primitive.
The worst case will be a one-bit change in each successive pixel.
The efficiency of the algorithms is yet to be proved mathematically. O(n) is to be obtained mathematically.
{-- Proof? who? me? No, Sorry! I am mathematically challenged :-) }
Network Layers (OSI based - Total 7)
Physical (7 total): (Criteria: Speedwise and vicinity wise decreasing order)
  CPU-Internal-Registers (CPU/memory Bus) (0)
  CPU-RAM / Display path (1)
  CPU-System Bus (Sound card video capture) (2)
  Disk bus : SCSI, fibre channel (3)
  Ethernet (4)
  IR/Wireless (5)
  Serial Port (6) (allows use of local phone lines for PPP)
Network:
  Ring 0 = Register
  Ring 1 = Cache
  Ring 2 = RAM
  Ring 3 = RAM
  Ring 4 = HDD
Transport:
Presentation: Translation????
Application
Scheduling algorithm.
The machine has the following states:
  0. Running
  1. Preparatory
  2. Wait
  3. Review
  4. Reorder Priority
  5. Cycle complete
The recommended time slice ratio for these is 20, 75, 12, 13 clock cycles respectively (120 in total).
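A small sketch of how the 120-cycle budget could be laid out. It assumes that the four ratios apply, in order, to the first four listed states (Running, Preparatory, Wait, Review); the proposal gives four numbers without naming the states they belong to, so this mapping is a guess, and the function name is hypothetical.

```python
# Assumption: the four ratios map, in order, to the first four listed states.
TIME_SLICES = (("Running", 20), ("Preparatory", 75), ("Wait", 12), ("Review", 13))
CYCLE_LENGTH = sum(ticks for _, ticks in TIME_SLICES)  # 120 clock cycles


def state_at(clock: int) -> str:
    """Return the state occupying a given clock tick within the repeating cycle."""
    tick = clock % CYCLE_LENGTH
    for name, ticks in TIME_SLICES:
        if tick < ticks:
            return name
        tick -= ticks
    raise AssertionError("unreachable: tick is always below CYCLE_LENGTH")
```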
0. Running
In this state the process is actually run
1. Preparatory
In this state the necessary resource allocations for running the process are prepared:
  1. Read the Process Table
  2. Allocate necessary resources
  3. Signal to run the process
2. Wait State
This is the stage in which the machine executes the heartbeat function, which consists of:
  1. Read any error status from the heartbeat messages of other nodes
  2. Prepare the status report
  3. Write error / log messages in the respective locations
3. Review
In this state, the messages produced by running the process are collected and the necessary steps for transmitting them to the next process are taken.
4. Reorder states
In this stage, the states of the two processes are altered (if the state order of process 0 is 0123 it changes to 1032, and if that of process 1 is 1032 it changes to 0123).
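The reordering in the example (0123 becomes 1032 and back) reads like a swap of each adjacent pair of states. A minimal sketch under that assumption, with a hypothetical function name:

```python
def reorder(states: str) -> str:
    """Swap each adjacent pair of states, e.g. '0123' becomes '1032'."""
    if len(states) % 2:
        raise ValueError("expected an even number of states")
    pairs = (states[i + 1] + states[i] for i in range(0, len(states), 2))
    return "".join(pairs)
```

Applying the function twice returns the original order, so the two processes simply trade orderings on every reorder step.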
5. Cycle Complete state
In this state the necessary logging of machine state and heartbeat, and the stamping of cycle number details, are to be taken care of. (Fill details here)
6. State Increment State
This is the crucial state where the promotion of state is done.
7. Respawn state
The process with incremented states will launch itself ??? (Fill in properly)
In between these states is the idle state.
The suggested scheduling sequence is (in real time):
0-->1-->1-->0-->0-->1-->2-->2-->1-->0-->0-->1-->2;
1-->2-->2-->1-->1-->2-->3-->3-->2-->1-->1-->2-->3. (VM0)
|
v
4
|
v
0-->1-->1-->0-->0-->1-->2-->2-->1-->0-->0-->1-->2;
1-->2-->2-->1-->1-->2-->3-->3-->2-->1-->1-->2-->3. (VM1)
|
v
5
|
v
6
|
v
7
(This is binary recursion with the termination condition that both the neighbours of the tree are found.)
Directed graph representation of states
(Root) (0) / \ (1) (2) \ (3)
If the inversion of the 0 & 1 states is not done, the machine will run uncontrollably under the binary recursion algorithm that is used for task scheduling.
The head-inversion causes two breaks. The task is deemed completed only after machine enters the Idle State.
In fact this data structure and processing can be replicated in the upper rings which will result in predictable performance.
Performance factors are to be worked out mathematically.
Implementation, & Availability Issues
-------------------------------------------------
The two Ring-1 tasks are two identical clustered virtual machines using MOSIX or any other similar kernel patches.
The clustering of one box with another is through SCSI or the system bus (so there can be a single-box high performance cluster).
Fibre channel/100mbps n/w media is the second option.
The machines are identical to each other in function. They communicate using registers as network media.
The proposed machine will have a replicated, journalled file system.
Where two disks are available, RAID Technology should be used.
There is only one login and two directories for each person.
The network is purely a private network (IP address space) with a network path to the Internet wherever necessary.
It is envisaged that, by carefully planning each node's IP address and those of its neighbours, we will never run out of private IP addresses. The only data flowing in this network (if all participating machines are according to this architecture) will be 7-bit ASCII text.
The most secure available network protocol, with PGP signatures, is suggested for the base configuration.
User authentication / configuration management is to be managed with LDAP. This includes machine-specific optimised utilities for the highest degree of interoperability.
All the network listen channels should act as an input device (stdin).
All the network talk channels should act as an output device (stdout).
All the network error channels should act as an error device (stderr).
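One way to picture the three rules above is to wrap network sockets as ordinary text streams and hand them to a process in place of stdin, stdout and stderr. This is only a sketch using Python's standard socket module; the function names and the toy upper-casing process are hypothetical, and the ASCII encoding follows the 7-bit text rule stated earlier.

```python
import socket


def open_channels(listen_sock, talk_sock, error_sock):
    """Wrap three sockets as text streams standing in for stdin, stdout and stderr."""
    stdin = listen_sock.makefile("r", encoding="ascii", newline="\n")
    stdout = talk_sock.makefile("w", encoding="ascii", newline="\n")
    stderr = error_sock.makefile("w", encoding="ascii", newline="\n")
    return stdin, stdout, stderr


def upper_case_filter(stdin, stdout, stderr):
    """Toy process: read one line from the listen channel, answer on the talk channel."""
    line = stdin.readline()
    if not line:
        stderr.write("no input\n")
        stderr.flush()
        return
    stdout.write(line.upper())
    stdout.flush()


if __name__ == "__main__":
    # Local socket pairs stand in for real network connections.
    peer_in, node_in = socket.socketpair()
    node_out, peer_out = socket.socketpair()
    node_err, peer_err = socket.socketpair()
    streams = open_channels(node_in, node_out, node_err)
    peer_in.sendall(b"hello\n")
    upper_case_filter(*streams)
    print(peer_out.recv(64))
```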
Hardware:
The machine should be able to use the following as the network media: Register, Cache, System Bus, SCSI, NIC, Serial port, Parallel port, Sound (can be picked up thru System Bus), Optical & Wireless
The machine should be able to draw energy out of following sources: Electrical, optical, Solar, chemical, Sound (if possible)
The network and energy source path can be the same.
Each machine will have a compute node, a storage node and varied network paths as described above.
Each node participating in this computer will have exactly one path for talk and listen. It should be able to talk/listen through any external network port to any other machine.
Every network node will have a fixed address for each network port.
The base machine itself will have one IP address.
Networking protocols supported will be TCP/IP. Optionally IPX/SPX and NCP may be supported.
Network Environment
As the machine is always in listen mode for messages and talks only when required, the network bandwidth requirements are based on three parameters associated with message processing:
1. Path 2. Quanta 3. Frequency
The aim is to select path in such a way that the product of quanta and frequency, (which gives the total message size) is transmitted through the network physical layer within user-acceptable time frame.
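A minimal sketch of that selection rule: the message size is the product of quanta and frequency, and a path qualifies when it can carry that size within the acceptable time. The path names and capacities below are hypothetical figures chosen only for illustration, as are the function names.

```python
# Hypothetical path capacities in bits per second, for illustration only.
PATHS = {"serial": 115_200, "ethernet": 100_000_000}


def message_size(quanta_bits: int, frequency: int) -> int:
    """Total message size is the product of quanta and frequency."""
    return quanta_bits * frequency


def usable_paths(quanta_bits, frequency, acceptable_seconds, paths=PATHS):
    """List the paths that can carry the message within the acceptable time."""
    size = message_size(quanta_bits, frequency)
    return [name for name, bps in paths.items() if size / bps <= acceptable_seconds]
```

For example, `usable_paths(8, 1000, 1.0)` keeps both paths, while a message a thousand times larger on the same deadline would leave only the faster one.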
The most common information formats are (in decreasing order of bandwidth requirement):
1. Video 2. Sound 3. Structured Text 4.
Considering the common applications, a correspondence table can be made as follows:
Network environment supports
System environment
GNU/Linux is suggested for its immense scalability and flexibility and, most importantly, its configurability and its case sensitivity to file names, as that is at the heart of the ITRANS encoding scheme for system filenames.
All the networking code should be optimised into the kernel.
This machine will have, at the primitive level, instructions which emulate the processing described above.
The base software suggested is EMACS in its various incarnations, as the self-recursive nature of self-insert-command is what makes it the most extensible software.
CVS at a lower level will ensure automatic journalling of all changes, textual and binary.
It is proposed that this machine will use an ITRANS envelope for self-insert-command. This will ensure that the machine will be able to translate voice into ITRANS encoding. As Sanskrit is a phonetic language, and ITRANS a viable and very practical representation of phonetics, a user will just have to record the basic letters of Sanskrit, which will be stored for voice reproduction. The messages can then be displayed, and the user's interpretation of the message is recorded and stored as a parameter "mother tongue" of the user.
This too can be embedded in kernel with a little bit of effort.
ITRANS processing should also be part of kernel. This will reduce the bandwidth requirement for data as only encoded text is required to produce speech. (ITRANS processing does not require backstepping). The phonetic ITRANS atom is max. 3 characters.
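The "no backstepping" property can be illustrated with a longest-match-first tokenizer whose read position only ever moves forward. The atom table below is a small illustrative subset of ITRANS, not the full scheme, and the function name is hypothetical.

```python
# Illustrative subset of ITRANS atoms (vowels, then consonants); not the full table.
ITRANS_ATOMS = frozenset(
    "a aa i ii u uu e ai o au RRi "
    "k kh g gh ch Ch j jh T Th D Dh N t th d dh n "
    "p ph b bh m y r l v sh Sh s h".split()
)
MAX_ATOM_LEN = 3  # the phonetic ITRANS atom is at most 3 characters


def tokenize(text: str) -> list:
    """Split ITRANS text into atoms, longest match first, never stepping back."""
    atoms = []
    pos = 0
    while pos < len(text):
        for length in range(min(MAX_ATOM_LEN, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in ITRANS_ATOMS:
                atoms.append(candidate)
                pos += length
                break
        else:
            raise ValueError(f"unknown ITRANS sequence at position {pos}")
    return atoms
```

Because an atom is never longer than three characters, the tokenizer looks at most three characters ahead at each position, which keeps the per-character work bounded.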
The cosmological time recording system should be the base time machine for all time processing. Open source libraries are already available for this (solar, lunar and planetary positions, phase of the moon, galaxy co-ordinates etc.).
Conversion routines for the various current time systems should be provided.
The name space (LDAP or PostgreSQL Implementation)
Transaction Engine
A unit transaction is defined as the reliable recording of the changes to an entity caused by an internal or external event.
The transaction model is essentially an event-driven model.
A single table, the Event log, is to be maintained, which captures all the external and internal events.
Event Master table will enumerate all the possible events at the given level of operation.
Name Master will enumerate all the possible names that can be encountered in the system. Each name will have backward and forward pointers. Backward pointers for Source of Name, Forward pointers for transformed entities. It will also have an alias property referring to its alias in the same table.
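A minimal in-memory sketch of the Name Master as described: each name keeps backward pointers (its sources), forward pointers (what it was transformed into) and an optional alias within the same table. Class, field and method names are assumptions for illustration; a real implementation would sit on LDAP or PostgreSQL as suggested above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NameEntry:
    name: str
    sources: List[str] = field(default_factory=list)   # backward pointers
    targets: List[str] = field(default_factory=list)   # forward pointers
    alias_of: Optional[str] = None                     # alias within the same table


class NameMaster:
    def __init__(self) -> None:
        self.entries: Dict[str, NameEntry] = {}

    def add(self, name: str, alias_of: Optional[str] = None) -> NameEntry:
        if alias_of is not None and alias_of not in self.entries:
            raise KeyError(f"unknown alias target {alias_of!r}")
        entry = NameEntry(name, alias_of=alias_of)
        self.entries[name] = entry
        return entry

    def link(self, source: str, target: str) -> None:
        """Record that `source` was transformed into `target`."""
        self.entries[source].targets.append(target)
        self.entries[target].sources.append(source)

    def resolve(self, name: str) -> str:
        """Follow alias pointers to the canonical name."""
        seen = set()
        while self.entries[name].alias_of is not None:
            if name in seen:
                raise ValueError("alias cycle")
            seen.add(name)
            name = self.entries[name].alias_of
        return name
```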
Data model for transactions
It essentially consists of two self referential entities.
This model lends itself to the self-learning feature of EMACS, which is intended as the primary front end.
In a parametrised form, the data model can be viewed as given below
(In Hierarchical Model -- Inverted Tree)
An entity can be one of three types: Human, System or Physical
Corresponding Human Entities are: Person / Organisation Unit / Organisation
System Entities are (all are goals -- Desire): Functional / Organisational Unit / Organisation
Functional Entities are: Person / Organisational Role / Processes
Physical entity can be Person or Hardware or Message
An entity is located at: Physical or Document or Network
A message can be Requirement / Status / Event
Messages in physical form can be Verbal / Internal / External (Images/Document)
A requirement will arise through expression and, when recognised, will go through the following four states, in the order specified, on each event type as shown below:
       (C)       (M)       (T)
Desire ---> Want ---> Need ---> Necessity
The status state diagram will be the following:
               (C)          (M)         (T)
Initialisation ---> Running ---> Result ---> Logging
The event state diagram will be:
Creation ---> Modification ---> Transformation ---> Wait
   ^                                                 |
   |                                                 |
   +-------------------------------------------------+
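The three diagrams can be sketched as small state machines. This assumes that (C), (M) and (T) label the three arrows in order and stand for Creation, Modification and Transformation, and that the event diagram loops from Wait back to Creation; both readings are interpretations of the diagrams, and the function names are hypothetical.

```python
REQUIREMENT = ("Desire", "Want", "Need", "Necessity")
STATUS = ("Initialisation", "Running", "Result", "Logging")
EVENTS = ("C", "M", "T")  # assumed: Creation, Modification, Transformation
EVENT_CYCLE = ("Creation", "Modification", "Transformation", "Wait")


def advance(chain, state, event):
    """Move one step along a chain if `event` labels the next arrow."""
    index = chain.index(state)
    if index == len(chain) - 1:
        raise ValueError(f"{state} is the final state")
    expected = EVENTS[index]
    if event != expected:
        raise ValueError(f"expected event {expected!r} in state {state}")
    return chain[index + 1]


def next_event_state(state):
    """Event states loop: Wait leads back to Creation."""
    index = EVENT_CYCLE.index(state)
    return EVENT_CYCLE[(index + 1) % len(EVENT_CYCLE)]
```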
A matrix of the above could be formed to obtain master table and transaction tables.
It should be an orthogonal matrix to arrive at precise relations.
At the lowest level, each Name has three properties:
1. Space Value 2. Money Value 3. Time Value
Event Model
A message triggers one or more of the following three events, or a combination thereof:
1. Creation 2. Modification 3. Transformation (into another name)
The events themselves are to be logged separately in the audit trail table or event log table.
These events cause insertion into the Transaction table with a time/location stamp.
Any modification of any entity will also cause a Transaction insertion.
Transformation causes the change of name and an entry in the transaction table. After transformation the original entity ceases to exist for further transactions.
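A minimal in-memory sketch of the event model: every event is written to an event log and inserted into a transaction table with a time/location stamp, and a transformed entity is closed to further transactions. The class, field and method names are assumptions for illustration only.

```python
from datetime import datetime, timezone


class TransactionLog:
    def __init__(self):
        self.event_log = []      # audit trail of every event
        self.transactions = []   # time/location-stamped transaction rows
        self.active = set()      # names still open to further transactions

    def _record(self, event, name, location, detail=None):
        if event != "Creation" and name not in self.active:
            raise ValueError(f"{name!r} does not exist or has been transformed")
        stamp = datetime.now(timezone.utc)
        self.event_log.append((stamp, event, name))
        self.transactions.append(
            {"time": stamp, "location": location, "event": event,
             "name": name, "detail": detail}
        )

    def create(self, name, location):
        self._record("Creation", name, location)
        self.active.add(name)

    def modify(self, name, location, detail):
        self._record("Modification", name, location, detail)

    def transform(self, name, new_name, location):
        # The original entity ceases to exist; the new name takes over.
        self._record("Transformation", name, location, detail=new_name)
        self.active.discard(name)
        self.active.add(new_name)
```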
Reports:
Any kind of Report can be generated using a universal cross tabular reporting function from a normalised table of transaction. EMACS can be tweaked to be a report writer after EMACS isearch is modified with suitable atoms. (minimal effort..Lazy way out)
In fact all the data may be stored in pure text as EMACS is Extremely good at handling pure text.
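As a sketch of what a universal cross-tabular reporting function over a normalised transaction table might look like, the following sums one column grouped by two others. The function name and the column names used in the example are hypothetical.

```python
from collections import defaultdict


def cross_tab(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) pair in a list of dict rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in table.items()}
```

Given rows such as `{"name": "cement", "event": "Creation", "value": 100}`, calling `cross_tab(rows, "name", "event", "value")` yields one line per name with a total for each event type.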
Development tools:
A suitable development tool for defining the data model can be designed for easy access to data.
+++++++++++++++++++++++ REVIEW THOROUGHLY the following +++++++++++++++ (insert, update and delete thoughts only here)
All the entities will have the following common attributes:
  1. Name
  2. Location
  3. Location timestamp
  4. Type (Person)
  5. Value Timestamp
  6. Value (in Number)
  7. Creation timestamp
  8. Last Access begin timestamp
  9. Last Access end timestamp
  10. Last Modify begin timestamp
  11. Last Modify end timestamp
  12. Transformed to (target entity)
  13. Transformation timestamp
In the above scheme every entity has three phases:
1. It comes into existence at a given location, at a given point of time, with a given value at that point of time.
2. Its Value, location and other attributes may get modified over time which are recorded in the database
3. It finally gets transformed into another Name (after this go to step 1)
A message is recorded along with every event that happens.
A database of the following is to be available always:
  1. Name of the entity
  2. Value of the entity
  3. Space of the entity
(Add remaining database details here)
A transaction engine which emulates the kernel in processing transaction.
Transaction Engine will track Name, Message, Space, Value, state simultaneously and time / location stamp them.
Each Transaction has four states: Initiation, Running, Sustenance and Perpetuality
Each Message/Value transfer has four states: Dire Needs, Actual, Key (for what???), Maintenance (of what? -- specify).
Every transaction has four states at which money/material exchange occurs: 1. Initiation 2. Actual 3. Result 4. Maintenance
(Put a good deal of explanation text in here)
(
A good beginning would be a template in C++ taking a token of any data type as the argument, allocating memory and returning a pointer to that token. Well, I can't think further .... :-)
template <typename T>
T* CreateData(const T& a) { return new T(a); }  // copies the token into new memory; the caller must delete it
...... or whatever... hope you get the point rather than the syntax
)
+++++++++++++++++++++++ END REVIEW THOROUGHLY ++++++++++++++++++++++++
Hard copy printing Environment.
It is envisaged that four primary drivers must be made available:
  0. Pure Text
  1. Epson 9 and 24 Pin emulation
  2. HP PCL Emulation
  3. PostScript
Language support Environment
It is necessary that a minimum of the following character sets
  0. Plain ASCII character set (7-bit)
  1. Unicode
  2. Devnagari (San98 TTF or Xdvng with suitably modified glyphs for ITRANS encoding)
with glyphs in
  0. GUI Mode
  1. Character mode
in either hardware / software form is made available.
User environment
----------------
Each user will have exactly one login and one root directory.
The owner of the directory decides the physical location of data.
The system will have on-demand appearance of files on invocation of a command (either timed or user decided).
It is recommended that the user have two top level storage domains:
  Work (Organisation controlled access)
  Personal or Pleasure (User controlled access)
Everything else can be organised below these two domains with relevant symlinks.
Application visibility is based on the User profile in the authentication System.
This authentication System can then be suitably integrated with certification authority for B2B and other Internet based secure application requirements. A private port can be leased.
The machine fingerprint is stored in LDAP. Based on the fingerprint, the most suitable image from any of the nearby machines will be loaded.
Training Issues
Primary user training will be on Touch Typing (gtypist) and EMACS in character mode.
Only incremental training will then be required
(Except for exceptional cases pointer device may not be required for most of the usual user activities under this architecture as mouse is deemed as a time waster / disruptive technology. Hence GUI Training takes a backseat.)
Application training will be specific depending upon user profile.
The user can use Character or Graphical Interface depending upon the available hardware at that point of time and machine
The typical applications the Base users will need are:
  Document Processor (LaTeX)
  Text Processor (EMACS)
  Spreadsheet (SC or any better as required)
  Internet Browser (W3)
  E-mail (rmail of Emacs/mutt/pine)
Printing to any printer from the above applications
All the above applications will be in character mode
For slightly advanced users:
  All of the above in GUI and Character Mode
  Bitmap Image Processor
  Vector Image Processor
  Presentation package
  Sound Recorder / Editor / Listener
  Video Recorder / Player
For Developers:
  All the above and CVS Access
  All the necessary development tools
For system end:
  LDAP (Authentication)
  PostgreSQL or SAPdb (or similar RDBMS) for Data Storage (optional)
  A Transaction Engine
  A Time Engine
  A Logger Engine
  A Location Engine (with Lat/long/height & distance calculator)
---------- End Architecture definition -----------------