System R
This paper traces the origins of System R, walks through the
development of its prototypes, and closes with an evaluation and
conclusions. It begins with the relational data model proposed by
Codd in 1970. Codd proposed two things: data should be represented by
data values and never by any connections visible to the user, and
users should not have to specify algorithms when they make a request.
The question was whether the second item could be achieved well.
System R, the original SQL DBMS, was the attempt to answer it. The
paper then lays out the key goals established for System R. The phase
zero prototype was
intended only to determine the feasibility of design methods and
functions. Work then began on the phase one prototype, which the
paper goes on to describe in detail.
The evaluation of the phase one prototype was generally positive, and
the prototype was considered to have met its goals of simplicity,
power, and data independence. System R demonstrated that a production
relational database system was feasible. There is not much to
criticize. This is the foundation and inspiration for modern database
systems. It is not as evolved as today’s production systems, but those
systems are the result of a progressive march over decades.
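To make Codd's second point concrete (users should not have to
specify algorithms), here is a minimal sketch using Python's built-in
sqlite3 module. The emp table, its columns, and its rows are
hypothetical, invented only for illustration, and SQLite merely stands
in for a relational system; nothing here is taken from System R
itself. The same question is answered twice: once declaratively in
SQL, and once with a hand-written scan and filter.

```python
import sqlite3

# Hypothetical data, invented for illustration only.
rows = [("Ann", "toys", 120), ("Bob", "toys", 90), ("Cy", "shoes", 150)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# Declarative: say what is wanted; the DBMS decides how to find it.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM emp WHERE dept = ? AND salary > ? ORDER BY name",
        ("toys", 100),
    )
]

# Procedural: the caller spells out the scan, the filter, and the sort.
procedural = []
for name, dept, salary in rows:
    if dept == "toys" and salary > 100:
        procedural.append(name)
procedural.sort()

conn.close()
```

Both lists hold the same names. In the declarative version, adding an
index on emp would leave the query text unchanged, which is the kind
of independence the relational model was after.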
Architecture of a DB System
This is actually a good follow-up to the System R paper. System R
gives the original thinking from when the first relational DBMSs were
being developed. This paper goes over what has been learned and
formalized over the decades since: process models, parallel
architectures, ACID. Essentially, it is an overview of the design
considerations that have come up in the history of DBMSs.
I would be curious whether we are going to discuss the implications of
today’s large-scale systems: not just the CAP theorem, but the
responses to these challenges (distributed file systems,
virtualization), hopefully in a bit more detail than your syllabus
from last year. For example, with virtualization, could we look at
and discuss the various approaches being considered to manage and
spawn VMs based on need? What implications would this have for
distributed storage systems? It is not as simple as spawning VMs of a
stateless application layer.