Introduction to Database

 

What is Data?

Data is a collection of facts, such as values or measurements. It can be numbers, words, measurements, observations or even just descriptions of things.

There are two type of data.

Qualitative and Quantitative

Qualitative data is descriptive information (it describes something)

Quantitative data, is numerical information (numbers).

Qualitative:

He is brown and black

He has long hair

He has lots of energy

Quantitative:

Discrete:

He has 4 legs

He has 2 brothers

What Is Database?

A Database is a collection of information that is organized so that it can easily be accessed, managed, and updated.

Fields, records, and files organize traditional databases. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number.

Database is collection of data that is used or shared by the multiple user for variety of task ?.

The modern definition of database is  ?Database is a shared and integrated collection of interrelated data. The main objective of database is organized the data in such way so as it minimize the redundancy (if similar data is stored on different places then it is called redundancy) and maximize the access.

Database has two properties.

It is integrated

It is shared

Components of Database Environment

The major components of database environment are:

1)            User Group

2)            Database Management System

3)            Application Programmer

4)            Database Administrator

5)            Data Dictionary

User Group:

This is also called end user. End user are actually the most important people involved with a data base management system. There are three basic categories of end user requests:

Read Only

Add/Delete

Modify.

All user requests for data are made through DBMS.

Database Management System:

A DBMS is a collection of pre-written integrated program. Its major function is the interfacing by physical database and user of the system. The term DBMS is a software or set of program allowing the user to access database in a logically way.

We also say that

To access information from a database, you need a database management system (DBMS). This is a collection of programs that enables you to enter, organize, and select data in a database?.

Data Dictionary:

In database management systems, a file that defines the basic organization of a database. A data dictionary contains a list of all files in the database, the number of records in each file, and the names and types of each field. Most database management systems keep the data dictionary hidden from users to prevent them from accidentally destroying its contents.

Data dictionaries do not contain any actual data from the database, only bookkeeping information for managing it. Without a data dictionary, however, a database management system cannot access data from the database.

Application Programmer:

Application programmer is responsible to develop the application program for the organization, which achieves the various need of an organization.

Database Administrator(DBA):

DBA is a person who is managing information of organization. The DBA is responsible for planning of the organization. He also must have the knowledge if the direction of data processing.

Responsibility of Database Administrator (DBA)

The DBA is a person or group of person responsible for over all control of the database or managing the information of the organization.

A database administrator (DBA) is responsible for the performance, integrity and security of a database. Additional role requirements are likely to include planning, development and troubleshooting.

There are two approaches to wards data management

1)    Traditional approach / classical approach

2)     Database approach

1.Traditional approach:

This approach is early days approach of data processing. Same data is stored at different places to meet the needs of particular objective. In this approach each program has its own set of data files for the processing of the data. If we need of new application program then existing files would have to be restricted.

2. Database approach:

This approach refers to database management system. As in DBMS approach we reduced the redundancy. Its means there is no duplication of records.

In database approach multiple user can access the file available in database. The data, which is stored in database, is independent of the application program. In database system multiple user can access and store the information in central location. 

Disadvantage of Traditional Approach

The traditional approach has following disadvantages.

1. Uncontrolled Redundancy:

In this system every application program have its own data file. So data redundancy occur means same data is stored in different places.

2. Inconsistent data:

In this approach same data is stored in different places so if we want to change any record in one file then we have to change all he files other wise information will be mismatch.

3. Inflexibility:

In this system each application program generates different outputs however such system is often quite flexible and not easy to change the output of result.

4.Limited Data Sharing:

In this approach each application has its own private file and limited sharing. Other application cannot use these files.

5.Low Program Productivity:

In this system programmers must design each record and files used for an application and will also select the file access method and write the procedure of input output statement. This will consume a lot of time to develop data files and will cause increase the cost of software.

6.Excessive Program maintenance:

In this system description of files, record and data item is embedded with individual program. There for any modification in data file such as data name format or method of access requires the program should be modified.

Advantages of Data Base Approach

The following are the advantages of the database approach over the classical / traditional system.

1)    Reduction of Redundancy.

2)    Consistency of data.

3)    Integration of data.

4)    Data sharing.

5)    Reduction program maintenance.

6)    Data Security

7)    Resource utilization.

8)    Data independence.

1.Reduction of redundancy:

As in database approach each data item is recorded in only one place. So in this case redundancy will be minimum. But we cannot say that in database approach redundancy is completely eliminated because multiple copies of same data are required to store to separately. However it can be controlled.

2.Consistency of data:

As in the database approach the data is stored centralized. Therefore the entire user can access or use the same data.

3.Integration of data:

The database approach also provides data integrity, which means that data is stored also in single logical structure. Therefore, the logical relationship mean user easily access and relate one data item to another.

4.Data Sharing:

As in database approach all the data is stored centralized. Then it is not the property of single user. Hence each user can access data according to there?s own view. This required data can be accessed or shared by means of query language.

5.Reduction program Maintenance:

In DBMS system database is in depended of the application program so if we change the database then there is no changing require in application program. And all the input and output function, files type and data field s already defined by the DBMMS so less time requires developing the application program.

6.Data Security:

In this system database administrator is responsible for database security. When aver sensitive data can be accessed then the database approach completely provides the authority over the database administration.

Function of Database

There are several functions that a DBMS performs to ensure data integrity and consistency of data in the database. The ten functions in the DBMS are:

Data Dictionary Management

Data Dictionary is where the DBMS stores definitions of the data elements and their relationships (metadata).  The DBMS uses this function to look up the required data component structures and relationships. When programs access data in a database they are basically going through the DBMS.

Data Storage Management

This particular function is used for the storage of data and any related data entry forms or screen definitions, report definitions, data validation rules, procedural code, and structures that can handle video and picture formats. Users do not need to know how data is stored or manipulated

Data Transformation and Presentation

This function exists to transform any data entered into required data structures. By using the data transformation and presentation function the DBMS can determine the difference between logical and physical data formats.

Security Management

This is one of the most important functions in the DBMS. Security management sets rules that determine specific users that are allowed to access the database. Users are given a username and password or sometimes through biometric authentication (such as a fingerprint or retina scan) but these types of authentication tend to be more costly.

Multi-user Access Control

Data integrity and data consistency are the basis of this function. Multi-user access control is a very useful tool in a DBMS, it enables multiple users to access the database simultaneously without affecting the integrity of the database.

Backup and Recovery Management

Backup and recovery is brought to mind whenever there is potential outside threats to a database. For example if there is a power outage, recovery management is how long it takes to recover the database after the outage. Backup management refers to the data safety and integrity; for example backing up all your mp3 files on a disk.

Data Integrity Management

The DBMS enforces these rules to reduce things such as data redundancy, which is when data is stored in more than one place unnecessarily, and maximizing data consistency, making sure database is returning correct/same answer each time for same question asked.

Database Access Languages and Application Programming Interfaces

 A query language is a nonprocedural language. An example of this is SQL (structured query language). SQL is the most common query language supported by the majority of DBMS vendors. The use of this language makes it easy for user to specify what they want done without the headache of explaining how to specifically do it.

Database Communication Interfaces

This refers to how a DBMS can accept different end user requests through different network environments. An example of this can be easily related to the internet.  A DBMS can provide access to the database using the Internet through Web Browsers (Mozilla Firefox, Internet Explorer, Netscape).

Transaction Management

This refers to how a DBMS must supply a method that will guarantee that all the updates in a given transaction are made or not made. All transactions must follow what is called the ACID properties.

What is Schema?

In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables.

Schemas are generally stored in a data dictionary. Although a schema is defined in text database language, the term is often used to refer to a graphical depiction of the database structure.


 

Types Of Database

1)      Distributed Database:

A database that consists of two or more data files located at different sites on a computer network. Because the database is distributed, different users can access it without interfering with one another. However, the DBMS must periodically synchronize the scattered databases to make sure that they all have consistent data.

To ensure that the distributive databases are up to date and current, there are two processes: replication and duplication. Replication involves using specialized software that involves looking for changes in the distributive database. Once the changes have been identified, the replication process makes all the databases look the same. The replication process can be very complex and time-consuming depending on the size and number of the distributive databases. This process can also require a lot of time and computer resources. Duplication on the other hand is not as complicated. It basically identifies one database as a master and then duplicates that database. The duplication process is normally done at a set time after hours. This is to ensure that each distributed location has the same data. During the duplication process, changes to the master database only are allowed. This is to ensure that local data will not be overwritten. Both of the processes can keep the data current in all distributive locations.

2)      Centralized database: -

A centralized database has all its data on one place. As it is totally different from distributed database which has data on different places. In centralized database as all the data reside on one place so problem of bottleneck can occur, and data availability is not efficient as in distributed database. Let me define some advantages of distributed database, it will clear the difference between centralized and distributed database.

 

Difference Between Distributed and Centralized Database

The main difference between centralized & distributed databases is that the distributed databases are typically geographically separated, are separately administered, & have slower interconnection. Also in distributed databases we differentiate between local & global transactions. A local transaction is one that accesses data only from sites where the transaction originated. A global transaction, on the other hand, is one that either accesses data in a site different from the one at which the transaction was initiated, or accessed data in several different sites.

Advantages of distributed databases

1.    Local autonomy ? a department can control the data about them (as they are the ones familiar with it.)

2.    Protection of valuable data ? if there were ever a disastrous event such as a fire, all of the data would not be in one place, but distributed in multiple locations.

3.    Improved performance ? data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers.

4.     Economics ? it costs less to create a network of smaller computers with the power of a single large computer.

5.    Modularity ? systems can be modified, added and removed from the distributed database without affecting other modules (systems).

6.    Reliable transactions - Due to replication of database single site failure doesn?t affect performance of system..

Disadvantage of Distributed Database:

  1. Complexity ? extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one.
  2. Economics ? increased complexity and a more extensive infrastructure means extra labour costs.
  3. Security ? remote database fragments must be secured, and they are not centralized so the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
  4. Difficult to maintain integrity ? in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
  5. Inexperience ? distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice.
  6. Lack of standards ? there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS.
  7. Additional software is required.
  8. Operating System should support distributed environment.

Database Models

A database model or database schema is the structure or format of a database, described in a formal language supported by the database management system. Schemas are generally stored in a data dictionary.

Although a schema is defined in text database language, the term is often used to refer to a graphical depiction of the database structure.

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested.

1.Hierarchical Model

In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one 1:N relationship between two types of data. This structure is very efficient to describe many relationships in the real world; recipes, table of contents, ordering of paragraphs/verses, any nested and sorted information. However, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record.



2.Network Model

The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

The network model is a variation on the hierarchical model, to the extent that it is built on the concept of multiple branches (lower-level structures) emanating from one or more nodes (higher-level structures), while the model differs from the hierarchical model in that branches can be connected to multiple nodes. The network model is able to represent redundancy in data more efficiently than in the hierarchical model.



3.Relational Model: -

The relational model was introduced by 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.

The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Cod. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in columns and rows (also called tuples). Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.



4.Entity Relational Model

The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of entities (or objects) in order to create a database. The ER model was first proposed by Peter Pin-Shan Chen in the 1970s.

In ER modeling, the structure for a database is portrayed as a diagram, called an entity-relationship diagram (or ER diagram), that resembles the graphical breakdown of a sentence into its grammatical parts. Entities are rendered as points, polygons, circles, or ovals. Relationships are portrayed as lines connecting the points, polygons, circles, or ovals. Any ER diagram has an equivalent relational table, and any relational table has an equivalent ER diagram. ER diagramming is an invaluable aid to engineers in the design, optimization, and debugging of database programs.

In a logical sense, entities are the equivalent of grammatical nouns, such as employees, departments, products, or networks. An entity can be defined by means of its properties, called attributes. Relationships are the equivalent of verbs or associations, such as the act of purchasing, the act of repairing, being a member of a group, or being a supervisor of a department. A relationship can be defined according to the number of entities associated with it, known as the degree.

There are three basic elements in ER models:

Entities are the "things" about which we seek information.

Attributes are the data we collect about the entities.

Relationships provide the structure needed to draw information from multiple entities.



Concept of Key In Relational Database

A key is single attribute that is used to identify one or more item form database. In simple word we say that data item which is used to identify the record is called key. The data elements, which cause other element to be known, is called key.

For example

Rno                       name               City

001                        Toqeer             DIKHAN

002                        Abrar               DIKHAN

003                        Ali                    DIKHAN

In above table RNO is a data element by which we can identify the record so it is called key.

TYPES OF KEY

1. Primary Key:

A data item or attribute that uniquely identify every tuple (row or record) in relation is called primary key or attribute key. It is used to access the data from database. It is also called atomic key (It is probably meant to identify something "as small as possible"). A table must have unique key so that no duplication occurs. Primary key always unique and not assign NULL value.

For example

                              Reg#               Name                 City

In above relation Reg # is primary key because as no more then one student have same Reg # so we can access single row with the help of Reg#. In simple word if we give Reg # of the student then we can access the whole record of that student.

2. Secondary Key:

A candidate key, which is not selected as a primary key is called secondary key.

In department teacher record group of data item identify by data item professor while another group of record may be identified with the data item lecturer.

3. Compound Key:

In database design, a compound key is a key that consists of 2 or more attributes that uniquely identify an entity occurrence. Each attribute that makes up the compound key is a simple key in its own right.

4. Composite Key:

Composite key is also a key that consists of 2 or more attributes that uniquely identify an entity occurrence; at least one attribute that makes up the composite key is not a simple key in its own right.

5. Super Key

A super key is a set of columns within a table whose values can be used to uniquely identify a row. A candidate key is a minimal set of columns necessary to identify a row, this is also called a minimal super key. For example, given an employee table, consisting of the columns employeeID, name, job, and departmentID, we could use the employeeID in combination with any or all other columns of this table to uniquely identify a row in the table. Examples of superkeys in this table would be {employeeID, Name}, {employeeID, Name, job}, and {employeeID, Name, job, departmentID}.

In a real database we don't need values for all of those columns to identify a row. We only need, per our example, the set {employeeID}. This is a minimal superkey ? that is, a minimal set of columns that can be used to identify a single row. So, employeeID is a candidate key

6. Candidate key:

When more then one attribute can access the uniquely record from a table then one attribute chosen as primary key then other attribute is candidate key.

7. Foreign Key:

An attribute which is primary key in one relation and non primary key in other relation then it is called foreign key.

For example if we have two relation

                              Student(Rno#, Name, Marks)

                              Book(Rno#, Book#, Bname)

In student record Rno# is used as primary key and in book record of library one Rno# can take more then one books so in this relation we can not used Rno# as primary key. So in this relation Book# is used as primary key. So Rno# in Book relation is used as Foreign Key.

Association Between Data Items

An association is logical relationship between data items. An association implies that the values for the associated data item are in same way depending on each other.

There are following four types of association:

1.            One To One Association

2.            One To Many Associations

3.            Conditional Association

4.            Reverse Association

                                          i.    One To One Association

                                         ii.    One To Many Associations

                                        iii.    Many To Many Associations

1) One To One Association:

A One To One association between data item X and Y means that for specified period of time a given value of X have one and only one value for Y. this association is represented as follow.

     

Suppose we have two data item Reg #  and student name then there is one to one association between Reg # and student name because for specified period of time one registration number is assigned to one student.

2) One To Many Associations

A One To many association between data item X and Y means that for specified period of time a given value of X have one or more then one value for Y. this association is represented as follow.

Suppose we have two data item Teacher   and Subject then there is one to many association between Teacher and Subject because for specified period of time one teacher may be teach more the one subject.

3) Conditional Association:

Conditional association between data item X and Y mean that there are only two possibilities X have zero or one value for Y. this association is represented as follow.

The most useful example of conditional association is hospital bad and patient. Each bed is associated with one patient or some time it will be unassigned (there will be no patient)

4. Reverse Association

a) One To One Association:

A One To One reverse association between data item X and Y means that for specified period of time a given value of X have one and only one value for Y and y have also one value for X. This association is represented as follow.

Suppose we have two data item Reg #  and student Rno then there is one to one reverse association between Reg # and student Rno# because for specified period of time for one Rno#  there is only one registration number and for one reg# there is only one Rno #. 

b) One to many Associations:

A One To Many reverse association between data item X and Y means that for specified period of time a given value of X have more then one value for Y but y have only one value for X. This association is represented as follow.

Department and teachers have one to many associations in reverse. Because in one department there are many teachers but one teacher can only teach in one department.

Many To Many Association:

A Many To Many association between data item X and Y means that for specified period of time a given value of X have more then one value for Y and Y have also more then one value for X. This association is represented as follow.

The most useful example of this association is relation between Student # and course #. Any student may have taken one or more then one course at a time and any course may have taken by one or more then one students.

Comments

Popular posts from this blog

Computer Hardware and It's Peripherals

Internet

Social Impact of Computer on Society