CIT 111: Introduction to Databases

W01: Understanding Data

This week you will:

Terms to Know

Familiarize yourself with the following terms.

Data:
The bits and pieces, like numbers, characters or symbols with no relevance.
Information:
A set of data with relevance. At a certain place in time, or for a certain person, the data makes sense.
Knowledge:
Information that has been acquired and gives understanding about the importance of the information. Used to make decisions.
Decision-making:
Acting on the knowledge obtained, to achieve some benefit.
Entity:
A person, place, thing or concept about which data can be collected.
Attribute:
Describes the facts, details or characteristics of an entity.
Data-driven Decision-Making:
Using data as the basis for making decisions in an organization to avoid bias and false assumptions that lead to poor decisions.
Query:
A request for data or information. Asking a question from the data.
Sort:
Arrange data or information in a specific way to make it more useful, such as alphabetizing from A to Z.
Filter:
Hiding or filtering out unwanted data or information so only the information you want to see is displayed.
Statement of Work (SOW):
A document used to plan a successful execution of a project or system. Used in planning for data systems such as databases.
Data Integrity:
Ensuring the accuracy and reliability of data, making sure it remains intact. It can be maintained by standards implemented during the design phase.
Data Consistency:
The usability of data, or having only valid data written to the database. All values in a single attribute of an entity should be of the same data type.
Data Redundancy:
The repetition of data where the same data value is stored in more than one place in the database. This wastes space and necessitates multiple changes in different places when editing the value.
Entity Data Storage:
A single thing, person, place, or object; usually represented as a single row in data storage. Also referred to as a record.
Attribute Data Storage:
A value that describes a characteristic of the entity. A set of these related values for many entities is represented as a column in data storage. Also referred to as a field.
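Two of the terms above, data consistency and data redundancy, can be checked mechanically. The following is only a minimal sketch in Python: the rows, the attribute names, and the two helper functions are hypothetical and are not part of the course materials.

```python
from collections import Counter

# Hypothetical rows: each dict is one entity (record), each key is an attribute (field).
rows = [
    {"name": "Ana", "department": "Sales", "department_phone": "555-0100"},
    {"name": "Ben", "department": "Sales", "department_phone": "555-0100"},
    {"name": "Cho", "department": "Support", "department_phone": 5550101},  # number, not text
]


def inconsistent_attributes(rows):
    """Return the attributes whose values are not all of one data type."""
    types_by_attribute = {}
    for row in rows:
        for attribute, value in row.items():
            types_by_attribute.setdefault(attribute, set()).add(type(value).__name__)
    return {a: sorted(t) for a, t in types_by_attribute.items() if len(t) > 1}


def repeated_values(rows, attribute):
    """Return the values of one attribute that are stored in more than one row."""
    counts = Counter(row[attribute] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


print(inconsistent_attributes(rows))
print(repeated_values(rows, "department_phone"))
```

The first check flags an attribute that mixes data types (a consistency problem). The second shows the same phone number stored in two rows, so a change to that number would have to be made in two places (redundancy).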

What is Data?

Data, Information, Knowledge, Decisions

data to information to knowledge to decisions

Data are the bits and pieces of information, like numbers and characters but without context or relevance. We may have data, but without context, it is meaningless to us. We can see, however, the type of data (or datatype) it might be, like numbers or characters.

Data is a Latin word and does not follow the normal English rules for singular and plural. Datum is the singular form, and data is the plural form.

individual numbers, symbols or characters randomly placed

As data is brought together, it gains meaning and can be understood. There are still characters and numbers, but now they carry meaning, although they may still not be relevant. In the example below, the number 80 without any context tells us nothing about what that number refers to. Adding a unit name such as liters or miles per hour (mph) gives us context for the number.

individual words or groups of numbers randomly placed

Information is a set of data with relevance. At least one person can see the relevance in the data. Because information has relevance, it is more than just data. At a certain time or for a certain person, the data makes sense. In the example below, we can now see that there is a network of connections between the pieces of information. The different attributes such as type, color, top speed, price and year describe an automobile. Each row of data represents one entity, something about which data can be collected. In this case the entity is an automobile.

vehicle with truck information (color, speed, year, price) table
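The step from bare data to information can be sketched in a few lines of Python. The values below are invented for illustration, and the attribute name `top_speed_mph` is an assumption chosen to carry the unit. Only the attribute list (type, color, top speed, price, year) comes from the text above.

```python
# Attribute names give each value its context.
attributes = ["type", "color", "top_speed_mph", "price", "year"]

# Data: hypothetical values with no context. On its own, 80 could mean anything.
raw_data = ["truck", "red", 80, 25000, 2015]

# Information: pairing each value with an attribute describes one entity (an automobile).
automobile = dict(zip(attributes, raw_data))

print(automobile)
print(automobile["top_speed_mph"])  # 80 now means a top speed in miles per hour
```

Each such dictionary plays the role of one row in the table, and each key plays the role of a column.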

Knowledge is information combined with understanding about the importance of the information. President Russell M. Nelson has said, "I know that good inspiration is based upon good information." Knowledge comes from studying information. Knowledge can be used to make decisions.

Note: Knowledge doesn't indicate a change in the information but rather an awareness or understanding of the information.

Everyone needs to make decisions, whether in a career or in everyday life. The ability to understand data and the process by which it becomes knowledge is critical. Data needs to be organized in a certain way to give it context so that it can eventually become the knowledge that allows correct decision-making for growth and change.

Decision-making is acting on knowledge to achieve some benefit. Organizational success depends on good decisions. Decision-making systems used by organizations must be able to take data, put the data into context, and provide tools for analysis. From this analysis, organizations can make decisions. A database is designed for just such a purpose.

Note: Again, there is no change in the information, but rather when information is understood, an organization can make a decision based on knowledge.

Entities and Attributes

As we begin to organize data, we group data into entities. An entity is a person, place, thing or concept about which we can collect data. If we were gathering data about each employee in our organization, we might include their phone numbers, addresses, and first and last names. Here is what the unorganized data pieces might look like.

individual characters or numbers randomly placed

As we begin to put the data into context, the pieces have a little more meaning but are still not useful information. Data in context might look like this:

individual words or groups of numbers randomly placed

As we organize the data, we might put all of the attributes about one employee together: the first and last name that belong to one employee, along with their phone number and address. Each employee and their information might be grouped with other employees. Each entity is made up of several attributes which describe that entity. An attribute describes the facts, details or characteristics of an entity. In our example the entity is an employee, and the attributes are the values that describe each employee. Each entity has the attributes of first name, last name, address, and phone number. The resulting information might look something like this:

employee list with their names, addresses and phone numbers in a neat table

The data now has context and relevance and can be understood better. However, if we had thousands of employees, it might be hard to get the knowledge we need from the information, and therefore difficult to make good decisions with it. That is the purpose of a database: to organize data into information and then to enable analysis of this information. Then, with a knowledge of that information, we can make good decisions.
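The employee entity and its four attributes can be written down as a small Python sketch. The class name `Employee` and the sample people are hypothetical. Only the attribute list (first name, last name, address, phone number) is taken from the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    """One entity; each field is an attribute that describes it."""
    first_name: str
    last_name: str
    address: str
    phone_number: str


# Hypothetical employees; the names, addresses and numbers are invented.
employees = [
    Employee("Maria", "Lopez", "12 Oak St", "555-0101"),
    Employee("John", "Smith", "34 Elm Ave", "555-0102"),
]

# The attribute names correspond to the column headings of the table.
attribute_names = [f.name for f in fields(Employee)]
print(attribute_names)

# Each Employee object corresponds to one row (record) of the table.
for employee in employees:
    print(employee)
```

A list like this works for a handful of employees. With thousands of records, a database takes over the storage and makes the records practical to search and analyze.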

Ask Good Questions

Watch the following video about how to ask good questions.

(03:42 mins, "Good Questions" Transcript)

What Can We Do With Data?

Asking Good Questions

Just as we can ask good questions to guide our lives, organizations must ask good questions in order to get the information they need to make good decisions. Rather than just making decisions based on intuition or observations alone, organizations should use data that has been gathered to make decisions. This is referred to as data-driven decision-making. By using data as the basis for decision-making, organizations can avoid bias and false assumptions that lead to poor decision-making.


Let's apply what we know about good questions to making data-driven decisions.

Know Your Mission

President Henry B. Eyring said that it is hard to see which questions matter unless you have some vision of your own life and of the world. The same is true of organizations. It is hard to see which questions matter to the organization unless there is some vision of the company's goals, focus, or its mission. Once you have a good understanding of these, you can determine what questions must be answered to achieve organizational goals.

The questions should be important and relevant to the organizational mission. A foundational knowledge of the industry and what problems your organization and others face in the industry will also help determine these questions. A clear purpose and a defined problem to solve will also bring focus on what questions are important. Remember we don't want to focus on unimportant questions to the exclusion of good ones.

For example, a company may need to decrease expenses in order to remain profitable. The company must make a decision. If the question of how to decrease expenses in order to remain profitable is important and relevant to the company and its goals, then gathering as much data as possible about how to solve this problem is the first step.

Identify Data Sources

Once we know what questions to ask, we need to find data sources that will answer the questions. Where does the data come from? What data does our organization create or gather that will contribute to this data store? If you know what questions need to be answered, you can work backwards and figure out what data is needed to answer those questions. Also, keep in mind any future questions or problems that might require that we store additional data for the future.

Data-driven decision making requires that the data is good, in other words, that it is high quality or credible data. Many times data can be inaccurate, outdated or unreliable. We must be good stewards of our data to make sure the decisions we make are based on good quality data. It is critical to follow a quality assurance process as we gather data. For example, a company would not want to downsize staff or cut employee benefits if there is not good data to support this decision. With good data, the company might see other ways to cut costs or, indeed, decide that downsizing is the right decision. As long as the questions are good and the gathered data is high quality, a company can make good decisions. But how do we use good data to help us make decisions?

Organize the Data

In this course, you will learn how data is stored and retrieved. As you learn about database design, you will learn how to organize data into an efficient design so that you can retrieve the information you need from it. You will prepare the raw data and organize it in such a way that it becomes tables of useful information that you can ask questions of (or query) for answers to your questions. As you organize the data, you will ensure data integrity (making sure our data is accurate and reliable) and data consistency (making sure our data is usable) and eliminate data redundancy (storing the same data in multiple locations). As we query the data, we will take tables of data and sort the data in specific ways to make the reported data more usable. We will also filter data, or hide unwanted data, displaying only the records we want to see.
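Querying, sorting and filtering can be previewed with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database. This is an assumption made for illustration and is not necessarily the database software used in this course. The `employee` table, its columns and its rows are hypothetical.

```python
import sqlite3

# An in-memory database that disappears when the connection is closed.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE employee (first_name TEXT, last_name TEXT, city TEXT)"
)
connection.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Maria", "Lopez", "Rexburg"),
        ("John", "Smith", "Boise"),
        ("Ada", "Young", "Rexburg"),
    ],
)

# Sort: a query that arranges the rows alphabetically by last name (A to Z).
sorted_rows = connection.execute(
    "SELECT first_name, last_name FROM employee ORDER BY last_name ASC"
).fetchall()

# Filter: a query that displays only the records we want to see.
filtered_rows = connection.execute(
    "SELECT first_name, last_name FROM employee "
    "WHERE city = ? ORDER BY last_name ASC",
    ("Rexburg",),
).fetchall()

print(sorted_rows)
print(filtered_rows)
connection.close()
```

Both results come from asking a question of the same stored table. The stored data does not change; only what is displayed does.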

Analysis and Conclusions

Once we retrieve the information we need to answer our questions, we can analyze it and draw conclusions in order to answer those questions and make decisions that will help our organization succeed.



Statement of Work

Because data systems or other projects are built to benefit organizations, a document called a Statement of Work (SOW) can be used to plan a successful execution of the project. The SOW supports data-driven decision-making, because users of the data system need to make decisions based on the output from these systems or projects. This SOW document is the starting point of creating a data-driven system or project that will give us information to answer our organization's questions.

It's important to remember that as the designers of a database system, we don't get to decide what will and will not go into the system. The client (the ultimate users of the database) will decide what they need. The statement of work will also help keep the project on schedule and within budget. It helps everyone know what is expected. For example, the client might be expecting a complete working system once the database is built, but it is your understanding that only the backend database is being developed. Situations like this can be cleared up through the Statement of Work so everyone is in agreement.

Parts of a Statement of Work

Let's look at 4 basic elements of the Statement of Work.

History

The history section is where you will find why the organization needs the database system. What is the current system like? Perhaps they do not even have a current digital system and need one. How did they come up with their current system? What problems is that current system causing? What are its limitations? At this point, you need to be prepared with questions you can ask to get the users to focus on the important aspects they need. Make sure to do a lot of listening so you can understand exactly what is needed. Keep your preconceived notions out of the discussion. You will take a look at those who will use the database system and what output they need from the system to do their jobs well. These users may not know anything about database design, but they do know exactly what they need to do their job.

Scope

Scope is a description of what needs to be done. It is a general statement of the requirements and expectations of the project. It does not go into details about how exactly this will be done but provides the overall description. This may also include general constraints such as time and budget constraints. What software and hardware will be needed and who will do the work?

Objective

The objective element of the statement of work involves the purpose of the project. What is it intended to achieve? This does not include specifics about each of the elements of the database, but what the database is supposed to achieve. What outputs will be produced by the system? Remember, users know what they need to do a good job. You will listen to them, look at their current outputs, screen views, reports, forms, etc., and get an idea of what information your database will need to produce. In this course, we will be focusing on data-driven systems like databases and how decision-making information is retrieved from them.

Timeline

In the timeline section, the work involved in creating the project is broken up into discrete tasks. Each task has an estimated time frame for completion and a description of what is to be done during that time frame.

Later in the course, as we begin looking at database design, we will revisit statement of work documents. The statement of work is the starting point for seeing what will go into the design, and it is completed before a database system is created. We will not get into too many details of this process in this introductory course. Many of the databases we will use come with the assumption that we have already been through this SOW process. However, it is not something you want to skip in a real-world design process. It is a mistake to think you know how to begin the design process before you have studied it out thoroughly with those who will be using the system.

A good database begins with good quality data and is designed with careful planning and with the proper people involved.

Submission

Take the Week Prepare quiz in Canvas

You may use this quizlet to help you study the terms.
