CIT 111: Introduction to Databases

Prepare 04: Data Retrieval

This week you will...

Instructions

This material will help you build a foundation for this week's topic. Test your understanding by taking this week's prepare quiz in Canvas. The quiz is open book and open note. You may retake the quiz up to 3 times to improve your score.

Terms to Know

SELECT:
A statement that describes the columns in the result set, used to retrieve table data.
FROM:
A command that indicated the table you are getting data from.
ORDER BY:
A keyword used to sort the rows of a result set.
LIMIT:
A clause used to show the number of rows to return in the result set.
DISTINCT:
A keyword used to eliminate duplicate rows in the result set.
WHERE:
A clause used to retrieve just the data you want; filters the data.
Arithmetic Operators:
Used to make calculations. Examples are multiplication (*), division (/), modulus (%), addition (+), and subtraction (-).
Comparison Operators:
Used to make comparisons to restrict information in the result set. Examples are equal to(=), less than(<), less than or equal to(<=), greater than (>), greater than or equal to(>=), and not equal to(<> or !=).
Logical Operators:
Use with multiple conditions to combine or negate search conditions. Examples are both conditions have to be true (AND), one of the conditions have to be true (OR) and negates the condition (NOT).
IN:
An operator that can test an expression to a list of expressions.
BETWEEN:
An operator that can compare an expression with a range of values.
LIKE:
Operator that matches string patterns to a test expression letting you filter the results according to certain string patterns.
REGEXP:
A regular expressions operator that allows much more complex string patterns to test expressions.
Null Value:
No value, blank. It is different from a zero value or a field that contains spaces, null values have been left blank when the row was added.
IS NULL (IS NOT NULL):
Constraints to test for null values. IS NULL for all nulls. IS NOT NULL for all that are not nulls.
ASC:
Keyword used with ORDER BY to sort the results in ascending order (smallest to largest).
DESC:
Keyword used with ORDER BY to sort the results in descending order (largest to smallest).
Order of Execution:
The order the system actually executes clauses. This is a different order from how the clauses are written in SQL code.
Function:
A stored program, or operation that when performed on data will returns a value or manipulated or convert the data.
Scalar Function:
A function that works on a single entity of data, not a group of data together.
Parameter:
Used to pass data values to a function. SQL function parameters are inside parentheses following the function name and if there is more than one parameter they are separated by a comma.
Concatenate:
Combining values, by appending them to each other, to form a single longer value.
Index numbering:
The number related to each character of a string. SQL index numbering starts at 1 for the first character.

Retrieving Data

Watch the single table queries video:

(16:38 mins, "Single Table Queries" Transcript)

The SELECT Statement

As you have learned, the SELECT statement is used to retrieve data from a database. Now you will study the clauses that make up SELECT statement.

5 Clauses of the SELECT Statement

SELECT Lists the columns in the result set
FROM Specifies the table from where the data will be retrieved
WHERE Filters the rows to only those that match the condition
ORDER BY Specifies how to sort the rows
LIMIT The number of rows to return
SELECT and FROM Statements

The SELECT statement is used to retrieve data from a table. The SELECT clause lists the columns you want as part of the resulting data, or result set. The FROM clause tells what table you are getting the data or columns from.

The SELECT Statement Syntax

SELECT column_list
[FROM table_source]

There are 4 ways to code the column list in the SELECT statement.

1. All columns with an asterisk
SELECT * 
FROM products
prod_id prod_name prod_category prod_price
t12 bouncy ball toy 2.99
t15 jacks toy 3.99
t18 doll toy 5.99
t21 top toy 1.99
s10 jump rope null 2.99
s13 hockey puck sports 1.99
s17 mitt sports 5.99
s22 bat sports 2.99
s24 baseball sports 1.99

The wildcard asterisk (*) is a fast way to list all the columns of the table during development but should be avoided when writing queries for production. Retrieving unnecessary columns slows down performance. It's better to list only the columns you need for a given query.

2. By column name (each column separated with a comma)
SELECT prod_name, prod_price
FROM products
prod_name prod_price
bouncy ball 2.99
jacks 3.99
doll 5.99
top 1.99
jump rope 2.99
hockey puck 1.99
mitt 5.99
bat 2.99
baseball 1.99
3. Columns created with calculations
SELECT prod_name, prod_price + 2.00 
FROM products
prod_name prod_price + 2.00
bouncy ball 4.99
jacks 5.99
doll 7.99
top 3.99
jump rope 4.99
hockey puck 3.99
mitt 7.99
bat 4.99
baseball 3.99

By default, the result set column header is the same name as the column, calculation or function. Notice 'prod_price + 2.00' is the column header for the calculated column. You can rename a column to clarify the data in that column in the result set. The AS keyword allows you to rename the column. This is known as an alias. The string following the AS keyword replaces the column name, calculation or function. If there are spaces in the alias string you need to place quotes around it.

SELECT prod_name, prod_price + 2.00 AS 'Markup Price'
FROM products
prod_name Markup Price
bouncy ball 4.99
jacks 5.99
doll 7.99
top 3.99
jump rope 4.99
hockey puck 3.99
mitt 7.99
bat 4.99
baseball 3.99

When making calculations in the SELECT statement, you can use any of the following arithmetic operators:

*  Multiplication		
/  Division		
%  Modulus		
+  Addition
-  Subtraction 

If more than one arithmetic operator is used they follow the order of precedence as they are listed in the table above.

4. Columns created with functions

Functions are not covered until a later lesson, but are mentioned here only to demonstrate that you can use functions in the SELECT statement.

SELECT prod_name, ROUND(prod_price) AS nearest_dollar
FROM products
prod_name nearest_dollar
bouncy ball 5
jacks 6
doll 8
top 4
jump rope 5
hockey puck 4
mitt 8
bat 5
baseball 4

The SELECT statement uses the ROUND function with price as the parameter to round the price to the nearest dollar. The alias is the string 'nearest_dollar'.

The DISTINCT keyword can be used with SELECT to eliminate duplicate rows in a result set. For example without the DISTINCT keyword this SELECT statement results in all prices of each product listed.

SELECT prod_price
FROM products
prod_price
2.99
3.99
5.99
1.99
2.99
1.99
5.99
2.99
1.99

With the DISTINCT keyword all duplicate prices are eliminated.

      SELECT DISTINCT(prod_price)
FROM products
prod_price
2.99
3.99
5.99
1.99

The WHERE Clause

The WHERE clause lets you retrieve just the rows you want. So only part of the table or column data shows up in the result set depending on the criteria in your WHERE clause. The WHERE clause filters the data.

The SELECT Statement Syntax with WHERE:

SELECT select_list
[FROM table_source]
[WHERE search_conditions]

You can use comparison operators to restrict information in the result set.

       =	   	   Equal to
       <		   Less than
       <=		   Less than or equal to
       > 		   Greater than
       >= 		   Greater than or equal to
       <> or !=	   Not equal to
      

If you wanted to restrict your result set to only those products that have a price greater than $5.00, you could use this WHERE clause.

SELECT prod_name, prod_price
FROM products
WHERE prod_price > 5.00 
prod_name prod_price
doll 5.99
mitt 5.99

Or those products priced less than or equal to 2.99.

SELECT prod_name, prod_price
FROM products
WHERE prod_price <= 2.99 
prod_name prod_price
bouncy ball 2.99
top 1.99
jump rope 2.99
hockey puck 1.99
bat 2.99
baseball 1.99

Or those products that are not priced equal to 2.99.

      SELECT prod_name, prod_price
FROM products
WHERE prod_price <> 2.99 
prod_name prod_price
jacks 3.99
doll 5.99
top 1.99
hockey puck 1.99
mitt 5.99
baseball 1.99

You can also use multiple conditions with logical operators. Use AND and OR to combine two or more search conditions and NOT to negate a search condition.

AND    Both conditions have to be true for it to show up in the result set
OR     One of the conditions has to be true for it to show up in the result set
NOT    Negates the condition. 

If you write a WHERE clause with multiple conditions (or a compound condition), there is an order of precedence. First the NOT runs, then the AND, and then the OR. Use parenthesis to override this order of precedence.

Select all products and their price that have a price equal to 5.99 and also have the name of 'doll'. There are two products that are 5.99 but only one with the name doll. Both conditions have to be true for the result to show up.

       SELECT prod_name, prod_price
FROM products
WHERE prod_price = 5.99 AND prod_name = 'doll'
prod_name prod_price
doll 5.99

Select all products and their price that have a price equal to 5.99 or has a name equal to 'bat'. We get both products that are equal to 5.99 and a product name equal to 'bat'. So either condition can be true for the result to show up.

       SELECT prod_name, prod_price
FROM products
WHERE prod_price = 5.99 OR prod_name = 'bat' 
prod_name prod_price
doll 5.99
mitt 5.99
bat 2.99

Select all products and their price that do not have a price equal to 5.99. Reverses, or negates the condition and returns the opposite; all products with their prices that are not equal to 5.99.

       SELECT prod_name, prod_price
FROM products
WHERE NOT prod_price = 5.99 
prod_name prod_price
bouncy ball 2.99
jacks 3.99
top 1.99
jump rope 2.99
hockey puck 1.99
bat 2.99
baseball 1.99

The IN operator tests if an expression is in a list.

Select all products and their prices that have a price of 2.99, 1.99 or 5.99.

       SELECT prod_name, prod_price
FROM products
WHERE prod_price IN (2.99, 1.99, 5.99)
prod_name prod_price
bouncy ball 2.99
doll 5.99
top 1.99
jump rope 2.99
hockey puck 1.99
mitt 5.99
bat 2.99
baseball 1.99

The BETWEEN operator compares an expression with a range of values. Select all products and their prices that have a price ranging between 1.99 and 3.99. It is inclusive and would also include 1.99 and 3.99.

       SELECT prod_name, prod_price 
FROM products 
WHERE prod_price BETWEEN 1.99 AND 3.99
prod_name prod_price
bouncy ball 2.99
jacks 3.99
top 1.99
jump rope 2.99
hockey puck 1.99
bat 2.99
baseball 1.99

The LIKE operator matches a string pattern to an expression. It uses two characters to represent characters. The _ underscore is a wild card to one character and the % percent is a wild card to any number of characters.

Select all products and their price that begin with 'b' and have any number of character after the 'b'.

       SELECT prod_name, prod_price
FROM products
WHERE prod_name LIKE 'b%' 
prod_name prod_price
bouncy ball 2.99
bat 2.99
baseball 1.99

Select all products and their price that end with 'll' and have any number of characters before the 'll'.

SELECT prod_name, prod_price
FROM products
WHERE prod_name LIKE '%ll' 
prod_name prod_price
bouncy ball 2.99
doll 5.99
baseball 1.99

Select all products and their price that have one character before an 'o' and any number of characters after the 'o'.

SELECT prod_name, prod_price
FROM products
WHERE prod_name LIKE '_o%' 
prod_name prod_price
bouncy ball 2.99
doll 5.99
top 1.99
hockey puck 1.99

The REGEXP operator allows much more complex string patterns to test expressions. Regular Expressions have many different symbols and are beyond the scope of this course, but here is an example.

In this example the | bar pattern represents 'matching any of the patterns'. So only product names that have a 'ck' or an 'all' string within them will show up.

SELECT prod_name, prod_price
FROM products
WHERE prod_name REGEXP 'ck|all'; 
prod_name prod_price
bouncy ball 2.99
jacks 3.99
hockey puck 1.99
baseball 1.99

There are other patterns like:

^      matches the beginning of a string
$      matches the end of a string
[']    any character inside the square brackets
p1|p2  matches any of the patterns p1 or p2 

A NULL value is when a value has been left blank when a row or record is added. It is different from a zero value or a field that contains spaces.

The IS NULL or IS NOT NULL will find all nulls or all values that are not null. We only had one null in our table; the product category of the jump rope row.

prod_id prod_name prod_category prod_price
t12 bouncy ball toy 2.99
t15 jacks toy 3.99
t18 doll toy 5.99
t21 top toy 1.99
s10 jump rope null 2.99
s13 hockey puck sports 1.99
s17 mitt sports 5.99
s22 bat sports 2.99
s24 baseball sports 1.99

Because NULLs are not values (they have no value), it is not possible to use comparison operators (=, < or>) with NULL.

SELECT prod_name, prod_price
FROM products
WHERE prod_category IS NULL
prod_name prod_price
jump rope 2.99
SELECT prod_name, prod_price
       FROM products
       WHERE prod_category IS NOT NULL
      
prod_name prod_price
bouncy ball 2.99
jacks 3.99
doll 5.99
top 1.99
hockey puck 1.99
mitt 5.99
bat 2.99
baseball 1.99

The ORDER BY clause will sort your result set. You can specify one or more column names to sort the result set.

The SELECT Statement Syntax with ORDER BY:

SELECT select_list
[FROM table_source]
[WHERE search_conditions]
[ORDER BY order_by_list]

With this ORDER BY the result set is sorted alphabetically by product name.

SELECT prod_name, prod_price
FROM products
WHERE prod_price > 1.99
ORDER BY prod_name
prod_name prod_price
bat 2.99
bouncy ball 2.99
doll 5.99
jacks 3.99
jump rope 2.99
mitt 5.99

With this ORDER BY the result set is sorted numerically by product price.

SELECT prod_name, prod_price
FROM products
ORDER BY prod_price
prod_name prod_price
top 1.99
hockey puck 1.99
baseball 1.99
jump rope 2.99
bat 2.99
jacks 3.99
doll 5.99
mitt 5.99

With this ORDER BY the result set is sorted numerically by product price then within the product price it is also sorted by product name.

SELECT prod_name, prod_price
FROM products
ORDER BY prod_price, prod_name
prod_name prod_price
baseball 1.99
hockey puck 1.99
top 1.99
bat 2.99
bouncy ball 2.99
jump rope 2.99
jacks 3.99
doll 5.99
mitt 5.99

You can reverse the sort from the default ascending (ASC) to descending using the keyword DESC. The keyword is assumed unless DESC is used. Now the prices are sorted from largest price to smallest price.

SELECT prod_name, prod_price
FROM products
ORDER BY prod_price DESC
prod_name prod_price
mitt 5.99
doll 5.99
jacks 3.99
bat 2.99
jump rope 2.99
bouncy ball 2.99
baseball 1.99
hockey puck 1.99
top 1.99

The LIMIT clause specifies the maximum number of rows that will be returned.

The SELECT Statement Syntax with LIMIT:

SELECT select_list
[FROM table_source]
[WHERE search_conditions]
[ORDER BY order_by_list]
[LIMIT row_limit]

In this statement, the result set is limited to 4 rows, even though there are 9 products in the table that would have been a part of the result set if there were no LIMIT clause.

SELECT prod_name, prod_price
FROM products
LIMIT 4
prod_name prod_price
bouncy ball 2.99
jacks 3.99
doll 5.99
top 1.99

Remember we have to type the clauses in a specific order. SELECT, then FROM, then WHERE, then ORDER BY, and then LIMIT

This is the order of how you must write a select query.
SELECT
FROM
WHERE
ORDER BY
LIMIT

But the order in which the computer actually executes the clauses is different. This is referred to as the Order of Execution.
FROM
WHERE
SELECT
ORDER BY
LIMIT

The data from the entire table is retrieved, then that is filtered by the WHERE criteria. The columns, calculations, or functions that will show up are determined in SELECT. Then that resulting data is sorted and then limited.

The order of execution is why an alias that was set up in the SELECT clause can be used in an ORDER BY but not a WHERE.

SELECT prod_name, prod_price + 2.00 AS mark_up
FROM products
WHERE prod_price > 1.99
ORDER BY mark_up
prod_name mark_up
bouncy ball 4.99
jump rope 4.99
bat 4.99
jacks 5.99
doll 5.99
mitt 7.99

Watch the functions video:

(11:36 mins, "Functions" transcript)

Using Functions

In MySQL, a function is a stored program, or operation that when performed on data will returns a value or manipulate or covert the data. This allows us to perform manipulations on the data before retrieving or saving it. Functions have a name that you refer to and can also accept one or more parameters that can be passed into the function.

function name>(parameter1, parameter2, ...)

In this lesson, we will be looking at scalar functions that operate on the individual rows of a table. Scalar means the function will work on a single entity of data, not a group of data together. So if the function is used with a table with many rows the function will work on each single value or entity of data in each row.

Types of Functions:

Text used to manipulate strings of text
Numeric used to perform mathematical operations on numeric data
Date & Time used to manipulate data and time values and to extract specific components
System returns information specific to the DBMS being used

We will be going over some of the more common text, numeric, and date & time functions.

Commonly used Text-manipulation or String Functions

Function Description
CONCAT(string, string) Returns all the string combined into one string
LEFT(string, length) Returns characters from the left of the string
RIGHT(string, length) Returns characters from the right of the string
LTRIM(string) Trims white space from the left of the string
RTRIM(string) Trims white space from the right of the string
TRIM( string) Trims white space from the left and right of the string
LENGTH(string) Returns the length of characters of the string
LOWER(string) Converts the string to lowercase
UPPER(string) Converts the string to uppercase
LOCATE(find, search, start) Find a substring within the string
SUBSTRING(string, start, length ) Return characters from within a string

Let's use the keyword table to demonstrate some of the functions.

SELECT * FROM keyword
keyword_id keyword
1 sky
2 landscape
3 people
4 nature
SELECT CONCAT('The primary key of  ', keyword, ' is ', keyword_id) AS new_string, 
FROM keyword
WHERE keyword_id = 1
new_string
The primary key of sky is 1

The result is made up of string literals inside quotes and column values all combined into one new string.

SELECT  LEFT(keyword, 4), LENGTH(keyword), UPPER(keyword)
FROM key
WHERE keyword_id = 2
LEFT(keyword, 4) LENGTH(keyword) UPPER(keyword)
land 9 LANDSCAPE

The first column takes the string 'landscape' and returns 4 characters from the left.
The second column adds up all the characters in the string 'landscape'
The third column converts all the letters in the string 'landscape' to upper case.

Index numbering of the characters in the string 'landscape'

l a n d s c a p e
1 2 3 4 5 6 7 8 9
SELECT LOCATE( 'and', keyword), SUBSTRING(keyword, 1, 5), SUBSTRING(keyword, 7)
FROM key
WHERE keyword_id  = 2
LOCATE('and', keyword) SUBSTRING(keyword, 1, 5) SUBSTRING(keyword, 7)
2 lands ape

The first column searches for the substring 'and' inside of the string 'landscape' and returns the index number of where it was found. the 2nd character in is where the 'a' in 'and' starts.
The second column starts at the first character and returns all character from there to the fifth character.
The third column starts at the seventh character and returns the rest of the string from there.



Commonly used Numeric Functions

Function Description
ROUND(number, length) Returns the number rounded
FLOOR(number) Returns the next smaller whole number
CEILING(number) Returns the next larger whole number
ABS(number) Returns the absolute value of the number
SQRT(number) Returns the square root of the number
MOD(number, divisor) Returns the remainder of the number divided by the divisor
FORMAT(number, decimals, locale) Returns the number is currency format

Lets use the magazine table to demonstrate some of the numeric functions

 SELECT * FROM magazine
magKey magName magPrice
1 Fishing in the Mojave 13.95
2 Car Racing Made Easy 15.45
3 Pine Cone Computing 17.50
4 Cooking Like Mad 18.00
5 If Only I Could Sing 25.00
SELECT ROUND(magPrice), ROUND(magPrice, 1)
FROM magazine
ROUND(magPrice) ROUND(magPrice,1)
14 14.0
15 15.5
18 17.5
18 18.0
25 25.0

The first column rounds to the nearest whole number.
The second column rounds to one decimal place.

SELECT magPrice, FLOOR(magPrice), CEILING(magPrice)
FROM magazine
magPrice FLOOR(magPrice) CEILING(magPrice)
13.95 13 14
15.45 15 16
17.50 17 18
18.00 18 18
25.00 25 25

The number 13.95 has the next lower whole number of 13 and next larger whole number of 14.
The number 15.45 has the next lower whole number of 15 and the next larger whole number of 16.
The number 17.50 has the next lower whole number of 17 and the next larger whole number of 18.
The number 18.00 has the next lower whole number of 18 and the next larger whole number of 18.
The number 25.00 has the next lower whole number of 25 and the next larger whole number of 25.

SELECT SQRT(magPrice)
FROM magazine
WHERE magKey = 5
SQRT(magPrice)
5

The square root of 25 is 5.

If you wanted to manipulate your magazine prices to look more like currency you can use the FORMAT function. The first parameter is the number, the second the decimal places and the third is the locale which determines the thousands separator. Ours will be 'en_US' to show US English formatting. 10,000 dollars will be added to each magazine price so we can see the commas placed at each thousands grouping. The CONCAT function will also be used to add the '$' dollar sign to the number.

SELECT CONCAT('$', FORMAT(magPrice + 10000, 2, 'en_US')) 
FROM magazine
CONCAT('$', FORMAT(magPrice + 10000, 2, 'en_US'))
$10,013.95
$10,015.45
$10,017.50
$10,018.00
$10,025.00

Commonly used Date and Time Functions

Function Description
YEAR(date) Returns the year portion of a date
MONTH(date) Returns the month portion of a date
DAY(date) Returns the day portion of a date
HOUR(time) Returns the hour portion of a time
MINUTE(time) Returns the minute portion of a time
SECOND(time) Returns the second portion of a time
NOW() Returns the current local time and date from your system
DATE_ADD(date, INTERVAL length time) Adds a specified interval to a date
DATEDIFF(date, date) Calculates the difference in days between two dates
DATE_FORMAT(date, format) Returns a string for a date with specific formatting

Let's use the employee table to demonstrate some of the date and time functions.

SELECT * FROM employee
emp_id emp_fname emp_lname hire_date emp_email
1 Jenny Jensen 2013-06-15 jensenj@salon.com
2 Haley Lopez 2013-08-23 lopezh@salon.com
3 Robert Green 2014-01-03 greenr@salon.com
4 Olive Adams 2015-07-12 adamso@salon.com
5 Julie Davis 2015-10-20 davisj@salon.com
SELECT hire_date, YEAR(hire_date), MONTH(hire_date) 
FROM employee
hire_date YEAR(hire_date) MONTH(hire_date)
2013-06-15 2013 6
2013-08-23 2013 8
2014-01-03 2014 1
2015-07-12 2015 7
2015-10-20 2015 10

The YEAR function takes the hire date and extracts only the year and the MONTH function only extracts out the month from the date.

SELECT NOW()
NOW()
2020-12-28 13:26:17

The NOW function returns the systems current date. When you run this function it will be the current date and time of your system.

SELECT hire_date AS Hire Date, DATE_ADD(hire_date, INTERVAL 6 MONTH) AS 'Six Months Later'
FROM employee
Hire Date Six Months Later
2013-06-15 2013-12-15
2013-08-23 2014-02-23
2014-01-03 2014-07-03
2015-07-12 2016-01-12
2015-10-20 2016-04-20
SELECT DATEDIFF(hire_date, '2013-01-01') AS 'Days Between'
FROM employee
Days Between
165
234
367
922
1022

This gives the difference in days between January 1st 2013 and the hire date. If you want years, you would need to divide it by 365.

SELECT DATEDIFF(hire_date, '2013-01-01') / 365 AS 'Years Between'
FROM employee
Years Between
0.4521
0.641
1.0055
2.5260
2.8000

The DATE_FORMAT function allows you to format the date so it isn't always in the format of yyyy-mm-dd.

SELECT hire_date, DATE_FORMAT(hire_date, '%M %D, %Y') AS 'hire date formatted'
FROM employee
hire_date hire date formatted
2013-06-15 June 15th, 2013
2013-08-23 August 23rd, 2013
2014-01-03 January 3rd, 2014
2015-07-12 July 12th, 2015
2015-10-20 October 20th, 2015

The second parameter is a format string that includes codes for formatting the different parts of the date. Here are some of the common codes for formatting dates:

Format Code Description
%c Month, numeric
%M Month name
%e Day of the month, numeric
%D Day of the month with suffix
%y Year, 2 digits
%Y Year, 4 digits
%W Weekday name

There are also format codes for formatting times with the function TIME_FORMAT. But we won't be using any of those in this course.

There are many more functions than we have not covered here, but for the depth of this course, these are the text, numeric, and date & time functions we will be focusing on.

Submission

Return to Canvas and take the Week 03 Prepare Quiz.

Submission

Take the Week Prepare quiz in Canvas

You may use this quizlet to help you study the terms.

Useful Links: