Modeling Data in SQL

There are my notes on modeling data in sql.

Ankr Store on Amazon, keep your electronics charged by the best! If you buy something, I get a small commission and that makes it easier to keep on writing. Thank you in advance if you buy something.

When dealing with databases, we are working with files and the processes that
manage them. There are rules to follow and each database vendor has a slightly
different outlook on how you should do things. These different rules affect how
you model the data you want to work with. 

Data modeling is the first task you need to tackle. It is the process of putting
your ideas into rows, columns, and relations. There are three types of data
modeling. They are physical, logical, and conceptual. Each layer refers to a
degree of elaboration ending with a model that can be translated into sql
statements and implemented into your database system fo chocie. 

Conceptual Modeling
Conceptual modeling deals with the highest level of abstractions such as
entities and relationships. Entities refer to the actual physical objects or
abstract concepts in your requirements. If you are modeling dats for a library,
you might think of books, authors, borrowers, and librarians. If you are
creating a data model for a bank, your entities would be accounts, clients,
and money. Relationships model the way these entities interact. Relationships
are defined through primary and foreign keys.

Logical Modeling
This is the next step in data modeling. Here you begin by thinking of
attributes. The trick is to distinguish between attributes that you want to
track as opposed to those that do not matter. 

Physical Modeling
Physical modeling is the final stage. This is where your abstract ideas become
something real. You are getting ready to translate your ideas into scripts and
implement them in an actual database system. This is the stage where you define
the names of your tables, names for columns, and most importantly, your data
types. 

Data Types
Database systems were created to store information. Data types help to tune
database performance. For example, xml data can be represented as text. It is
text but text processing is not optimized for the structured nature of xml
documents. With the exception of desktop database systems, every single database
system vendor decided to implement a specific xml data type to address the
issue. 

Another reason for data types' existence is their role in enforcing domain
integrity. This refers to the ability of a specific data type to enforce
constraints. For example, with a "date" data type, it might be possible to enter
a date that also meant something else. Before we can enter the data into our
relational database, we have to break the data into pieces of specific data
types such as characters, numbers, and pictures. This information is
reconstituted by a client application later on.

Character Data
All character strings in sql can be of fixed length or varying length. A
character string can be defined as a sequence of characters that belong to a
predefined character set. A character set is the language your database stores.
You might remember specifying collation order or locale during your database
system setup process. The length of the string is the number of characters in
the sequence. This is where internal representation matters. A character can be
represented by one or more bytes with latin-based languages. Everyone else in
the world, just about, cannot.

A byte is a computetr term for a unit of information storage that consists of 8
bits. Each bit can be either 1 or 0. The combination of 8 bits allows us to
store 256 distinct values, which form the foundation of ASCII character sets.
While there were attempts to remedy the situation with extended code pages,
ultimately the solution came with the introduction of Unicode. It is a standard
doub;e byte character set that assigns a unique number to every single
character, so it can represent many more characters than ASCII. The Unicode
standard is the result of development coordinated by a nonprofit organizaiton
called the Unicode consortium. 

Sql thinks in characters but computers count bits and bytes. If you define a
string to be of fixed length, the system allocates a certain number of bytes in
memory or on a computer hard disk. If your character set is latin based, a total
of 10 bytes will be allocated for the string, but if you use a double byte
character set, such as chinese, 20 bytes will have to be allocated. It does not
matter whether the actual value to be stored in that string is exactly that many
bytes or not, it will occupy the whole space that is allocated to it. So, all
string will have exactly the same length.

If you define a string variable as a varying length string with a maximum of 10
characters to store, the behavior will be different. The actual memory or disk
space required to hold the value will be allocated dynamically as needed. Only
strings that are 10 characters long will have all 10 characters allocated to
them, but if you have a string that is only 2 characters, only 2 bytes of
storage will be allocated. 

As you can imagine, this flexibility comes with a performance hit because the
database system must perform the additional task of dynamic allocations. A
standard piece of advice when you need to squeeze the last drop of performance
out of the database is to profile your data and allocated fixed length strings
for values that always come in predefined lengths. Just don't forget to have
checks in place for your inserts. An attempt to insert 11 characters into a
field defined as char(10) would result in an error. Also, when comparing two
character strings, variable length strings would not care much about trailing
blanks but fixed length strings would.

Binary Strings
A binary string is a sequence of bytes in the same way that a character string
is a sequence of characters, but unlike character strings that usually contain
information in the form of text, a binary string is used to hold nontraditional
data such as images, audio, and video files. Binary strings can be used for
purposes similar to those of character strings, but the two data types are not
compatible. The difference is like text and a photo of the same text. 

It might be a little confusing to learn that plain text documents can be stored
as character strings and a Word document has to be treated as a binary string. A
Word or Pdf file is a text document from a user's point of view, but from a
computer storage perspective, it is not. In addition to plain text characters,
it contains many special markers and instructions that only those programs can
interpret. The same is true for any other special files such as bitmaps.
spreadsheets, audio files, and video files. 

Because a Pdf is a binary file, it also contains information about fonts,
positioning, coloring, and other formatting. This information is a set of
instructions that the host program understands and can interpret to display
human readable text and any other objects that might be embedded into the
document.

Numeric Data
After the characters come numbers. Exact numbers can either be whole integers or
have decimal points. Numbers can be positive and negative. They can have
precision and scale. Precision determines the maximum total number of decimal
digits that can be stored. Scale specifies the maximum number of decimals
allowed. 

Approximate numbers are numbers that cannot be represented with absolute
precision. Each numeric data type has limits, a range of values that it can
represent which is pretty consistent across systems. The value ranges for some
common numeric data types. 

Most of the time, numbers are hidden in the database. Once in a while, you might
need to use numbers as a value inserted into your query, and databases can be
very particular in what format you supply them. Literals for numbers are
represented as strings, optionally preceded by plus or minus signs, with an
optional decimal part for numeric or decimal data types separated by a dot.

Time Data Types
Handling dates and time is probably one of the most confusing and inconsistent
topics in sql. Partly them comes from the inconsistency of the human devised
system of tracking days and times. The date data type behaves differently from
implementation to implementation. Some system have separate keywords and others
comebine date and times into one keyword for queries. 

The dates and times get into the database as literals as a return result from a
function or as a conversion. The database systems have implemented a number of
sql functions to help handle this peculiar data type. While it might be apparent
to humans that some literal strings are veritable dates, computers have no such
insight and try to treat anything as a date. Once dates are in the database, you
need to take extra care manipulating them. For example, if you compare two dates
that have the same day, month, and year component but differ in time they will
be evaluated as not equal. Be sure to compare apples to apples. There are many
date and time related functions to help you compare values, extract information,
and even do date arithmetic.

Binaery Data
Binary data are for computers to understand and interpret so that humans can
understand. Prime examples of binary data are pictures and only your image
editor knows how to arrange these ones and zeros into a picture. There is not
much you can do with binary data in sql besides storing it and retrieving it on
demand. To store binary data, database systems have introduced a number of
binary data types.