Basic Concepts of Data Processing (Part 1)
Introduction
In 1977, a high school student named Richard Garriott took a summer course in computer programming at the University of Oklahoma. An avowed fan of Tolkien’s The Lord of the Rings, he became enamored of a pencil-and-paper game called Dungeons & Dragons (D&D), which is set in a similar fantasy world. Not long after, Garriott started taking the rules of the game and programming them into a computer.
The game of D&D is a nearly perfect match for the computer. It involves ability scores: numbers assigned to human attributes such as strength, intelligence, and dexterity (data). These numbers are used to calculate modifiers that determine the chances of success (probability) of imaginary actions in the game, such as hitting an orc with a sword, deciphering ancient mystical runes, or climbing the side of a steep tower. When a player attempts an action, dice are rolled, and the rolls, adjusted by the relevant attributes, determine success or failure. Simple computer equations and algorithms can easily simulate dice rolling (randomization) and data calculation.
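The dice-and-modifier mechanic described above takes only a few lines of code. The sketch below is a minimal illustration, not a reconstruction of Garriott’s program: it assumes the modern (fifth edition) modifier formula and a 20-sided die, which are not necessarily the rules he worked from, and the character’s score and the difficulty values are hypothetical.

```python
import random


def ability_modifier(score: int) -> int:
    # Assumed fifth-edition rule: every 2 points above or below 10 shifts the
    # modifier by 1. Floor division rounds down, so a score of 9 gives -1.
    return (score - 10) // 2


def roll_d20(rng: random.Random) -> int:
    # Simulate one twenty-sided die: a random whole number from 1 to 20.
    return rng.randint(1, 20)


def attempt_action(score: int, difficulty: int, rng: random.Random) -> bool:
    # Success if the die roll plus the ability modifier meets the difficulty.
    return roll_d20(rng) + ability_modifier(score) >= difficulty


rng = random.Random(2024)   # seeded so the "dice" give repeatable results
strength = 16               # hypothetical character's strength score
print(ability_modifier(strength))          # 3
print(attempt_action(strength, 12, rng))   # True or False, depending on the roll
```

The random number generator is passed in explicitly so that a game (or a test) can seed it and replay the same sequence of rolls.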
Garriott created his own fantasy world based on the concepts of D&D and completed the game Akalabeth: World of Doom while still in high school. He then got a job at a software store, where the owner proposed selling Garriott’s game, saying that it was actually better than the games in the store. Eventually, the game sold 30,000 copies. With Garriott’s share of $5 per copy, he earned $150,000, which was more than his father, a NASA astronaut, earned in a year at that time (Bebergal, 2020).
Any type of programming involves taking human concepts and ideas and boiling them down to numbers, logic, rules, and, finally, output that can be interpreted by a human user. To begin our study of Computer Science, we are going to take a look at the core concept of data.
Data, Information, and Messages
Data consist of raw numbers and symbols that can be arranged into meaningful information, which is then used for specific research or analysis purposes. To a computer, data are numerical, and the basic data storage system is the binary numbering system. While human beings are used to using many symbols to represent information, including letters of the alphabet and decimal numbers, computers must represent every kind of information imaginable as ones and zeroes. If you asked a computer to remember how you liked your coffee, it would store this information as something like this: 01011100. To you, this would be meaningless, but this combination of zeroes and ones could easily represent the sugar, cream, size, and temperature of the first drink you have in the morning. Your favorite song, to a computer, would be a longer string of ones and zeroes, as would a picture of your grandma, your résumé, and that cat video you love so much.
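The coffee example can be made concrete by packing several preferences into a single byte. The bit layout below (one bit each for sugar and cream, two bits for size, and four bits for a temperature level from 0 to 15) is a hypothetical choice made for illustration. Under that layout, the byte 01011100 from the text decodes to cream, no sugar, a medium size, and temperature level 12.

```python
SIZES = ["small", "medium", "large", "extra large"]   # hypothetical 2-bit size codes


def encode_coffee(sugar: bool, cream: bool, size: int, temperature: int) -> int:
    # Hypothetical layout: bit 7 = sugar, bit 6 = cream,
    # bits 5-4 = size (0-3), bits 3-0 = temperature level (0-15).
    if not (0 <= size <= 3 and 0 <= temperature <= 15):
        raise ValueError("size must fit in 2 bits and temperature in 4 bits")
    return (int(sugar) << 7) | (int(cream) << 6) | (size << 4) | temperature


def decode_coffee(byte: int) -> dict:
    # Shift each field down to the lowest bits, then mask off everything else.
    return {
        "sugar": bool((byte >> 7) & 1),
        "cream": bool((byte >> 6) & 1),
        "size": SIZES[(byte >> 4) & 0b11],
        "temperature": byte & 0b1111,
    }


code = encode_coffee(sugar=False, cream=True, size=1, temperature=12)
print(format(code, "08b"))   # 01011100
print(decode_coffee(code))   # {'sugar': False, 'cream': True, 'size': 'medium', 'temperature': 12}
```

Without the layout (the context), the same eight bits could just as well be a number, a letter, or a color.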
The reason for this is simple: when computers were first designed nearly a century ago, the only way to store different states that represent information was to turn switches ON or OFF, which corresponded to the numerical values 1 and 0. In fact, this is still represented today in most digital devices: the power button is a zero intersected by a one.
If we put a series of switches together, we can represent more combinations and therefore more information. For instance, placing eight binary digits (bits) together gives us 256 different combinations; a group of eight bits is called a byte. A byte could then be used to represent a letter of the alphabet, a specific color, or an application preference. What a byte means to a computer depends on the context.
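The count of 256 follows from a doubling rule: each added switch doubles the number of possible patterns, so n switches give 2 to the power n combinations. The short sketch below lists every pattern for two switches and then computes the totals for a few sizes.

```python
from itertools import product


def combinations(n_bits: int) -> int:
    # Each switch doubles the number of possible patterns: 2 * 2 * ... * 2 = 2**n.
    return 2 ** n_bits


# List every pattern two switches can make, to see the doubling concretely.
two_switch_patterns = ["".join(bits) for bits in product("01", repeat=2)]
print(two_switch_patterns)   # ['00', '01', '10', '11']

for n in (1, 2, 4, 8):
    print(n, "bits ->", combinations(n), "patterns")
# 8 bits (one byte) -> 256 patterns, usually numbered 0 through 255
```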
Context is Everything
Without context, even the letters you are reading right now are meaningless. However, once the rules of the English language are applied, these letters form meaning. We can use the same alphabet to represent other information in other languages or codes. Without context, data are completely meaningless, especially binary data, which are merely a long progression of zeroes and ones.
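Context can be shown directly in code. The sketch below takes the same two bytes and reads them under three different sets of rules: as text using ASCII (the character code introduced in the next section), as one 16-bit number, and as two separate small numbers. The byte values are arbitrary examples.

```python
raw = bytes([0b01001000, 0b01101001])   # two bytes: 01001000 01101001

as_text = raw.decode("ascii")                 # ASCII rules: "Hi"
as_one_number = int.from_bytes(raw, "big")    # one 16-bit number: 18537
as_two_numbers = list(raw)                    # two separate bytes: [72, 105]

print(as_text, as_one_number, as_two_numbers)
```

Nothing in the bytes themselves says which reading is correct; the program that uses them has to supply that context.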
ASCII
As mentioned earlier, binary can represent letters of the alphabet. Using one byte per letter, English text can be encoded into a data file. Fortunately, we don’t have to create our own encoding rules for this: someone already did that years ago by defining ASCII, the American Standard Code for Information Interchange. For example, the letter “A” in ASCII is represented as:
01000001
Since a lowercase “a” is distinct from a capital “A,” it has a separate code: 01100001. This way, a computer can represent all of the letters of the English alphabet along with various punctuation symbols. Using this context, we can now communicate with letters. You may have realized by now that some data translation must take place. Humans do not input data in binary (they would not find it very efficient), so computers must translate input into binary to store it, and then translate it back to show it to humans again. This is a good introduction to how computers work: they are constantly translating back and forth between human context and binary.
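The back-and-forth translation between human text and binary can be sketched with Python’s built-in ASCII codec. The two helper functions are hypothetical names written for this example; `str.encode`, `bytes.decode`, `format`, and `int(…, 2)` are standard Python.

```python
def to_binary(text: str) -> str:
    # Encode each character with ASCII (one byte each) and show each byte's bits.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_binary(bits: str) -> str:
    # Turn each space-separated group of 8 bits back into a byte, then into text.
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


stored = to_binary("Aa")
print(stored)                  # 01000001 01100001
print(from_binary(stored))     # Aa
print(ord("a") - ord("A"))     # 32: upper and lower case differ in a single bit
```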
Image
Another context that data can take on is a graphical image. In the 1950s, Russell Kirsch invented the pixel (Ehrenberg, 2010). He decided that the simplest way to divide up the data in a photo or image was to split it into discrete squares, an approach we still use today. It was a simple solution, and it worked well enough that humans could recognize photographs on a computer screen.
The concept behind pixel images is to break any image up into small squares, each of which has one (and only one) color. If the squares are small enough, the human eye cannot distinguish them individually, and our brain sees a photographic image. Pixels are used in all types of images, including fonts, icons, and graphics as well as photos. How are these data stored? The basic idea is that the pixels follow a fixed order: the image starts with the top-left pixel, proceeds across the first row, and then jumps down to the next row, continuing until the last row is complete. Since the order is implicit, there is no need to number the pixels. Each pixel’s basic data is its color, so we must represent each color as a binary number. If we use eight bits (a byte), we can represent 256 different colors uniquely. This is called 8-bit color.
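Because pixels are stored row by row from the top left, a program can find any pixel from its column and row with simple arithmetic instead of storing pixel numbers. The sketch below assumes a hypothetical 4-by-3 image held as one flat list of 8-bit color values.

```python
def pixel_index(x: int, y: int, width: int) -> int:
    # The pixel at column x, row y comes after y complete rows of `width` pixels.
    return y * width + x


def pixel_position(index: int, width: int) -> tuple[int, int]:
    # Inverse: which column and row does the Nth stored value belong to?
    return index % width, index // width


# A hypothetical 4x3 image stored as one flat list of 8-bit color values.
width, height = 4, 3
pixels = [0] * (width * height)
pixels[pixel_index(2, 1, width)] = 255   # color the pixel at column 2, row 1

print(pixel_index(2, 1, width))          # 6
print(pixel_position(6, width))          # (2, 1)
```

Only the width has to be known (and is typically stored once, alongside the pixel data); the position of every pixel follows from the order alone.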
Earlier computers commonly used 8-bit color; VGA graphics, for example, could display 256 colors at once. In one common 8-bit color scheme, the first three bits represent red, the next three green, and the last two blue. RGB, the term used for color displays, refers to these three colors of light, which in combination can produce a wide range of the colors visible to the human eye. However, with only 8 bits, there are only 256 possible combinations to represent a practically unlimited range of shades.
If we combine the most intense red and the most intense green with no blue, we will get a bright yellow color.
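The yellow example can be checked by packing the three color channels into one byte, assuming the layout described above (three bits of red, three of green, two of blue, with red in the highest bits). Full red and full green are each 7 (binary 111), and no blue is 0, giving the byte 11111100.

```python
def pack_rgb332(red: int, green: int, blue: int) -> int:
    # Red and green get 3 bits each (levels 0-7); blue gets 2 bits (levels 0-3).
    if not (0 <= red <= 7 and 0 <= green <= 7 and 0 <= blue <= 3):
        raise ValueError("red and green must be 0-7, blue must be 0-3")
    return (red << 5) | (green << 2) | blue


def unpack_rgb332(byte: int) -> tuple[int, int, int]:
    # Shift each channel down and mask off the bits that belong to it.
    return (byte >> 5) & 0b111, (byte >> 2) & 0b111, byte & 0b11


yellow = pack_rgb332(7, 7, 0)   # most intense red + most intense green, no blue
print(format(yellow, "08b"))    # 11111100
print(unpack_rgb332(yellow))    # (7, 7, 0)
```

With 8 levels of red, 8 of green, and 4 of blue, the scheme yields exactly 8 × 8 × 4 = 256 colors.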
Music can also be stored digitally (which simply means “as digits”) in binary. In the same way that we chop an image into little squares (pixels), we can cut a sound or song into tiny slices of time and then string them back together. If the slices are small enough, our ears cannot tell the difference, in much the same way that our eyes cannot see very tiny pixels.
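The slicing idea is called sampling: the height of the sound wave is measured at evenly spaced instants and each measurement is stored as a number. The sketch below samples a pure tone; the rate of 8,000 samples per second, the 440 Hz tone, and the one-byte-per-sample format are all illustrative assumptions. Audio CDs, for comparison, use 44,100 samples per second with 16 bits per sample.

```python
import math

SAMPLE_RATE = 8000   # hypothetical: 8,000 time slices (samples) per second
FREQUENCY = 440      # hypothetical tone: 440 vibrations per second (440 Hz)


def sample_tone(duration_s: float) -> bytes:
    # Measure the sound wave's height at evenly spaced instants and store
    # each measurement as one unsigned byte (0-255, where 128 means silence).
    n_samples = round(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE                              # time of this slice, in seconds
        height = math.sin(2 * math.pi * FREQUENCY * t)   # wave height from -1 to 1
        samples.append(round(128 + 127 * height))        # scaled to the range 1-255
    return bytes(samples)


clip = sample_tone(0.01)   # a 10-millisecond clip
print(len(clip))           # 80
print(list(clip[:5]))      # the first five slices of the wave
```

Raising the sample rate makes the slices smaller (and the file larger), which is the audio equivalent of using smaller pixels.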
A document in Microsoft Word, for instance, is also stored completely as ones and zeroes. The context for this data includes some binary codes that stand for formatting, such as bold and italic, and others that stand for page margins, font color, other settings, and, of course, binary code that stands for letters, numbers, and punctuation. MS Word, in a sense, has invented its own code for storing data in a .docx file. When you open a Word document, the binary gets translated back into what you view on the screen so you can begin to read and edit it.
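The idea of mixing formatting codes with character codes can be illustrated with a toy format. Real .docx files are far more elaborate (they are ZIP archives of XML files, in the Office Open XML format), so the control codes below are hypothetical: byte 0x01 switches bold on or off, byte 0x02 switches italic on or off, and every other byte is an ASCII character.

```python
BOLD = 0x01     # hypothetical control code: toggle bold on/off
ITALIC = 0x02   # hypothetical control code: toggle italic on/off


def decode_document(data: bytes) -> list[tuple[str, bool, bool]]:
    # Walk the bytes: control codes change the current formatting, and every
    # other byte is an ASCII character added to the current run of text.
    runs = []
    bold = italic = False
    current = ""
    for byte in data:
        if byte in (BOLD, ITALIC):
            if current:                      # close the run before formatting changes
                runs.append((current, bold, italic))
                current = ""
            if byte == BOLD:
                bold = not bold
            else:
                italic = not italic
        else:
            current += chr(byte)
    if current:
        runs.append((current, bold, italic))
    return runs


doc = b"Dear \x01Grandma\x01, here is a \x02cat\x02 photo."
for text, bold, italic in decode_document(doc):
    print(repr(text), "bold" if bold else "", "italic" if italic else "")
```

A word processor does the same kind of translation on a much larger scale whenever it turns a saved file back into formatted text on the screen.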
As a programmer, your job will be to decide how to encode and store data and give this data meaning and context so that both a human user and the computer system can understand it.
Data Messaging
Computers use different hardware components to perform tasks, and these components must communicate using binary messages. Each component understands a defined set of commands for exchanging information and instructions with the others; these messages travel primarily through the motherboard or through data cables. Computers also send messages to each other over a network.
Now that we have seen how to store letters, images, music, and documents as binary data, we will examine the process of sending these data over a network. The actual process of sending data is complex. It has many stages, including establishing and verifying a connection, agreeing on the mode of communication between computers, preparing the data to be sent, and finally sending the data and checking for errors to make sure they arrived properly.
For our purposes, let’s say you’re going to send a picture of a cat over the internet to your grandma. The cat picture, of course, is just zeroes and ones, but we have to make sure they get to grandma’s computer. Here is a simplified demonstration of that process.
You see the photo of the cat on your screen and save it to a file. Then you send it to your grandma, say, through email. But to send it to her, you have to have two addresses; both TO and FROM fields need to exist for delivery. Once you fill this in, everything is converted to binary: To, From, and all the data of the cat picture. It is sent out over the network to Grandma’s computer, which then translates it back so she can see that she has received a cat photo from you. Once the photo data is received, the To and From data are stripped away, as they are no longer needed. Grandma opens up the file, and the computer translates the binary data into a viewable photo once again.
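The wrapping and unwrapping of the To and From addresses can be sketched with a hypothetical envelope format: a one-line JSON header holding the addresses, followed by the raw bytes of the picture. Real email relies on standards such as SMTP and MIME and involves many more steps; this only shows how addresses travel with the data and are stripped off on arrival. The addresses and the four stand-in bytes (the start of a PNG file’s signature) are placeholders.

```python
import json


def wrap(sender: str, recipient: str, payload: bytes) -> bytes:
    # Hypothetical envelope: a one-line JSON header with the addresses,
    # a newline, then the raw bytes of the attachment.
    header = json.dumps({"from": sender, "to": recipient}).encode("ascii")
    return header + b"\n" + payload


def unwrap(message: bytes) -> tuple[dict, bytes]:
    # Split at the first newline only; the payload itself may contain newlines.
    header, payload = message.split(b"\n", 1)
    return json.loads(header), payload


cat_picture = bytes([0x89, 0x50, 0x4E, 0x47])   # stand-in for the cat photo's bytes
message = wrap("you@example.com", "grandma@example.com", cat_picture)

addresses, received = unwrap(message)   # on arrival: strip the envelope away
print(addresses["to"])                  # grandma@example.com
print(received == cat_picture)          # True
```

`json.dumps` escapes any newline inside the addresses, so the first newline in the message always marks the end of the header.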
Software, Firmware, and Hardware
Three types of components work together to enable a computer to function: hardware, firmware, and software. The most visible component is the hardware, which consists of all the physical components of a device. For our purposes in this unit, we will examine the modern personal computer; other types of computers use a similar hardware architecture.
Hardware
The Central Processing Unit (CPU) is often referred to as the “brain” of the computer. It does all the processing, which includes performing calculations and sending instructions to other hardware. The CPU is a (typically square) chip inserted into the motherboard, a circuit board designed for computer components to connect to and communicate through. Computers also need memory storage: a personal computer needs both long-term and short-term storage.
Human memory offers a useful analogy. We process information with our brain, but we also store things in our memory, and we have both short-term and long-term memory.
We all have short-term memory; it contains the things you are thinking about right now. How many things can you think of at once? It may seem like a lot, but let’s compare it to your long-term memory. Your long-term memory stores everything that you know. You can’t possibly think of all of those things at once; they won’t fit into your short-term memory. A computer works the same way. Everything that the computer knows (data) is stored on the hard drive. Whatever you are currently working on is moved into RAM, the short-term memory, while it is being used.
Let’s say you are working on your résumé. At that moment, the computer is using an application such as MS Word to edit it. While you’re working, both the résumé data and the Word application take up RAM (short-term memory). When you save the document, a copy is taken from RAM and placed on the hard drive, which is long-term storage. When you’re done working, you close Word, which frees up RAM for the next task. Since you’ve saved the file to long-term storage, you know it will be there when you want it again. Just like in real life, when you finish the résumé you stop thinking about it and start thinking about the next thing you are going to do: your short-term memory clears out space for the current task. Most modern computer systems follow this model.
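The save-and-close cycle can be mimicked in a few lines of Python, assuming a variable stands in for RAM and a file in the system’s temporary folder stands in for the hard drive. This is a simplified model: real operating systems also keep copies of recently used files in RAM to speed things up. The file name and résumé text are hypothetical.

```python
import os
import tempfile

# While you edit, the résumé lives in a variable: it is held in RAM.
resume = "Jane Doe\nExperience: barista"
resume += "\nSkills: Python"          # edits change only the in-memory copy

# "Save": copy the in-memory text to a file in long-term storage.
path = os.path.join(tempfile.gettempdir(), "resume_demo.txt")   # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(resume)

# "Close Word": drop the in-memory copy so Python can reuse that memory.
del resume

# Later, reopen the file: its contents are copied from storage back into RAM.
with open(path, encoding="utf-8") as f:
    reopened = f.read()
print(reopened.splitlines()[-1])   # Skills: Python

os.remove(path)                    # tidy up the demo file
```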
You might wonder why RAM is needed when the hard drive works perfectly well as a storage medium. The short answer is that RAM is much faster than a hard drive at accessing, reading, and writing data. It serves as a middle stage between the CPU and the hard drive, supplying data quickly for calculations and other uses.
Another comparison is a library. The hard drive is the entire library, whereas RAM is your desk, which holds far less. You may want to look at one book, or a few at a time, but only so many will fit on your desk. Besides, you can only really read a few books at a time because of the way your brain (CPU) works. When you’re done with the current books, you return them to the library, freeing up the desk (RAM space) for different books.
Firmware
Before we discuss firmware, we must briefly explain software. Software is data; it is represented in binary. What distinguishes software from ordinary data is that it consists of instructions to the computer; in other words, it is a specialized type of data. Some software is optional or can vary, but other software is required by the hardware for essential functionality. This type of software is known as firmware. When you first power on a computer, it uses special firmware chips to boot up.
Boot up process
RAM is volatile: it preserves data only while it has electrical power. This is why, in years past (before laptops with batteries became popular), if you lost power while working on a computer project, your work was gone unless you had saved it to the hard drive. When a computer first starts up (the boot process), the RAM is empty, and the computer is waiting for instructions on what to do. It finds those instructions (software) on a special chip called the BIOS, which stands for Basic Input/Output System (Tarnoff, 2007). The BIOS is stored in ROM, or read-only memory, which means it is static and cannot normally be written to. (There is a special process to update these chips.) This is how firmware is designed to be: stable and not easily changed. These instructions are what the computer follows when it first starts up. Because they are always the same, they are programmed into a stable chip that retains its data even without electrical power (a non-volatile chip). A computer first runs the boot instructions on the BIOS chip (on most newer PCs, its successor, UEFI), which includes identifying all of the hardware attached to the machine (hard drives, amount of RAM, graphics card, etc.). Once that is done, it finds the operating system (OS) on the hard drive or other long-term storage device and begins copying it into RAM. That is what you are waiting for when a computer is “booting” up: you stare at the logo for your computer until this process is finished and you can interact with it.
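The boot sequence can be summarized as a toy simulation. Every name and value below is hypothetical, and real firmware does far more work in machine code; the sketch only models the three ideas from the text: fixed instructions that never change, RAM that starts empty at every power-on, and an operating system copied from long-term storage into RAM.

```python
# Toy model of booting. A tuple cannot be modified, loosely mirroring read-only
# memory: the same instructions run in the same order at every power-on.
ROM = (
    "identify hardware",
    "find operating system",
    "copy operating system into RAM",
)

DISK = {"os": "ExampleOS kernel"}   # hypothetical long-term storage; survives power-off


def power_on(attached_hardware: list[str]) -> dict:
    ram: dict = {}   # volatile: RAM starts empty at every power-on
    log = []
    for step in ROM:
        if step == "identify hardware":
            ram["hardware"] = list(attached_hardware)
        elif step == "find operating system":
            if "os" not in DISK:
                raise RuntimeError("No operating system found")
        elif step == "copy operating system into RAM":
            ram["os"] = DISK["os"]
        log.append(step)
    return {"ram": ram, "log": log}


state = power_on(["hard drive", "16 GB RAM", "graphics card"])
print(state["log"])
print(state["ram"]["os"])   # ExampleOS kernel
```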
Firmware refers to chips holding instruction software that stays the same. It is used either to boot up a computer or to run electronic devices that always perform the same task, such as a smart refrigerator or a child’s toy. A single-task device does not need extra RAM for running different applications. Advanced personal computers, which perform many tasks, need RAM and extra hard drive storage to do so.
Software
Software, as mentioned above, is data that tells the computer to do something; in other words, it is a set of instructions. (Remember, all computer data is binary.) There are two main types of software: operating systems and applications. Both types are updatable, which means they are not stored on permanent ROM chips but on writable long-term storage such as hard drives. Unlike operating systems, apps are also optional.
Operating systems
An operating system (OS) is the first software loaded into RAM during the boot process. An OS is essential for a computer to function (assuming a standard personal computer). Examples include Windows, Linux, and macOS. This software provides three fundamental functions for the computer:
- Provide an interface
- Control hardware
- Run applications (apps)
An interface is required so a human being can interact with a computer. Without the OS to give it instructions, the hardware in a computer system would sit dormant. Applications need an operating system to run; they are a second layer of functionality. This is why, when you get an application, you must get the version made for Windows or for macOS: each version is designed to work with the specific set of instructions that is part of that OS.
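Python’s standard library offers a small, real illustration of this layering: a Python program does not drive the hardware itself but asks the operating system through functions such as `platform.system()` and `os.cpu_count()`. The Python interpreter itself must be built separately for each operating system, which mirrors why applications come in separate Windows and macOS versions. Output varies by machine, so the comments show only possible values.

```python
import os
import platform
import tempfile

# Each call below is a request to the operating system, which talks to the
# hardware on the program's behalf and reports back.
print(platform.system())    # e.g. "Windows", "Linux", or "Darwin" (macOS)
print(os.cpu_count())       # logical CPUs the OS reports (None if unknown)
print(repr(os.linesep))     # '\r\n' on Windows, '\n' on Linux and macOS

# Saving data: the OS decides where the bytes physically go on the drive.
with tempfile.TemporaryFile("w+", encoding="utf-8") as f:
    f.write("hello from an application\n")
    f.seek(0)
    print(f.read(), end="")
```

Even the line-ending convention differs between operating systems, one small example of the OS-specific details an application must respect.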
Applications
Applications (apps) are used to perform specific tasks. If you want to write a document, you load an app. If you want to play a game, listen to a song, edit photos, or create music—all these are tasks performed on a computer that rely on apps. You open the app (loading it into RAM) and use it for your task. When you’re done, you close or quit the app. Apps provide specific instructions to the hardware to complete your task, which is why they are also considered software.