Algorithms and Data Structures (Part 1)
For the last 20 years, Google has been the dominant search engine on the internet, accounting for over 90 percent of market share internationally (StatCounter, 2020). But how does it work? A Google search is driven by a set of rules called an algorithm.
An algorithm is a set of rules or instructions for completing a task. This applies to human beings as well as computers. The vast amount of data that is available on the internet poses a challenge to anyone who would want to search through it. Google has established a method to search through this data with excellent results.
According to Google, there are actually several algorithms working together to achieve the results for an internet search. One of these is used to determine meaning: Google can substitute synonyms for words in its search such as “move” and “relocate” to find similar matches. Another algorithm is used to analyze relevance. The keywords typed in the search are matched to the page—the better the match, the higher the search results. The search algorithms for Google also incorporate a quality assessment of the website itself in terms of expertise, authoritativeness, and trustworthiness on a given topic (Google, n.d.).
This seems like a good start, but the algorithms go even further. The page will also be ranked for “usability,” which means how well the page displays in different browsers and how quickly the pages load. Finally, the algorithm includes personal information about the user that is searching. If that user lives in Barcelona and searches for “fútbol,” Google will prioritize local teams and FC Barcelona would be a higher result compared to results for someone in Berlin.
Of course, the exact details of Google’s algorithms are secret, like the secret spice recipe for Kentucky Fried Chicken. It is, after all, the most popular search engine in the world, and Google would not want anyone stealing its market share by implementing the same algorithms.
Is the algorithm perfect? No. New sites take a while to start appearing in search results, and there is a chance that a good match for a particular search does not come up on the first page of results. However, the algorithm is good enough that it remains the top choice of most web searchers around the world.
Often in data processing, there will be a data space that is extensively large, like the internet, that poses a challenge. Certain mathematical problems also have a vast number of solutions and can be difficult to manage. Often, it is impossible even for a computer to search through every possibility (at least, if quick results are desired) and an algorithm will be designed to give a “good” result quickly instead of a “perfect” result that takes much longer.
Algorithms and Flowcharts
An algorithm at its most simple definition is a set of instructions. This works for computing applications as well as human beings. Let’s say that you want to tell someone how to fix a broken lamp. You’d probably ask the usual questions like, “Is it plugged in?” and “Did you replace the light bulb?” However, to put these into a set of instructions, you’d have to organize things a bit more. You’d do something like this:
1. Check to see if the lamp is plugged in. If it isn’t, plug it in; if the lamp now works, you’re done. Otherwise, go to step 2.
2. Check the lightbulb. Is it burned out? If so, replace it; if the lamp now works, you’re done. Otherwise, go to step 3.
3. The lamp is broken. Buy a new lamp.
This is a perfect example of an algorithm. Computer algorithms are especially useful for solving recurrent problems.
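The three steps above can be sketched in code. This is a simplified sketch: it collapses each “does it work now?” check into the assumption that fixing the first problem found fixes the lamp.

```python
def fix_lamp(plugged_in, bulb_ok):
    """Return the repair action, following the three steps above.

    plugged_in: was the lamp plugged in?
    bulb_ok: is the bulb still good (not burned out)?
    """
    if not plugged_in:
        return "plug it in"        # step 1
    if not bulb_ok:
        return "replace the bulb"  # step 2
    return "buy a new lamp"        # step 3

print(fix_lamp(False, True))   # plug it in
print(fix_lamp(True, False))   # replace the bulb
print(fix_lamp(True, True))    # buy a new lamp
```

Notice how the instructions fall naturally into a sequence of decisions, just like the flowcharts discussed next.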
Flowcharts (Flow Diagrams)
To translate human instructions into computer instructions, coders have used flowcharts for many years. These charts make it easy to observe the structure of an algorithm and create it with a programming language. They use standard symbols to indicate different types of instructions, as shown below.
A terminator symbol (an oval or rounded rectangle) is used at the beginning of a flowchart and at all of its endings; terminators are the external nodes of the flowchart. A rectangle represents a process, for instance, some type of data calculation or filtering. A diamond is a decision, a point with multiple pathways that can be taken. A parallelogram represents input or output, where data are sent into or out of the process. Finally, arrows represent the flow of the sequence from beginning to end.
There are many other symbols that can be used in a flowchart, but these will suffice for now. If we return to our example about fixing a lamp, we can use a flowchart to illustrate that process.
The flow starts at the beginning; from there we just follow the arrows along the pathway chosen. You can see that the beginning is a terminator with our starting point: “Lamp doesn’t work.” From there we go to a decision asking if the lamp is plugged in. There are “yes” or “no” options here. These represent binary thinking in computers where a “yes” can be symbolized by a one and a “no” by a zero.
Let’s take a look at a computer-based example of a flowchart. For this example, we will use a simple key code door lock. The lock works like this: the user has three attempts to enter the correct four-digit code to unlock the door; if they fail three times, the door will be locked for ten minutes and will not take any more input.
This illustrates the algorithm for our door lock. At the beginning, when the lock is activated we move from start to the processing step. This is where the counter variable, attempts, is set to zero. Then the input step shows that the four-digit code is entered by the user. Then a decision is made: is the four-digit code correct? If so, the door is unlocked. So far, so good. If it’s not correct, what happens next? It flows to the next decision. To a human, we would ask the question, “Is this the third failed attempt?” However, a computer is different. We have to use the variable “attempts” to count them. We check the value of attempts to see if it is “3.” If it is, then the user is out of attempts and the door will lock for ten minutes. If they haven’t reached three, the flow will loop back to where they enter the code again. Notice that every time they enter the code, the counter is incremented by one. This flowchart, which represents an algorithm, can be used as a guide to create programming instructions for a computer.
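The flowchart translates almost line by line into code. The sketch below assumes a hypothetical code of "4921" and takes the keypad as a function parameter so the logic can run without real hardware.

```python
import time

CORRECT_CODE = "4921"    # hypothetical code, for illustration only
MAX_ATTEMPTS = 3
LOCKOUT_SECONDS = 600    # ten minutes

def run_lock(read_code, sleep=time.sleep):
    """Follow the flowchart: up to three attempts, then a lockout.

    read_code is any function returning the next code entered;
    passing it in lets the logic run without a real keypad.
    """
    attempts = 0                    # processing step: set counter to zero
    while attempts < MAX_ATTEMPTS:
        code = read_code()          # input step: user enters the code
        if code == CORRECT_CODE:    # decision: is the code correct?
            return "unlocked"
        attempts += 1               # failed attempt: increment counter
    sleep(LOCKOUT_SECONDS)          # third failure: lock for ten minutes
    return "locked out"

# Simulated keypad: one wrong code, then the right one.
codes = iter(["1111", "4921"])
result = run_lock(lambda: next(codes))   # "unlocked" on the second try
```

Each flowchart symbol appears in the code: the process step initializes `attempts`, the input step reads the code, and the two decisions become the `if` test and the loop condition.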
Simple Data Structures
Data are often entered into a computer sequentially, meaning in a specific order. Think of this process as similar to what you are doing now; you are reading letters and words in sequence. Simple data types such as “integer” and “char” will store one unit of information, but to store them in a sequence they must be arranged in a series. There are different ways to do this in computer memory and each has its advantages and drawbacks.
A stack is a data structure that works under a specific philosophy: Last In, First Out. This is often abbreviated as LIFO. Imagine you are inputting data to a computer program—in this case, we will say it is a card game simulator. It will use a standard deck of 52 cards, and they will be randomized and placed into the stack.
Just like in real life, when you are playing cards you can only draw the card from the top of the deck. That is the last card that was placed on top, so it is the last in but the first out. Once the game has started, the order cannot be changed; the next card in play is always the one on top of the deck, or in our simulation, the stack.
A stack in computer programming allows for two basic operations: push and pop. Pop will take the top piece of data from the stack, which removes it. Push will add a new item to the top (and only the top) of the stack.
To populate the stack with data, we use the push operator, one element at a time. Once all of the data has been entered, it is read by using the pop operator. Note that the pop operation returns the value of the element and removes it from the stack. This is a very simple and efficient way to store data. Pushes and pops can be performed at any time to add or remove elements from the top of the stack. This structure has the advantage of taking up less space than more complicated data structures—however, it cannot be searched randomly.
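The card-deck stack above can be sketched with a Python list, where `append` acts as push and `pop` as pop; the cards are numbered 1–52 for simplicity.

```python
import random

# A standard deck of 52 cards, randomized and pushed onto the stack.
deck = list(range(1, 53))
random.shuffle(deck)

stack = []
for card in deck:
    stack.append(card)     # push: add to the top of the stack

top_card = stack.pop()     # pop: returns and removes the top element
```

Because the cards were pushed in deck order, the popped card is the last one pushed: last in, first out.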
RAM is called “Random Access Memory” because any part of it can be accessed at any time. This might seem obvious, but early computer storage was only accessible in sequence: any time you wanted to read some data, you had to start at the very beginning and read until you reached the part you wanted. This is similar to reading a book. Suppose last night you stopped on page 77; the next day, you would have to start at page 1 and read your way all the way back to 77. Obviously, this isn’t very efficient. Random access means you can jump right to whatever page you want, in this case page 77, and you could even skip ahead to parts you haven’t read yet. Stack data structures can only be accessed sequentially, not randomly. Not only that, but the sequence is LIFO, giving you the last element first (imagine reading a book backwards!). Sometimes this order is useful (as in our deck of cards example) and sometimes it is not. Stacks are used because they take up less memory than other data structures, so when a stack suits your purposes, you should use one.
A linked list is also a sequential data storage structure. Instead of being stored in a simple stack of data, each element is linked to the next. In other words, an element in the data structure linked list contains two pieces of information, the data and the link.
The last element in the linked list structure does not have a pointer to the next value, instead it points to null, which indicates that there are no more elements. Null is often used to end a series of data. The link in each list is a pointer, which is a memory location. It identifies the part of memory that contains the next element.
In many computer implementations, a linked list is unbounded. This means that there is not a specific number of elements that it is restricted to. (Of course, the size would have an upper limit based on the computer memory or the programming language structure.) This is useful when you are working with a series of data that you do not know the length of, including one that has a continually changing length.
One thing that linked lists can do that stacks cannot is insert data into the middle of the sequence. If you imagine the linked list as a chain, the chain could be temporarily broken and have a new link inserted, making it whole again. To do this with the data, we can simply change the pointers to include the new element.
Let’s say you have a linked list of integers. The list comprises the elements 83, 23, 12, 99, 52, and 94, which means the element containing 12 points to the next one, containing 99. If we wanted to insert a new node with 37 (a new link in the chain) into the list, we would simply have 12 point to 37 and 37 point to 99, and once again we have an unbroken chain.
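A minimal sketch of that insertion, using a `Node` class where each element holds its data and a pointer to the next node (`None` playing the role of null):

```python
class Node:
    """One link in the chain: the data plus a pointer to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    """Splice in a new node by redirecting two pointers."""
    node.next = Node(value, node.next)

# Build the list 83 -> 23 -> 12 -> 99 -> 52 -> 94 (ending at None/null).
head = None
for v in reversed([83, 23, 12, 99, 52, 94]):
    head = Node(v, head)

# Find the node holding 12 and insert 37 after it.
node = head
while node.value != 12:
    node = node.next
insert_after(node, 37)

# Walk the chain to confirm it is unbroken.
values = []
node = head
while node is not None:
    values.append(node.value)
    node = node.next
print(values)   # [83, 23, 12, 37, 99, 52, 94]
```

Note that the insertion itself touches only two pointers; no other elements move, which is what makes mid-list insertion cheap.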
Linked lists work well if you need speedy access to sequential data with the ability to insert new data into any part of the list.
One of the most useful and common data structures is the array. Though it takes up more memory than a stack, for instance, it provides random access for reading and writing data. Usually, when arrays are declared in a programming language, they are “bounded,” meaning that they have a set size. Linked lists offer similar functionality with an unbounded size. However, linked lists do not allow random access to elements. This means that, while arrays are more costly in size, they offer the most functionality of the sequential data structures.
The data in an array is arranged in elements, just like our other sequential types. However, each element in the array is given a specific index, a number that identifies it. The index is ordinal and starts at zero (in most programming languages). This means that the first element is element zero. It may take a while to get used to computers counting from zero, but there’s a good reason for it: the index is really an offset from the start of the array, so the first element sits zero steps from the beginning. Starting the count at one would waste that first position.
Let’s say you were going to store a list of customer names in an array. It would look something like this:
The array name is CustomerName, which holds all of the elements. The first element is element zero in the list. The number after the array name is called the index, and the value stored in element zero is “Sandra Whitehall.” The advantage of an array is that you can access any element immediately. If you want to read or write element 3 (the fourth element), you can go directly to it without having to read elements 0, 1, and 2 first.
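In Python, a list gives the same indexed, random-access behavior. All names besides “Sandra Whitehall” below are invented for illustration.

```python
# Sketch of the CustomerName array with four elements.
customer_name = [
    "Sandra Whitehall",   # element 0
    "Ahmed Said",         # element 1
    "Mei Lin",            # element 2
    "Tomas Vega",         # element 3
]

print(customer_name[0])   # jump straight to element zero
print(customer_name[3])   # the fourth element, read without touching 0, 1, 2
```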
Another advantage of arrays is that they can be multi-dimensional. If you think of a list of items, for instance, you think of a linear progression from start to end—a line, or one dimension. Adding a second dimension means that the list can be read in two directions. One way to imagine this is to think of a checkerboard. A standard checkerboard is 8 by 8 squares, giving us a total of 64. If you are playing checkers, the squares can have three possibilities for their contents: a red checker, a black checker, or no checker. Let’s store these as RED, BLACK, and NULL.
The top left square is the first one at 0,0—remember computers start counting at zero. This means there is a row zero and a column zero on the board. This array references the squares similar to coordinates. The square at 3,5 for example means column 4, row 6 (remember to think like a computer!). Let’s say we simply called this array “checkerBoard.” If a red checker was in square 3,5, we could say:
checkerBoard[3,5] = "RED"
We could store information for the entire checkerboard using this array, where every square would have RED, BLACK, or NULL stored in it as data. Then we could implement algorithms to move the checker on the board.
There are two types of moves, a regular move and a “capture” where you take one piece (or more) of your opponent’s. The checkers must only be on the black squares and therefore must move diagonally. If we have a red checker on 3,5, and we are facing the opponent (we are on the bottom of the board), then the valid regular moves are to go to 4,4 or 2,4. The algorithm would look something like this:
- Player inputs move choice.
- Can you move right? Moving right is +1 X and -1 Y (where X and Y are array indices).
- Can you move left? Moving left is -1 X and -1 Y.
You would have to test if you could move by checking to see if you are at the edge of the board, and also check whether there was an opposing piece in the way. In other words, if the destination number for X or Y is larger than 7 or less than 0 you are not on the board. If the space you want to move into contains a NULL, it is fine, but if it contains a RED or BLACK piece, you cannot move there.
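The edge-of-board and occupied-square tests can be sketched as follows, assuming the board is stored as an 8 x 8 list of lists holding "RED", "BLACK", or None (standing in for NULL), indexed as board[x][y].

```python
BOARD_SIZE = 8

def valid_regular_moves(board, x, y):
    """Return the legal regular moves for a red checker at (x, y),
    moving up the board (toward smaller y values)."""
    moves = []
    for dx in (+1, -1):                # try moving right, then left
        nx, ny = x + dx, y - 1         # one diagonal step
        on_board = 0 <= nx < BOARD_SIZE and 0 <= ny < BOARD_SIZE
        if on_board and board[nx][ny] is None:
            moves.append((nx, ny))     # destination empty: move is legal
    return moves

# An otherwise empty board with a red checker at 3,5.
board = [[None] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[3][5] = "RED"
print(valid_regular_moves(board, 3, 5))   # [(4, 4), (2, 4)]
```

The bounds check rejects destinations below 0 or above 7, and the `is None` check rejects squares already occupied by a RED or BLACK piece, exactly as described above.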
Of course, this would just be an algorithm for using a regular move on one piece. There would be a larger algorithm for what happens during a player’s turn, for example, which would include the rules for both a regular move and a capture move. Arrays make it easy to map out coordinates in two or three-dimensional space, like with checkers on a board.
More Dimensions in Arrays
There is no need to stop after only two dimensions, but keep in mind that for every dimension you add, you are raising the complexity of the data by an exponent. For instance, our checkerboard takes up 8 x 8 elements of space, for 8² or 64 elements. If we made it a three-dimensional cube, 8 x 8 x 8, it would be 8³ or 512 elements. As a programmer, you must be careful never to waste precious memory and only use what you need.