Excel 365: Validating, Cleaning, & Performing Lookups on Data

This is a guide on validating, cleaning, and performing lookups on data.

C++ is among the best languages to start with to learn programming. It is not the easiest, but with its speed and strength, it is one of the most effective. This small study book is ideal for middle school or high school students.

 

Using Absolute Cell References and the $ Symbol

[Video description begins] An Excel Workbook appears. [Video description ends]
Let's now change tracks and start working with a new data set with a focus on formulas. The data set is taken from a kaggle URL, which you see on screen now. And it contains a sampling of ecommerce data.

[Video description begins] The url reads: Data Source URL: https://www.kaggle.com/carrie1/ecommerce-data/data. [Video description ends]

Let's start with a simple operation down below. Let's go ahead and compute the sum of all of the data values in column D. So we add a self explanatory note and then we make use of the Excel built in SUM function. And we use the mouse in order to choose the input arguments, the cells D2 to D39.

[Video description begins] The sum function reads: =SUM(D2:D39). [Video description ends]

Once we click out of this cell, we can see that the sum of all of the quantities is 661. Next, let's go ahead and try an operation which will demonstrate why we need absolute cell references. Let's insert a column and

[Video description begins] To insert a new column, select the last column and then right click on it. Select Insert from the shortcut menu that appears. A new column before the last column gets inserted. [Video description ends]

this column is going to have a tax calculated as 5% of the sale value. This is going to require us to add in a new row so we insert a row as well.

[Video description begins] To insert a new row, select the first row and then right click on it. Select Insert from the shortcut menu that appears. A new row before the selected row gets inserted. [Video description ends]

And then up above in cell G1, we type in the tax rate which is 5%. This is a handy way of having the tax rate in a separate cell so that we can easily modify.

Now down below, we are going to compute the tax as the product of the unit price and the quantity and the tax rate. So F3 multiplied by D3 multiplied by G1, so far so good.

[Video description begins] The formula reads: =(F3*D3)*G1. [Video description ends]

When we apply this formula only to the data in row 3, the tax computed is correct, it's 1.02. However, when we try and copy this formula down into the other rows, we get an error. Or to be a little more precise, we get a series of errors in all rows other than row 3. Let's examine what went wrong. Let's click into the cell G4. And there we can see the problem is in the formula bar. Here the tax computation incorrectly multiplies F4, D4, and G2. Now this is something which we can see in the formula bar

[Video description begins] The formula reads: =(F4*D4)*G2. [Video description ends]

right up above the data area, and the error here lies in G2. One other way of checking this is by clicking on the Formulas menu item up top. And then scrolling over to the right and clicking on the little button titled Error Checking.

This itself has many different bits of available functionality, we can run Error Checking or Trace Error. For now, let's do a Trace Error. This very clearly highlights all of the different cells in the workbook which have gone into the computation of a formula. Note that this applies to the currently active cell. Here, because we ran trace error with the cursor positioned in cell G4, we are going to see what it was that caused the error in cell G4. And by looking at it in this color coded fashion, it's quite obvious that the problem lies with the tax rate. We ought to be using cell G1 and not cell G2. Likewise, if we scroll down, we can see that all of the other values in column G are incorrect as well. The only one which is correct, is the one in cell G3. All subsequent tax calculations are wrong, because as we copy pasted the formula down the reference to the tax rate did not remain locked to cell G1. So now let's go ahead and fix this.

Let's go ahead and delete all of the incorrect values in cells G3 to G40 by right clicking and hitting Clear Contents. Then let's go to cell G3, and there let's simply lock the value of cell G1. The way we do this is by typing out the dollar symbol before the row as well as the column identifiers.

[Video description begins] The formula reads: =(F3*D3)*$G$1. [Video description ends]

Now you should know that the dollar symbol can be applied individually to both the row and the column identifiers. So it's perfectly okay to have a dollar symbol only before the G but not before the 1. Here of course, we actually would like to refer to cell G1. That's the absolute cell reference containing the tax rate. So we use the dollar symbol ahead of both parts of the cell address. Also, please note that on Windows machines, you can use the F4 keyboard shortcut to quickly turn a cell reference into an absolute one. So now the formula reads F3 multiplied by D3 multiplied by $G$1. This formula is correct. When we copy this formula down, all of the other tax computations are going to be correct as well. We can see by clicking around in the values in column G, that the references to F and D are both relative.

So for instance in the cell G9, the references to F and D point to F9 and D9 respectively, that's correct.

[Video description begins] The formula reads: =(F9*D9)*$G$1. [Video description ends]

The reference to the tax rate however, is constant, and always points to the cell G1. And even more elegant solution for situations like this is the use of named ranges. We've encountered named ranges once before. Let's go ahead and define a named range. This is simple enough to do we click on the Define Name menu item up top.

[Video description begins] The New Name dialog box appears. [Video description ends]

Here we are prompted for the name, let's call this tax rate. We have the scope which is workbook and then what this name range refers to. This correctly points to cell G1 of the orders received worksheet. Notice that G1 is an absolute cell reference. There are dollar signs before the G and the 1 respectively. Let's hit OK, this named range now comes into existence. And we can now see that in the cells or the data in column G, we can replace the reference to G1 with a reference to the tax rate.

Let's clear the contents of all but the first row and then edit the formula in the first row and replace the $G$1 with the named range tax rate. We can see that as we start typing out, Excel has an autocomplete. It offers all possible named ranges or formulae, which begin with the letters we have just typed out. At this point we have typed out Ta, so Excel identifies two possible candidates which are functions. These are TAN and TANH, that's the hyperbolic tangent. But what we're really interested in is the tax rate. Also notice how there are slightly different icons for formula versus named ranges in the autocomplete menu.

[Video description begins] The formula now reads: =(F3*D3)*TaxRate. [Video description ends]

Let's go ahead and select this named range and the formula is correct. We can now copy paste this formula down by dragging it in the usual fashion. Once again the tax rate calculation is correct. This little demo got us started with the use of formulae. It also got across the important differences between relative and absolute cell references. As we saw, if we make a mistake and we make a reference relative when it should be absolute problems will arise when we copy paste that formula into adjacent cells. We also saw how named ranges can be used as an elegant proxy for fixed or absolute cell references.


Using the Subtotal Feature

In this demo, we'll pick up right from where we left off at the end of the last demo, and move on to an explanation of subtotals in Excel. Let's begin by inserting a new column. This column is going to hold the total price, which can be computed very simply as the sum of the tax, plus the product of the unit price and the quantity. We go ahead and express this in terms of a formula. The tax is G3, the unit price is F3, and the quantity is D3. So the total formula is G3+(F3*D3).

[Video description begins] The formula reads: =G3+(F3*D3). [Video description ends]

Note that here, all of this data is row specific, so all of these references should be relative. Let's go ahead and copy this formula down. And we have now successfully populated all of the rows in column H, which need to have the total price. Let's add just a little bit of formatting to make this sheet easier to understand. And once we are done with that, let's turn our attention to the subtotals. This is done by clicking on the Data tab that's right up top, and then we click on the Outline button over on the extreme right.

We've already seen how to make use of the Group and the Ungroup functionality here. It's time for the Subtotal button, so let's go ahead and click that. Please note that the Subtotal button ought to be pressed down with our data already selected, and you can see that that is already the case. This brings up the Subtotal dialogue. Here we can see up top that we are asked for each change in, and then we specify a column name. So we can change that default from InvoiceNo to InvoiceDate, use a function. And then there's a second drop-down with a long list of aggregate functions, let's go with the Sum. We are then prompted to add the subtotal to, and now comes a list of all of the columns with checkboxes next to them. Currently, only the Country column has a checkbox next to it. Let's go ahead and check a few other columns. This is going to have the effect of computing the sum for the total price.

The Country column, of course, is not numeric so subtotaling the Country column wouldn't make a whole lot of sense. Then come three checkboxes, the first of these will replace current subtotals. The second, which is not checked, would have added a page break between groups. And the third, which again is checked, will place a summary below the data. At this point, let's go ahead and hit OK. And immediately we can see that there is a big, dramatic difference in the appearance of our workbook. One thing that's changed is that there is now a level of outlining over on the extreme left. So the subtotal wizard has created three levels of outlining. There is now one grouping for every set of rules based on the invoice date. And that's because in the subtotal wizard, we had asked for the subtotaling to be carried out for every change in the InvoiceDate field. Now, the grouping and ungrouping functionality did not actually change our data in any way, but the subtotaling wizard does.

You can see, for instance, now that row 7 and row 12, and row 15 have been inserted by the subtotal wizard. Each of these displays a subtotal of the rows immediately above it. This is in column H. And that's because we had asked the subtotal wizard to print out or to subtotal the total price for every change in the InvoiceDate. Finally, in column E, in each one of these inserted rows, the subtotal wizard has added a self explanatory note. Let's use the + and - buttons to expand and collapse the format of our spreadsheet. We can see, for instance, that we can easily reduce the amount of detail that has been displayed so that we only have the subtotals for each invoice. We can quickly see in this way that some of the invoices are relatively large. For instance, the one on row 15 is for $320. We can go through and collapse each of these groups of invoices, until we only have a summary view. And then down below is a Grand Total. That Grand Total is $1163.652.

We can see from this how useful Excel's subtotals are. We can zoom in and zoom out of the data at our convenience. However, there is one important point worth keeping in mind. Like the auto outline wizard, which we had worked with previously, subtotals will only work when there is already some discernible structure that Excel can work with. Here for instance, let's say we had tried to subtotal on the invoice date. And let's say that the invoice date had been a primary key. Every row would have had a different invoice date, and then the subtotals would not have made much sense. And one last but important note. Because the subtotaling wizard adds a lot of additional rows into our spreadsheet, the correct way to eliminate this is, again, by making use of the data outline menu item. Click on Subtotal, and there choose the option to Remove All subtotals. That's in the bottom left of the dialog box. This will clean up all of the subtotal state and revert the workbook to the way it was before we calculated the subtotals. Please do not attempt to delete the subtotals manually. It's a lot of work and not worth the effort.


Using the Filter Feature

Let's now turn our attention from subtotals to filters. We're working with the same data as in the previous demo, but this time we've clicked on the Filter button in the Data tab. You can see that as soon as we click on it, a small drop down appears next to each of the column names. This is visible on row number 2. These small drop-downs will allow us to filter data based on specific values. As an example, let's click on cell E2, this is the header row for the InvoiceDate column. We can see that this is also a way to sort data, either alphabetically or reverse alphabetically or using cell formatting. Down below are a bunch of filters. These allow us to check or uncheck specific values which are present in that column of our data. By default, all values have been selected and that's why there's a checkbox next to select all. Then comes a list of all unique values of InvoiceDate. We can go ahead and uncheck the Select All, and then specifically only select those invoice dates that we are interested in.

Please note that the filters across different columns interact with each other. Also the formatting of the columns makes a difference. So let's go ahead and ensure that the formatting of our dates is the way we'd like it to be. Then come back to this drop down. You can see that there is now an increase in the number of uni invoice dates. Let's go ahead and select a subset of these dates. So we uncheck the Select All button and scroll down and pick a handful that we are interested in. Once we hit OK, as we change the selections here, a different set of rows is going to be displayed in the spreadsheet down below. Now you can see that we can only view those rows which have invoice dates from the list we've selected. We can also see up above that cell E2 looks a little different than all of the other row headers.

The symbol next to the InvoiceDate field name clearly shows that we have applied a selection on this column. Another important point to note about filters is that they do not change the original row ordering. So for instance, if we look closely at the row numbers which are visible all on the extreme left, we can see that all of the filtered row numbers are visible in a blue font. The important bit is that all of the other rows are simply hidden. It's not like the row numbers have changed as they would have in the case of subtotals. This means that we can reliably use references into filtered data from outside our data range. This is not something that we could do with subtotals. Let's see this in action. Let's calculate down below a sum of all of the sales for the year 2010. This can be done using the current value of the filtered cells because all of the invoice dates are from 2010. In this case, our task is further simplified by the fact that our data was sorted by invoice number.

And that's why all of the 2010 invoices appeared next to each other. So we could simply use our mouse to drag the input arguments into the sum function. Then we hit Enter on this formula, which is the sum of H22 to H40.

[Video description begins] The formula reads: =SUM(H22:H40 ). [Video description ends]

The computed value is $450.093. This is the sum for 2010. Let's now find the corresponding sum for 2011. To do this, we change the selection on the invoice date filter. We start by checking the Select All box, then unchecking it again and then manually going through and selecting only the invoices from 2011. Note that we do not select the 2010 invoices this time. We hit OK. And now our data still resides from row 3 to row 40. No change there. It's just that all of the rows after row 21 have been hidden. And we can see this if we look closely at the row numbers in the extreme left, all of the filtered row numbers appear in a blue font, and then there are two small horizontal lines telling us that the remaining filtered data is hidden over there.

We can now go ahead and use the SUM function to calculate the total sales for 2011. We'll do this as before, once again, we are careful to stop our selection at row 21.

[Video description begins] The formula reads: =SUM(H3:H21 ). [Video description ends]

This means that we will not include any of the hidden rows. And this sum works out to 713.559. The important bit, which is worth noting is that the sum for 2010 has not changed. In other words, we still have the correct sum for both years in our data. Let's use this as an opportunity to introduce the idea of circular references. A circular reference results when a cell directly or indirectly references itself. On screen now, let's intentionally introduce a very direct and explicit circular reference. We are currently in cell B46. There, let's intentionally express B46 as the sum of B46, and another cell from the shipping revenue staff.

Now, of course, this is not possible, there's no way for Excel to compute the cell in B46 in terms of itself, and

[Video description begins] The formula reads: =B46+ShippingRevenues!B3. [Video description ends]

so this is going to result in a circular reference. As soon as we hit Enter, we get the warning message that we see on screen now. Excel is telling us that there are one or more circular references where a formula refers to its own cell either directly or indirectly. This will make the calculation of that cell incorrect. Let's go ahead and hit OK. We can see that there is a value of 0 placed in cell B46. That's a clear indication of trouble, and there is further indication of error in the toolbar at the bottom left. Excel is telling us that there is a circular reference in cell B46. This is a clear indication of trouble. Usually, circular references are signs of logical errors in the way a complex spreadsheet has been set up. This circular reference in our example is quite simple to eliminate because it's a typo.

Cell B46 contains a reference to cell B46, that's simple enough. But let's say we had a complex calculation which inadvertently introduced a circular reference, debugging it could have been quite tricky. In any case, let's move on, let's use an option available from a Formulas toolbar in order to help mitigate this. We click on Error Checking, Excel error checks are not able to pick up circular references. And that's because circular references introduce warnings rather than errors. The error check would catch errors like hash NA or hash value. Here in cell B46, we have a legitimate value of 0, that's not an error. So let's try again. Let's click on the little arrow next to error checking. There down below is a menu item titled Circular References. And when we click on it, we can see very clearly that cell B46 has a circular reference. It appears there as an absolute reference with the dollar signs and a little check box next to it. In any case, it's easy enough to fix this circular reference.

We can do so by simply undoing our last typing. This can be done by clicking on the undo button on the Quick Access toolbar on the top left. When we do this, the contents of cell B46 return to their original correct value of 450.093.


Checking Conditions Using IF()

In this demo, we will introduce the use of the If function. If function in Excel is a great way of producing true and false values based on whether or not a predicate, that is a logical condition, is satisfied or not. The data set that we will be using is on screen now.

[Video description begins] The URL reads: https://www.kaggle.com/loveall/clicks-conversion-tracking. [Video description ends]

It's sourced from kaggle. And it has to do with e-commerce data, specifically with the results of an online marketing campaign. Conversion tracking relates to a website feature that can be turned on in tools such as Google Analytics to see the interaction of users with a website or other online content. Let's begin by going ahead and introducing a new column. So we scroll over and select a column, and then choose the Insert option from the right-click menu. Did a particular campaign attract many clicks or not? .That's what we'll try and measure in this column. So we title the column Many_Clicks, and then inside cell I2 we go ahead and add a formula, which will return true if the number of clicks on this campaign was greater than 175, and it will return false if the number of clicks was less than 175.

[Video description begins] The formula reads: =IF(H2>175, TRUE, FALSE). [Video description ends]

This is done using the IF formula and it references the number of clicks which is present in the corresponding cell in column H. The basic structure of the IF formula is simple enough, the first input argument is a condition. Here the condition is H2 greater than 175. Obviously, this condition could be either true or false. The second input argument is what this cell should return if the condition is true. And here of course if the condition is true, we choose to return TRUE. The third and last input argument is what should be returned if the first condition is false. Here if the first condition is false, we just go ahead and return FALSE. A couple of points worth noting here. The first is the use of the TRUE and FALSE Boolean keywords. These are special Boolean keywords present in Excel to represent the true and false values and are accepted by many functions which work with Boolean arguments. The other bit worth noting is the fact that the cell reference to H2 is relative. It is not absolute.

We have not locked it. We have not made use of the dollar signs. And that's because we would like to copy this formula down into the other cells in column I. And for that purpose, the corresponding cells in column H should be referenced. And that's why this cell reference is relative. So in this way, we've computed a column which tells us whether a particular add has secured many clicks or not. Now let's go ahead and add another column.

[Video description begins] To insert a new column, select a column and then right click on it. Select Insert from the shortcut menu that appears. A new column before the selected column gets inserted. [Video description ends]

So we scroll over to the right and insert a new column, column K, which is called Spent_vs_Clicks. This is going to make use of a nested if condition. Let's take a moment to properly understand the formula which we can see on screen now.

[Video description begins] The formula reads: =IF(H2<175,if(J2<300,TRUE,FALSE),FALSE. [Video description ends]

The formula bar contains an IF function. The first input argument into the IF function is a condition that condition reads H2 less than 175.

The second input argument is another if statement. So this second input argument is going to be evaluated if the condition is true, that is if H2 is less than 175. If this is the case that is, if H2 is less than 175, we then check whether J2 is less than 300. And if the answer to that question is true as well, this combined if statement will return TRUE. Otherwise, if the first condition was true and the second condition was false, then we return FALSE. So for values where H2 is less than 175 but where J2 is less than 300, this function will return FALSE. And finally, if H2 is not less than 175, so if H2 is equal to 175 or greater than 175, then this function will return FALSE. Let's go ahead and hit Enter. And the first evaluation that's the one in cell K2 evaluates to TRUE. That's because the value in cell H2 is 1, that's less than 175. That takes us into the first input argument where we evaluate the second or the nested if condition.

We check the value of J2, that's 1.4299999. And that is less than 300. So that returns TRUE. Once again the cell references are relative, we go ahead and copy this formula down into all of the other rows. And we can see that the corresponding values of true and false are calculated as well. Let's keep going. Let's go ahead and insert yet another row. This will demonstrate the use of a logical AND operation. So now we are going to have an if condition but then the first input argument into the if condition is going to be an AND. AND is a relational operator, it will take in one or more logical conditions and return true if all of those logical conditions evaluate to true. So here we pass in the two conditions, H2 greater than 175 and J2 less than 300. If the result of this AND operator is true, then our if is going to return Success. If the return value from the AND is false, this will return Failure.

[Video description begins] The complete formula reads: =if(AND(H2>175,J2<300),"Success","Failure"). [Video description ends]

In this case, the first condition in the AND evaluates to false because H2 is 1, that's obviously not greater than 175. So we immediately skip into a second input argument passed into the if and we return Failure. We can see this when we now hit Enter, the contents of cell L2 read Failure. Let's go ahead and copy this formula down. And we can see that because the cell references were relative, all of the formula values are adjusted to operate on the cells in the corresponding row. So for instance, the formula in cell L 1144 is going to operate on other data which is present in row 1144 as well. The logical AND operator will only return true if all of the logical conditions passed into it evaluate to true. As you might expect, Excel also has a logical OR operator. This is going to return true if any one of the input arguments passed in evaluates to true. Let's see this in action we've added a new column that's column M.

[Video description begins] To insert a new column, select a column and then right click on it. Select Insert from the shortcut menu that appears. A new column before the selected column gets inserted. [Video description ends]

We now make use of the OR operator. And we directly specify one or more logical conditions.

[Video description begins] The formula reads: =OR(H2>175,J2<300). [Video description ends]

Here H2 greater than 175, J2 less than 300. The first of these is going to evaluate FALSE because H2 is just 1 so it's not greater than 175. However, the second of these is going to evaluate to TRUE because J2 is 1.42999 and that's definitely less than 300. So when we now hit Enter, this is going to return a value of TRUE. And as always, we can go ahead and drag and drop this formula down into all of the rows in this column. The cell references are relative, and so each row will return a corresponding TRUE or FALSE value. And in this way, we have demonstrated the use of the IF function, as well as the use of logical AND then OR operators. These are extremely powerful because they can be combined and used with other functions in order to accomplish pretty complicated operations. And we got a glimpse of this when we made use of the nested if statement.


Using Conditional Aggregates

In this demo, we will move on from simple if operators with AND and OR conditions to conditional aggregation operators. You can see on screen right now an example of such an operator. Let's say we wanted to compute the sum of expenditure for all ads where the number of clicks was greater than 175. It turns out that to do this in Excel, we would simply make use of the SUMIF function. This would be a trivial way in which to compute this function. You can also see down below that there are more complex constructions possible with a function like this. We can use conditional operators to compute sums such as the sum of expenditure on all ads with more than 175 clicks, and where the number of folks who bought the product is greater than or equal to a certain threshold. The SUMIF function is one conditional operator. But there are additional operators available for count, average, and other functions as well. So let's go ahead and try some of these out. You can see now on screen that we've made use of the COUNTIF function. As the tooltip tells us, the COUNTIF function is going to take in a range and a criterion.

Here, the range that we pass in is simply all of the data in column I. So we specify using the mouse, the cells I2 through I1144. Now, the second input argument is a criterion. And here, the criterion can be any function which evaluates to true or false, or it can even be a specific value of true or false. So we just click on cell I2, which contains false, and that becomes our criterion.

[Video description begins] The formula reads: =COUNTIF(I2:I1144,I2). [Video description ends]

So the value in the formula, that's the one in cell F1149, is going to return the count of the number of cells between I2 and I1144 where those cells satisfied a specific criteria. What was that criterion where those cells were equal to false? And that number turns out to be 1096. Now, the COUNTIF function only allowed us to specify a single condition. Let's go ahead and try the use of something called the COUNTIFS condition. This function takes in multiple criteria and then returns the number of times all of those criteria was met. Let's now go ahead and attempt this with a function which will count the number of clicks where the number of impressions is greater than 5,000, and where the number of buyers is greater than 10.

This is implemented using the COUNTIFS function. The first criterion that we pass in has to do with the data in column G. So we have G2 through G1144. And notice how we specify a criterion in the form of a string condition. We enclose within double quotes the string greater than 5,000. Every cell in the range G2 to G1144 is going to be evaluated against this criterion. That does it for the cell range as well as the first criterion. Let's now go ahead and add a second criterion. We do this in the form of yet another cell range. This is all of the data in column O, so O2 through O1144. And then the corresponding criterion is the string formula greater than 10. Now if we go ahead and hit Enter, this cell, that cell F1150, is going

[Video description begins] The complete formula reads: =COUNTIFS(G2:G1144,">5000",O2:O1144,">10"). [Video description ends]

to contain a count of all of the instances where the number of impressions is greater than 5,000, and where the number of buyers is greater than 10. And that count works out to 6. This example demonstrated the power of string-based predicates. Let's edit the formula in cell 1150.

Let's change the second predicate so that instead of reading greater than 10, we change it to greater than, equal to 10. And we change the first predicate so that instead of reading greater than 5,000, we change that to greater than, equal to 5,000.

[Video description begins] The formula now reads: =COUNTIFS(G2:G1144,">=5000",O2:O1144,">=10"). [Video description ends]

Now when we hit Enter, the count increases, and in this way, Excel has clearly made use of the string formula. It has evaluated every cell in the corresponding ranges against the corresponding string formulae. This is a way in which Excel makes use of string-based lambda functions. This is a powerful construct from functional programming. We discussed the use of the SUMIF function, we actually made use of the COUNTIF function. Let's now move on and perform similar operations using the AVERAGEIF and AVERAGEIFS functions. The usage here is virtually identical to the usage in the previous conditional aggregate functions. We begin by making use of the AVERAGEIF operator. The AVERAGEIF operator is going to take in a range as well as a criterion.

The first input argument is the range, and here we are going to make use of the number of clicks. So this is the data in cell H2 through cell H1144. The second input argument is going to consist of the criterion. So here we specify a string criterion less than or equal to 175. Now, the first two input arguments are identical to those in the COUNTIF example, but there is now a third input argument. This one is called the average range. The average range is optional. This is going to be the cells which are actually going to be averaged. We could choose to omit the average range, in which case the first input argument is going to be used. But here we wish to calculate the average spend for those ads where the number of clicks is less than or equal to 175. That's why the average range in our example is J2 through J1144. We'll now go ahead and hit Enter.

[Video description begins] The complete formula reads: =AVERAGEIF(H2:H1144,"<=175",J2:J1144) [Video description ends]

And now in cell J1149, we do indeed have the average spend on ads where the number of clicks was less than or equal to 175. And that spend works out to $38.25. In exactly the same fashion, we can also make use of the AVERAGEIFS function.

AVERAGEIFS is going to take in, this is analogous to the SUMIFS and the COUNTIFS functions which we have already explored. This gets us to the end of our exploration of logical operators within Excel functions. Let's now move on to other topics involving the use of formulae.


One-dimensional Lookups With VLOOKUP() and HLOOKUP()

In this demo, we will turn our attention to some of the most important formulae in Excel. These are VLOOKUP and HLOOKUP. The reason for the importance of VLOOKUP and HLOOKUP is that these formulae provide us with a way to combine data which is present in different tables or in different sheets for instance. Let's take a look at the structure of this spreadsheet that we are working with. It has two tabs, GameDetails and PublisherDetails. You can see some of the columns in the GameDetails tab. These include the Name, Year_of_release, Genre, and the sales in different geographic regions such as North America, the EU, Japan, and so on. Crucially, the GameDetails tab does include the name of the video game, but it does not have the name of the publisher. If we now click over to the PublisherDetails tab, we see that this is a lot simpler. It just has the two columns, Name and Publisher. So the column name is common between both of our tabs and the values in this column are exactly the same.

Now the problem that we are trying to solve is how do we get the details for the publisher into the first tab? In other words, how do we combine the data present in these two tabs? And this is where VLOOKUP rides to our rescue. Let's go ahead and insert a column into the GameDetails tab.

[Video description begins] To insert a new column, select a column and then right click on it. Select Insert from the shortcut menu that appears. A new column before the selected column gets inserted. [Video description ends]

Let's call this Publisher. And then let's go ahead and invoke the VLOOKUP function. One way of doing this is using the Formulas tab. So we click on Formulas up top, and then click on the Insert Formula button way over on the top left. This brings up a dialog where we can search for a function. So we search for VLOOKUP and then we hit Go. Once VLOOKUP appears in the dialog down below, we hit OK. This is a way of invoking a function the first time you're using it when you don't really know how it's to be used. Over time as you get really familiar with the VLOOKUP function, you will find yourself just entering this into the formula bar right away. The first time though, it helps to have this wizard because VLOOKUP's function arguments can be a little hard to understand. Let's now take a moment to understand the input as well as the return value from VLOOKUP.

The V in VLOOKUP stands for vertical. And that's because as this dialog tells us VLOOKUP looks for a value in the leftmost column of a table, and then returns a value in the same row from a column that we specify. Let's go ahead and see how this works in practice. The first input argument here is the Lookup_value. The lookup value in our example is going to be the name of the video game. And that's because the name column is the one that's common between the GameDetails and the PublisherDetails tables. Note that we can preview the value of cell A2, we can see that it contains the words Wii Sports. Next, let's go ahead and specify the second input argument, which is the table array. This is the table that we wish to index into. For this, we navigate into the PublisherDetails tab, and there select both of these columns. It's really important that in whatever data range we specified for the second input argument that is for the table array, the first column must contain the lookup value.

Please note that at this point, we have not locked the cell references in the table array. We'll have to do that in a moment. For now, let's quickly move on and specify the other input arguments. The third input argument is the column index number which we wish to look up. This is the column index number in the table array. Here in the table array, the first column contains the name of the video game, and the second column contains the name of the publisher. And so, we would like to find the publisher corresponding to a particular video game. The publisher is in column 2, and that's why we have specified this to be equal to two. The last input argument the one called Range_lookup is maybe the most strange. This almost always takes in the value of false. Range lookup is going to tell VLOOKUP whether the match that is performed between the lookup value and the table array should be perfect, that is whether it should be an equality match or an inequality match.

Because the inequality match is almost never used, let's just leave that conversation out of the scope of this course and focus on equality matches. For equality matches, the last input argument should always be equal to false. At this point, we have entered all of the input arguments, let's hit OK. And when we do so we will find the publisher corresponding to Wii Sports as the return value, we get a preview of this in the preview pane, you can see that the publisher is Nintendo. And once we hit OK, that same value of Nintendo is going to appear in cell D2. Now we are almost ready to copy paste this formula down. There is however one important bug, and that bug is that we have not locked the table cell reference. You can see in the formula bar up top that the table reference consists of the cells A2 to B16 of the PublisherDetails tab.

[Video description begins] The formula reads: =VLOOKUP(A2,PublisherDetails!A2:B16,2,FALSE). [Video description ends]

But when we copy this formula down, if we scroll to a lower row, we can see that the table references on subsequent rows have also been shifted down. For instance, you can now see that our cursor is in cell D3. And the formula in cell D3 looks up A3, that part is correct. But it looks it up into the cells A3 to B17 of PublisherDetails, and that's wrong.

Neglecting to lock the cell references while invoking VLOOKUP, or even locking them in the wrong manner. These are rarely common sources of bugs in important Excel spreadsheets. So let's go ahead and fix this. We go ahead and clear the contents of the incorrectly copy pasted cells, and then we change the cell reference to make it absolute.

[Video description begins] To clear the contents of incorrect cells, select the cells with incorrect values. Right click and select Clear Contents from the shortcut menu that appears. [Video description ends]

We do this by adding dollar signs, we can add dollar signs very quickly by using the F4 keyboard shortcut. And now the cell references are absolute. We copy them down, and our invocation of VLOOKUP is now correct.

[Video description begins] The correct formula now reads: =VLOOKUP(A2,PublisherDetails!$A$2:$B$16,2,FALSE). [Video description ends]

In this way, we have successfully found the publisher corresponding to every video game in our game details database. Next, let's go ahead and demonstrate the use of the HLOOKUP function. For this, let's start by clearing the publisher column. We will then go ahead and recompute the publisher, but this time making use of HLOOKUP. This will also give us a way to see how we can reorient data while copy pasting it so that the rows and the columns are interchanged.

Let's go ahead and get started by clearing the contents of the publisher column, that's the first operation.

[Video description begins] To clear the contents of incorrect cells, select the cells with incorrect values. Right click and select Clear Contents from the shortcut menu that appears. [Video description ends]

Then let's head over to the PublisherDetails tab. Let's go ahead and select both rows of data there. We then hit Copy. We do this by right clicking and choosing on the Copy button. And, now things get interesting. Let's scroll down below and paste, but we are going to Paste Special. Remember that Paste Special comes with many different variants which include paste, paste values, and other paste options. For now, we are going to choose the paste transpose option. You can see Transpose is the tooltip and it has the keyboard shortcut T within brackets. Let's go ahead and paste transpose, this will interchange the rows and the columns. So now our data is laid out horizontally rather than vertically. And that's why we are now going to make use of HLOOKUP rather than VLOOKUP. Of course, H stands for horizontal and V stands for vertical. So this time we have a horizontal range which has just the two rows, but it has a large number of columns.

How are we going to index, so how are we going to look up the name of the publisher corresponding to a particular video game? We switch back over. This is now in the GameDetails tab, we click on the Insert Function button.

[Video description begins] The Insert Function dialog box appears. [Video description ends]

Once again, we make use of the lookup dialog. So we search for HLOOKUP, we hit OK, and this brings up a dialog where we can enter the input arguments. Input arguments are virtually identical to those for VLOOKUP. The only difference is that the table array is going to be horizontally aligned rather than vertically aligned. So we start by specifying the lookup value, that's the name of the game Wii Sports. Then for the table array, we've got to specify the horizontally aligned table array. What's really important here is that the lookup value must occur in the top row of the table. Remember that in the case of VLOOKUP, the lookup value had to occur in the leftmost column of the vertical array. Here the lookup value has to occur in the top row. You can see here that the data range that we have passed in as the table array consists of PublisherDetails, cells A22 through P23.

The row index number, this tells us the row in which we would like to find the value corresponding to the lookup value. Once again here, this is virtually identical. It's analogous to the column index in VLOOKUP. Finally, the range lookup type once again is false, that's because we would like an equality based lookup. Once again, we can hit OK, and we can see that the publisher corresponding to this is Nintendo. We can view this in the preview tab as well. Now notice that as we mentioned previously, while copy pasting the HLOOKUP formula, we need the table array cell references to be locked, in other words, we need them to be absolute. So we've got to go ahead and make them absolute. We do this by adding the dollar signs. So we've updated the table array so that it's publisher details A22 to P23, but with all of the row and column identifiers preceded by the dollar sign.

[Video description begins] The complete formula now reads: =HLOOKUP(A2,PublisherDetails!$A$22:$P$23,2,FALSE). [Video description ends]

Now when we copy paste this formula down, the results are going to be correct. We can see this by clicking in any of the other rows in this column, we will see that the table reference always remains the same. Copy pasting it did not cause it to be updated. For instance, we have now clicked into cell D5. You can see that the formula there changes the lookup value to be A5, so that cell reference has been updated, but the table reference remains locked at A22 through P23. VLOOKUP and HLOOKUP are maybe the two most commonly used functions for power users. VLOOKUP tends to be much more commonly used than HLOOKUP just because data is usually aligned vertically rather than horizontally. But both of these functions are very important and it's worth your while to understand their use thoroughly.


Two-dimensional Lookups Using INDEX() and MATCH()

HLOOKUP and VLOOKUP are a great functions for combining data from different ranges or tables. The thing about both of these functions is that they are great for one dimensional lookup. So for instance, if you would like to look up values based on their columns, you'd use VLOOKUP. If you'd like to look up values based on their rows, you'd use HLOOKUP. What if you'd like to perform a two dimension lookup in which you take into account both the row and the column? Here we are going to have to make use of a combination of functions known as Index and Match. Let's go ahead and see this in action. Let's start by understanding the MATCH function. The MATCH function can be used to return the row number or the column number of a lookup_value in a lookup_array. We can see that the tooltip helpfully tells us that the first input argument is the lookup_value. The second input argument is the lookup_array. And the third optional input argument is the match_type. Here, the lookup value is the name of our video game that's contained in cell A2.

Then the lookup array consists of all video game information. So this is game details, A2 through A16. And note again, that we've made use of the dollar signs in order to lock these references. The third input argument which we have specified, which is FALSE, FALSE is equivalent to 0 in Excel. And this is telling Excel that we would like to perform an exact match.

[Video description begins] The complete formula reads: =MATCH(A2, GameDetails!$A$2:$A$16, FALSE). [Video description ends]

Now, if we go ahead and hit Enter, this return value, which is 1, tells us where in the original data, at what a row number in the original data, the value Wii Sports appeared. If we go ahead and copy this formula down, we will get the row number corresponding to each of the video game names which are present in column A. Please note that the return value is always relative to the start of the lookup array. So right now on screen, we've defined our lookup array as A2 through A16. If we now change that, so that it reads A1 to A16, the position of Wii Sports will change from 1 to 2.

[Video description begins] The formula now reads: =MATCH(A2, GameDetails!$A$1:$A$16, FALSE). [Video description ends]

We can confirm this by making this change and hitting Enter. The new value is position 2. Let's go ahead and update all of the formulas. And we can see that every one of the positions is updated or incremented by 1, that's because we change the starting point of the lookup array.

So far, so good. We now understand how to make use of the MATCH function to find the position of a lookup value. Next, let's introduce the INDEX function. The INDEX function, which you can see on screen now, is going to return the value of a table or an array as selected by the row and column indexes. So this is the way in which you'll perform two-dimensional lookups in Excel. Our first application of the index match combination is going to be to replicate VLOOKUP. You can see this on screen now.

[Video description begins] The formula now reads: =INDEX(GameDetails!$A$1:$I$16, MATCH(A2, GameDetails!$A$1:$A$16, FALSE),4). [Video description ends]

In cell B2, we've made use of a combination of index and match in order to get the publisher corresponding to a particular video game. Remember that we had already done this in a much simpler fashion using VLOOKUP. But here we are introducing two-dimensional lookups using INDEX and MATCH. So let's understand what this combination does. This cell B2 contains an invocation to the INDEX function. The INDEX function takes in as its first input argument the array.

This is the array that we are going to look up into. The second input argument contains the row number. And what is that row number? It's the result of MATCH. The third input argument is a column number. And that column number for now is hardcoded to 4. And that's because we had the publisher at column number 4 in the GameDetails tab. Let's now focus on the second input argument, which is the row number. Here we are asking Excel to find the particular row in which Wii Sports occurs in the GameDetails tab, cells A1 through A16. That row number is then going to be used to lookup and return the value in column 4. And that is, of course, the publisher Nintendo. Notice that all of the cell references are absolute where appropriate, so we can go ahead and copy paste this formula down. Before we do that, notice that we have the same lookup array in two places inside the formula called to INDEX. It's present as the first input argument to the INDEX function, and it's also present as the second input argument into the MATCH function. Notice that as usual, the lookup value is a relative cell reference.

We are not locking on the lookup value A2 because that needs to be adjusted as we copy the formula down. We've now got a good sense of how INDEX and MATCH operate. Let's go ahead and harness their full power for a two-dimensional lookup. Down below, we are going to enter the name of a video game. And then, we are going to try and find the sales of this video game in a particular geographic region. We are going to need a two-dimensional lookup because the name of the video game, you can see it here in cell B25, that's a Defendin' De Penguin. That is present as a row value. And then the geographic region, which you can see in cell A26. That's North America sales that is present as a column heading. So clearly, we need to find some way in order to get the number which is present in a matrix. Which row do we want? Well, the row corresponding to Defendin' De Penguin, and which column do we want? The column corresponding to North America sales. The way we accomplish this is using the INDEX function with two invocations to the MATCH function.

And that's exactly what you can see on screen now.

[Video description begins] The complete formula reads: =INDEX(E1:I17,MATCH(B24,A1:A17,FALSE),MATCH(A25,E1:I1,FALSE)). [Video description ends]

The INDEX function takes in as its first input argument the lookup array. Here, the lookup array we have passed in is E1 through I17 that consists of all of the revenue data. So again, the first input argument into index is the lookup array that consists of E1 through I17. The column headings of this range are NA_Sales, EU_Sales, and so on. So we clearly need to identify the correct column based on the column headings. And we need to find the correct rows based on the row names. Let's first focus on getting the row name right. The row name is going to be used to lookup into column A. We can see that column A here consists of all of the video game names. We'd like to find from this column, the row number corresponding to a specific video game name. That video game name is present in cell B25. And that's why we have a match and that match takes in as its lookup value B25. And it takes up as its lookup range, A1 through A17. A1 through A17 is a list of all of the video game names just up above. Because this is an exact match, the last input argument into MATCH is FALSE.

So that does it for the first invocation of the MATCH function, which is used to identify the row number that we have to look up into. Now, let's turn our attention to the second invocation of MATCH. This is going to be used to determine the column number. This MATCH function is also going to lookup some specific value. You can see right now that it looks up the value Game Name, so that's not what we want. Let's go ahead and update that cell reference from A25 to A26. It's now correctly looking up the column name NA_Sales. Now we know that this NA_Sales is indeed the name of a column, which is present in the range E1 through I1. So that's the second match. Once again, this match needs to be an exact match, so the last input argument is FALSE.

[Video description begins] The correct formula now reads: =INDEX(E1:I17,MATCH(B25,A1:A17,FALSE),MATCH(A26,E1:I1,FALSE)). [Video description ends]

We now have all of the elements necessary for us to compute the revenue of a specific game, Defendin' De Penguin in a specific region that's NA_ Sales. We made use of the INDEX function. The first input argument was the lookup array.

The second input argument was a MATCH function which returned the row number. The third input argument was a MATCH function again, this time it returned the column number. The first MATCH function looked up the name of the video game, the second MATCH function looked up the name of the geographic region. Let's go ahead and confirm that this is the right result, 0.02. Let's scroll up until we find the row corresponding to Defendin' De Penguin, that's a row number 10. And therein column E, we find the North America sales are indeed equal to 0.02. With this, we have successfully demonstrated the use of INDEX and MATCH to perform two-dimensional lookups. Before we move on, let's very quickly demonstrate the use of another common technique. We've already seen how tricky it is getting absolute and relative cell references right.

We've seen that if you forget to place dollar signs in the right parts of a call to VLOOKUP or INDEX or MATCH, we are going to have a problem when we copy paste the formula. An easy way around this is by making use of named ranges. So here we are going to go ahead and use VLOOKUP to look up the sales of a particular game. That came is Riven: The Sequel to Myst, and we would like to find its revenues in North America and the European Union. But this time before performing the VLOOKUP, we are going to select and create two named ranges. You can see that we've copy pasted the data so that we now have a nice range which has all of the video game names along with their sales in North America. We select all of these cells and then click on the Define Name button up top. Here we go ahead and define a named range

[Video description begins] The New Name dialog box appears. [Video description ends]

NA_Sales_Range, which refers to precisely the correct cells. Notice that here, while defining the named range itself, the cell reference is absolute. We can tell that from the dollar signs before A1 and B17 in the reference to Xbox in the dialogue on screen now.

In the same fashion, we go ahead and define another named range called EU_Sales_ Range, this one corresponding to all of the data for sales in the European Union. Now that we have these named ranges ready, we can go ahead and use them while performing our VLOOKUPS. So down below, where we would like to find the revenues of the game Riven: The Sequel to Myst by invoking VLOOKUP, all we need to do is to pass in as the table array, the name range, NA_Sales_Range. That's all that we need to do, we do not need to specify any cell references at all. The column INDEX number is 2, the range lookup value is FALSE. That's because we'd like an exact look up. We hit OK, and we immediately find the North America sales for our game. And exactly the same fashion, we invoke VLOOKUP one more time. This time our table array is the EU_Sales_Range.

The rest of the input arguments stay the same. We hit OK, and we can see here that we have got NA_Sales of 1.52, EU_Sales of 0.82. And if we scroll up, we can confirm that these are indeed the correct values. On row 7, we can see that the game Riven: The Sequel to Myst has revenues of 1.52 in North America and 0.82 in the European Union.


Cleaning Data

In this demo, we will turn our focus from indexing and lookup operations towards general methods for managing and cleaning data. The data source that we are going to use for this demo is onscreen now.

[Video description begins] The url reads: https://www.kaggle.com/mjamilmoughal/fruits-with-colors-dataset. [Video description ends]

As before, it's from Kaggle. This has to do with fruits and their associated colors. To begin with, let's make use of the COUNTIF function in order to see how often the fruit, apple, appears in this data set. So COUNTIF takes a range and a criteria. The range that we pass in consists of columns B2 through C60. The criterion that we specify here consists of two cells, those are B2 and C2.

[Video description begins] The complete formula reads: =COUNTIF(B2:C60,B2:C2). [Video description ends]

And so in this way, we are able to quickly figure out that our data has 19 occurrences of apple, and 3 occurrences of granny_smith, that's a type of apple. Now let's go ahead and see how we can eliminate duplicates from this data. So let's go ahead and create a copy of the sheet. Move it into a new sheet title Sheet1. Click on the Data tab up top. And there we'll click on the Remove Duplicates button which is over towards the right. As the tooltip there tells us, we can pick which columns ought to be checked for duplicate removal. When we click on this button, a Remove Duplicates dialog opens up. This has a list of all of the columns. Here, each of the three column names has been selected. Which means that for two rows to be considered as a duplicate, they've got to have the same values for each of these three columns.

Let's go ahead and hit OK. This will get rid of duplicates. 49 duplicate values were found and removed, 10 unique values remain. And in this way, we've seen how to quickly and painlessly eliminate duplicates from a set of data rows. Data duplication is a common problem and it's one of the first steps in cleaning data. So it's worthwhile keeping this in mind. Next, let's go ahead and see how we can clean data which contains unwanted special characters. This is also common enough, especially when we import data into Excel from other text sources. These could be either text files or XML files. These might have been captured from web input for instance. So the presence of special characters in data is really common. You can see, for instance, in the example onscreen now that we have used the Alt+Enter keystroke combination in order to enter a new line within an Excel. This of course is a useful tip to get your formatting right, but it's small keystrokes like this which also introduce unwanted, unprintable characters.

So you can see that we've now introduced a few such characters. To eliminate such characters, let's go ahead and invoke the CLEAN function from Excel. CLEAN is a function which, as its tooltip on screen tells us, can be used to remove nonprintable characters from text. It's a simple enough function, it takes in only one input argument, that's the cell to be cleaned. And in this way, we go ahead and clean all of the cell information in A1 through C11.
[Video description begins] The formula reads: =CLEAN(A1:C11). [Video description ends]

The result is the cleaned array, which we can see in the range starting on cell A14. This is also a good example of an Excel array function. An array function is one whose return value spans more than one cell. Here the CLEAN function was invoked on a range. And at the end, instead of hitting Enter, we hit Ctrl+Enter. And as a result of that, the return value from CLEAN was an Excel array. This Excel array occupies many different cells, which start on A14 and end on C24.

Notice the formatting of these cells, the elements of an array cannot be individually edited. That makes array functions somewhat clunky. So let's go ahead and convert this array into an ordinary Excel range. We select the array, we copy it by right clicking and choosing the Copy option. Then when we are pasting it, we make use of Paste Special and Paste Special as Values. When we do this, we now no longer have an Excel array. So that's a good thing. But we can see that we now have warning signs in column A. So when we pasted special as values, something mysterious seems to have happened. In case it's not clear to you where these warnings occur, look closely at the contents of cells A2 to A11. Each one of those cells has a green border over in the top left corner. That green border is a sign that some kind of Excel warning has resulted. It turns out that the reason for this warning is that the CLEAN function converted the integer values 1, 2, 3, 4 into the corresponding string values.

So this is what we want to eliminate. Let's go ahead and select those cells and view the error. To do so, click on the little warning icon over on the top right of the selected cells. The tooltip tells us that the number in this cell is formatted as text or preceded by an apostrophe. So to get rid of this warning, let's go ahead and change the formatting. We click on that warning icon and select the Convert to Number option. Once we do this, all of these values will be converted from strings back to numeric and you can see that the alignment changes. The strings were left aligned, the numbers are right aligned. And more importantly, the green border over in the top left corner has gone away as well. We've now worked with two common techniques for cleaning up data. We eliminated duplicates and then stripped or cleaned our data of special characters. Let's turn to a third common data cleaning technique, which is converting unit systems. We've gone back to the original data and added in a new column.

[Video description begins] To insert a new column, select a column and then right click on it. Select Insert from the shortcut menu that appears. A new column before the selected column gets inserted. [Video description ends]

This is column E, titled mass(in ounces).

This is going to take the mass which is present in column D, and that is a mass figure present in grams, and convert it from grams to ounces. And this is done making use of the Excel built in CONVERT function. This function takes in as its first input argument the number which is to be converted. And then the second and the third input arguments correspond to unit systems. Here the second input argument is the string g. This is our way of telling Excel that the input argument D2 is a weight in grams. The third input argument is ozm. Oz is the standard acronym for ounces, m here refers to mass in avoirdupois units.

[Video description begins] The complete formula now reads: =CONVERT(D2,"g","ozm"). [Video description ends]

Once we hit Enter, that input argument is going to be converted from grams to ounces. So 192 grams is 6.77 ounces. We can now go ahead and copy this formula down. All of our weights and grams will be converted to ounces. The Excel CONVERT function should be your method of choice for various weight and mass, as well as distance related competitions.

You should not be attempting to convert grams to kilograms or stones and so on by using hard coded constants. Just make use of the built in CONVERT method instead. This also supports various measures such as miles, scientific units such as angstrom, light years, and parsecs. Different time, pressure, force, energy, power, magnetism, temperature, volume, and area measures as well. You can also use the CONVERT function to convert from bits to bytes and other information technology related units.


Parsing Strings

Let's run through a few more miscellaneous data cleaning operations. If you right-click and click on the Link icon over at the very bottom, you can add links or hyperlinks into your workbook. These could be of different types. You can see from the options over on the extreme left that your hyperlink could point to an existing file or web page, to a place in this document, or to new documents or email addresses. Here we are going with an existing file. And that's why in the center pane, we can browse. On the desktop, we have the original data source, fruit_data_with_colors. When we go ahead and select this, we can see that a relative URL appears. This relative URL consists of the . operators, along with the \ symbol. \ is a common enough symbol on all Windows platforms and machines, but it's not used in the UNIX world. In any case, for now this is a usable hyperlink. We can click on it in order to navigate to the original data file. And if we'd like to extract the file name from that URL, that is also easy enough to do.

You can see that this is being done now on screen using a rather complicated string manipulation operation. Notice how it makes use of various string manipulation functions in Excel, such as the right function, the len function, and the find and substitute functions. Excel has many powerful string manipulation functions, and you'll often need to use these while cleaning data. In any case, we are able to successfully extract the name of the file from the hyperlink using this formula. Another common situation that arises while working with data is dealing with blanks or with missing data. Let's go ahead and see how we can deal with such situations. We switch over into the unique tab, and there we go ahead and delete various groups of cells at random from our data. Once we are done with this operation, we have a set of three blanks following the fruit name apple and a set of two blanks following the fruit name orange. We also delete the last fruit name of lemon. Now let's see how we can go ahead and fill in these missing values. Let's begin by selecting cells B3 to B5. Note that we are in the Home menu.

Click on the Find & Select option over on the top right. There you'll see that there is a submenu item called Go To Special. And it's this Go To Special menu item that we are going to use to fill in the blanks. In the dialog which opens up, we first select all cells which have blanks. Note that we do this by checking the radio button corresponding to Blanks. Then we type in the value which we wish to fill in. Here we have typed in B2, which corresponds to the value apple. Now, if we hit Enter, this value will be filled into a single cell. If we choose instead to hit Ctrl+Enter, that will fill in apple into all of these different rows. So we undo the typing that we just did, we redo the Find & Select, Go To Special. We again find Blanks, and there we go ahead and select B2 and hit Ctrl+Enter this time. So in this way, we have successfully demonstrated the use of Find & Select to search for missing values, and then the use of Ctrl+Enter in order to fill in those missing values with a single value. Let's go ahead and redo this operation for the other missing values.

Once again, we select the missing values, we click on Go To Special, choose Blanks. And then we go ahead and type out the formula which will give the value to be filled in here. That formula now is the cell location B7. If we hit Ctrl+Enter, it will fill orange into those missing values. And finally, another really simple way of filling in values is to just choose one particular value and drag down from the bottom right to fill in subsequent rows.


Performing Simple Data Validation

An ounce of prevention is worth a pound of cure. This saying applies neatly to data in Excel. In the previous demo, we spent quite a bit of effort to learning how to clean data which either had special characters or missing values or duplicates. We will now see how we can use data validation in order to prevent such occurrences in the first place. Excel supports many data validation techniques, so let's plunge right in. We click on the Data menu item up top, and then we click on the little data validation button present in the data menu. This is now over on the right. Clicking on that little button reveals some sub menu items, Data Validation, Circle Invalid Data, and Clear Validation Circles. For now, let's go ahead and click on the first one, Data Validation, which brings up a dialogue where we can choose many different types of validation criteria. Let's go with what types of values are allowed. So we click on the first drop down, you can see that the default value was any value. We change that to whole number, but we can also go with decimals, lists, date, time, text length, as well as custom validation.

We'll come back to some of these. For now, let's just go with only allowing whole numbers. Now, we can specify data must occur between a minimum and a maximum value. Let's specify a minimum of 1 and a maximum of 25. Let's then click on the Error Alert dialog. This is the third tab. Here, we can customize the message which a user will see when entering an invalid value. We can also go ahead and enter an input message, but that's not important for us right now. Please note that this data validation is going to apply to the currently active cell, which is cell A12. We hit OK and now cell A12 is going to only accept values between 1 and 25. We try and enter an input value of 28, we get the error message telling us that the input should be a whole number between 1 and 25. We are given the choices of retrying and canceling the operation. Let's retry and enter a valid number of 4, and this time the input is accepted.

Let's go ahead and apply a different kind of validation to cell B12. We again, click on the Data Validation button, but this time, in the Settings tab, let's go with a different type of validation based on the length of the text we enter. So we choose Text length from the drop down menu, and we now specify minimum and maximum values of the text length. We're requesting the user to enter a minimum of 2 and a maximum of 10 characters. While we're at it, we also customize both the input message as well as the error alert. Both of these will make the user experience more intuitive for the person operating the spreadsheet. Now, we hit OK, we come back into cell B12, and we try and enter text, which is simply too short, we try to enter just a digit, which is one character. Notice first off, that the tooltip over this cell contains the input message. This tells us that there is validation of a limited length text, and that the text should be between 2 and 10 characters. If the user attempts to enter invalid text despite this helpful input message, then the error message is going to be displayed. Once again, we're prompted to either retry or cancel.

Let's retry and type in valid text. This is the name of a fruit which is lemon. Because lemon is between 2 and 10 characters, this is going to be accepted. If we try and enter something longer, such as lemon yellow, well, once again, we encounter the same error message. So we undo the extra typing, enter the fruit subtype, and then move on. The next operation we wish to accomplish is to extend this validation to more cells in our spreadsheet. So we now choose all of the cells A2 through A12, we click on Data Validation. And here, Excel is smart enough to figure out our intent. Excel realizes that we have some cells in here which do contain data validation, but most do not. So Excel asks us whether we would like to simply extend the data validation to all of the selected cells. Of course, this was our intent, so we choose Yes. And now, we can see that all of the cells in this range are governed by the same data validation rules. Let's tweak the rule so that the minimum is now 0.5 and the maximum is now 2.4.

Note that when we make this change, a large number of the pre-existing values are going to be in violation of this range. How do we figure that out? Let's now click again on the Data Validation button, but this time, we're going to choose the second option, which is Circle Invalid Data. As soon as we do this, Excel highlights all of the invalid cells with a red ovals. And we can clear this by again clicking on the Data Validation button and choosing the third option, which is Clear Validation Circles. We now go back to the original appearance of our spreadsheet. Let's now go ahead and try a few more advanced types of data validation. This time, we'll begin by selecting all of the data in our range, clicking on the data validation button, and then clearing all of the pre-existing validation. We'll do this by clicking on the Clear All button in the bottom left.

A moment ago, when we applied validation to the fruit name, we did so based on the text length. But a far more meaningful way of applying validation there would be to restrict the fruit name to only a genuine fruit. To do this, let's first create a list of all acceptable fruit names. To do this, we simply create a list of all unique fruit names. So we copy the names of all the fruits. We paste this range over into cell H1. We then select that range, navigate into the data menu, and eliminate duplicates. Once we eliminate duplicates, all that's left is a list of all of the unique fruit names. So at the end of this process, starting on cell H1, we have a list of the fruit names, which here are apple, mandarin, orange, and lemon. This now, can serve as the basis for the validation that we want to apply on column B. Let's first try it out on a smaller scale on a single cell, that's cell H9. So we click on Data, then navigate to Data Validation, and this time we choose to apply data validation of type list you can see List from the drop down of the different validation types there.

This is one of the most important types of data validation. When we choose to go with list validation, we are prompted for a source range. So we click on the Source button, navigate to our list of fruit names, and select cells H2 through H5. Notice how each of these has been coded as an absolute cell reference with the dollar signs before the row and column identifiers. After selecting this list, we click OK, and now, only values from this list are going to be acceptable in cell H9. You can see right away that the appearance of the cell has changed. There is a little drop down menu over on the right edge of that cell. Let's click on the drop down menu, and we can see that the only acceptable values are apple, mandarin, orange, and lemon. Those of course, other values from the list which starts on cell H2. Data Validation using list is an extremely powerful feature because once you restrict your users to a set of drop down values, the chances of typos or missing values is basically non-existent. You can now see that we can use a drop down toggle between different fruit types, but there's no way for us to enter a value which is not in this list.


Performing Complex Data Validation

Let's now round out this demo with a look at an even more complex form of list validation. We are going to create a list of lists. So you can see that we now create a list of all acceptable fruit types as well. We can see that in column J, we have a list of acceptable apples. In column K, we have a list of acceptable mandarin types, there's just one. Column L has all of the acceptable orange types. And column M has all of the acceptable lemons. The next item on the agenda is to create named ranges corresponding to each subtype of fruit. So we navigate to the formulas menu. There, under the defined names section, we click on the Create from Selection button. Because we've selected cells J1 through J5, Excel realizes that we would like to create a named range from these cells.

[Video description begins] Under the formulas tab, select Create from Selection. The Create Names from Selection dialog box appears. [Video description ends]

It then asks us from where it ought to get the name of this named range from. The choices are the Top row, the Left column, the Bottom row or the Right column. Here we just go with the Top row.

This has created a named range called apple. And that named range will include all of the acceptable varieties of apple, from granny_smith through cripps_pink. We now repeat this process with each of the other fruit types. So we create a list of acceptable mandarins. This is going to be called mandarin and it has the one value named mandarin. We likewise create a named range orange, which will include spanish_jumbo, selected_seconds, and turkey_navel. Finally, we'll do the same for the types of lemons as well. A named range called lemon, which has the values spanish_belsan, unknown, and Avalon. Now we're going to set up our spreadsheet so that we have a fruit name which is contained in cell H9. And then in cell I9, we are only going to allow valid fruit subtypes for that particular fruit. So for instance, if the fruit name is mandarin, we'd like to only restrict the values in cell I9 to be types of mandarin. Remember that, we've already applied validation to cell H9, so its only going to contain valid fruit names. Now let's select cell I9, click on Data Validation and go to validation of type List.

This is where things get really interesting. This time in the Source field, we do not navigate out to any of the lists that we just created. Instead, we make use of a specific Excel formula called indirect. Now the indirect formula is extremely powerful, but it ought to be used with care. Because it can make spreadsheets very hard to debug. Indirect takes in an input argument, which is a string, and it returns a cell reference to that string. So for instance, if you passed in the string A1 into the indirect function, the return value would be the cell reference A1. Here we are going to invoke the indirect function on the cell H9. And H9, of course, is the cell which contains the name of our fruit. Why will this work? Because we've just created named ranges which correspond to each type of fruit. So now, for instance, indirect of H9 refers to the named range with the name mandarin.

That's because mandarin is the value contained in H9. So indirect of H9 redirects to the named range mandarin, which is defined in cells K1 to K2. And that's why the drop down menu in cell I9 now only allows the one acceptable value, mandarin. What happens if we change the contents of cell H9? Let's change the fruit type from mandarin to lemon. As soon as we do this the drop down menu associated with cell I9 changes as well. Now when we click there, we can see all of the acceptable lemon varieties, spanish_belsan, unkown and Avalon. Why did this happen? It's because the validation type applied to cell I9 had as its source indirect of H9. Here H9 contains the value lemon. Indirect of H9 gave Excel a handle to the name range titled lemon. That name range is visible in the top right of your data area.

It begins on cell M1 and it includes the lemon varieties spanish_belsan, unknown and Avalon. And that's why the corresponding values in the drop down changed as well. That gets us to the end of this demo in which we demonstrated the advanced use of lists for data validation. We made use of the indirect function in order to create dynamic validation. Where acceptable values in one cell varied based on the values present in another cell. The indirect function, remember, takes in an input argument as a string, and that string is then converted into a cell reference. The indirect function should be used with care because it makes debugging Excel spreadsheets quite difficult, but it is extremely powerful.