• 2 months ago
Web Data Enrichment into Power BI

Enhance your data analysis skills with our comprehensive course on Data Enrichment through Web Data Extraction. This playlist is crafted for professionals, data enthusiasts, and Power BI users who want to expand their datasets by pulling in valuable external data from the web. In this course, you’ll learn how to efficiently extract, clean, and integrate web data into your existing datasets, empowering you to generate richer insights and more informed decisions.

Whether you want to add competitive intelligence, market data, or additional context to your reports, this course will guide you through the process step-by-step.

What You’ll Learn:
• Techniques for extracting data from websites using Power BI and other tools
• How to clean and prepare web data for integration with existing datasets
• Best practices for data enrichment and expanding your analysis capabilities
• Tips for automating web data extraction and maintaining data accuracy

Ideal For:
• Data Analysts and Business Intelligence Professionals
• Power BI users looking to incorporate web data into their analyses
• Anyone interested in enriching their data with external sources

Subscribe for more tutorials on advanced data techniques and learn how to leverage web data extraction for more powerful insights.

web data enrichment tutorial
Power BI data cleansing from web
web scraping for Power BI
Power BI web data connector
web data enrichment tools
Power BI data enrichment best practices
web data preparation in Power BI
Power BI web data integration
web data import into Power BI
Power BI data enrichment automation

#WebDataEnrichment #WebScraping #DataEnrichment #PowerBI #BusinessIntelligence #DataScience #ReportingTools #DataAnalysis #DataManagement #BusinessAnalytics #PowerBITutorial #DataVisualization #DataIntegration #MicrosoftPowerBI #PowerBITips #DataPreparation #Analytics #PowerBIGuide #DashboardCreation #AzureSynapseAnalytics #PowerPlatform #DataAnalytics #DataWarehousing #MachineLearning #BigData #ETLProcesses #DataInsights #SQLServer

Category

📚
Learning
Transcript
00:00Hey friends, welcome back.
00:07Now in the last video or videos we have seen how powerful the Power Query Editor can be.
00:12And when we talk about transforming, cleaning and also enriching our data.
00:18And enriching is a good keyword because one of my colleagues reached out to me and she
00:23told me that there's actually some old data in my current setup and I need to replace
00:29it with newer data.
00:32So to be specific, she told me that in our location table, there's currently a column
00:37which contains the population for each in this case of the counties, in this case of
00:43the cities, I think of this case, California, and she wants us to update this.
00:49So let's do that.
00:51And what she also provided this time was not any kind of Excel file, anything like that
00:57to update the data.
00:58Instead, she has sent us a URL to a website which contains the data.
01:05So we need to extract the data from the website to get the latest data.
01:09Interesting.
01:10Let's do that.
01:11So let's go under transform data to reopen Power Query Editor again.
01:16Also for the website, I already opened it and it's also available for you.
01:21I have the link attached before the lecture so you can use it and you can see that's what
01:27the data looks like.
01:28So I have here a website which contains information.
01:31It's called worldpopulationreview.com and I can see here the 10 largest cities in California.
01:38I can see here data points and then here, for instance, I can see a table with the information
01:43which I need.
01:44Right.
01:45This is the 2023 population.
01:46That's the up-to-date population.
01:48That's the column I want to have.
01:49Now, of course, I have various options.
01:52I could say I download the CSV file or the JSON file and then I import this file into
01:58Power BI and combine my data.
02:00That would be possible.
02:02But why not actually extract the data directly from the website?
02:07Because if we do it this way, then we could make sure that when the data changes here
02:12and we update or refresh our data, we get the latest data from the website.
02:17Right.
02:18So that's a win-win situation for us.
02:20So let's actually, I just copy the specific path to the URL and then let me just make
02:27this smaller, shrink this, and then go inside the editor and this time we say we want to
02:32add a source, a new source, and we want to add it a web source.
02:37So click on this web icon here, click on web, and then there is the pop-up window and now
02:43we need to specify the URL.
02:45We just paste here the link to the website.
02:48I click OK and then we just wait and there is our window, this one here, and it just
02:55asks us how do we want to access this specific website.
02:59So currently, the website is this one here and then we can specify are there any kind
03:03of credentials required.
03:05So is this a web API, do we have an API key, do we want to use Windows credentials, do
03:11we need to access with an organizational account, if it's, for instance, an Azure cloud storage
03:16from our company, anything like that?
03:17No.
03:18It is actually a public website so we can use anonymous here, the default one.
03:23We click on connect.
03:25So then we just wait until the window opens, just a second, and now Power Query is scanning
03:31the website for us and now we have our navigator.
03:34Kind of similar to our Excel file, if you remember, when we opened it.
03:38Now in here, what happens is that Power Query tries, or Power BI, tries to analyze the website,
03:45the structure, so the underlying HTML code, and searches for tables.
03:50In this case, we have a table number one and we can select it and that looks actually already
03:56quite good, right?
03:57So you can see here, this is a rank, the name, population, and so on.
04:02So that's exactly the data I think which I need because there is the 2023 population,
04:07this specific column I want.
04:09Of course, you can also check out suggested tables here, for instance, table number two.
04:14You can see that this is this table here which is also a table available on a website
04:18and table number three, that's also obviously a small table from the website which you could
04:23also extract.
04:24But for us now, I want to have table number one, this one.
04:28So I can simply click on this option here, I select the table, and then I click on OK.
04:35So now what happens is here we have imported the table directly from the website, right?
04:41So we have here table number one, which gives us here the name, population, and so on.
04:46This information, and this is coming directly from the website you have seen.
04:51So if I open this again, this, for instance, this one is a table which we have seen.
04:56This one is a table which is table number one, and there also if I scroll down, there
05:01should be another table somewhere.
05:02I don't know where it is, but as you've seen, it probably also extracted the third table.
05:08That's exactly the data from this website now which we use and import it into Power BI.
05:14That's what I meant by data enrichment.
05:16So we can also extract data from various sources and enrich the data we already have in here.
05:22So, so far so good.
05:24We have table number one, and now we need to bring this table number one, the data we
05:28have here, together with our location data.
05:32In the location data, we also have a name here, and we have this specific column which
05:36I was referring to, which is, where is it, population, population, there it is, this
05:41one here, and this one I want to actually replace because these are the old values.
05:47So to do that, I can go to my, in this case, location, I'm already there, and we can now
05:53bring them together by joining the data.
05:56That means this time, we're going to use again a combining step here, but this time not the
06:00append query.
06:01Instead, we use the merge, we use the merge query.
06:05So if you click on merge query, you have the same option to either merge the queries,
06:10which means we'll change the location table, or by merge queries as new.
06:16So in this case, the merge query would be fine, because I actually want to replace the
06:22column in here, and I simply replace directly in location table.
06:26Even though you could, if you want, also do it as merge query as new, and then put the
06:31location as well as the table number one here in these other queries, disable the load,
06:35and only keep the original location table, the new location table, that could also be
06:40possible.
06:41But for now, let me just use location here, merge the queries, and then we need to specify
06:47what are the tables we want to merge.
06:49So the first one is the current one, which is location, and the second table we want
06:55to merge, so to put together, to join, is actually the table number one, our web query.
07:02So these are the two tables.
07:04And then we need to specify how do we want to join the tables.
07:08Well, the common field is actually the name.
07:11So we need to select name in here, and we need to select name down there.
07:17So if we do this, we are asked about the privacy levels.
07:21What does this mean?
07:22Well, Power BI is informing us that currently, we extract data from the web, from a public
07:28web source.
07:30And we also have data in here, which is coming from our private data repository, right, from
07:35the data we have locally stored.
07:37And Power Query just tells us here that if we combine local data with a new external
07:43data source, we need to make sure that this is what we want.
07:48Because the data is brought, they brought together, and for instance, if you have really
07:53high sensitive data, high sensitive company data, you maybe not want to combine this with
07:58a public data source.
07:59Okay?
08:00Be aware of that.
08:01But for us now, that's totally fine, right?
08:04Because there is no secret in our location, so we can bring this together.
08:08So that just means we ignore the privacy levels for here, click on save, and then we just
08:12wait.
08:13And we can see that in here, based on the naming here we have, we got 71 of 74 matches.
08:21So for three specific, in this case, columns, in this case rows, sorry, for three specific
08:27cities, or counties, or whatever this is, we do not have a match, but for 71, we have
08:34a match.
08:36And for me, that is fine.
08:37So now we can simply click on okay.
08:41And by the way, this is by default a left auto match, meaning we keep all the columns
08:46from the original location table, and we also merge the matching columns, we only keep the
08:52matching columns from table number one.
08:54That's what this merge does here.
08:57So now let me click okay, do that, and you see that now inside the location, we have
09:03a new column, which comes from table number one, and it's currently a table.
09:09That means if you click next to it, don't click on the table, click next to it, then
09:14you see a preview of what's inside.
09:17So what happens actually now is we can take a look at that.
09:20We have here in the first row Anaheim, and what we get now inside this table here are
09:26actually all the matching rows, which is only one row of course, from the table number one,
09:31which is also Anaheim, which contains all the columns.
09:34So in this case, the new population column, which I want to have, as well as other specific
09:38columns, which I don't need.
09:42So in order to extract the data now from the table in here, right, we can click up here,
09:48there's a little icon, this one here, you can click on it.
09:52And that means we expand the table now.
09:55So we can see a specifier, what kind of columns do we want to keep from this table, which
09:59we merge here.
10:00And for us, let me just deselect everything, I only want to have actually the population
10:062023.
10:07This is the only column I would like to have from my table number one, which should be
10:12appended, sorry, append is the wrong word, which should be merged as a new column in
10:16here to my location.
10:19So, and I also untick this option, use original column name as prefix, because if I leave
10:26it checked, then the column would not be, the column name would not be 2023 population,
10:31the column name would be table one dot 2023 population.
10:35And I don't want to have this table one here as a name in my header, that's why I unchecked
10:39this option.
10:42So let me click OK.
10:44And you see that now I got one new column, which is this one here, which contains the
10:482023 population from the web source.
10:51So from table number one.
10:55And this now can be compared to the older one, which was this one.
10:59And you clearly see that here the numbers are different, right?
11:02So this is an older information about the population.
11:04And this is the latest one, which you get from the website, this one here.
11:10So now, as we've seen that, you can also see it by this indication here, we currently have
11:15three missing values.
11:16So three of them are empty, because our matching here was only, well, what was only working
11:22for 71 of the 74 entries we have.
11:26So we see there are three missing.
11:28So let me just check that.
11:29If we just maybe filter this from another option here, let's just see what is missing
11:33here.
11:34Obviously, for these three, in this case, let's scroll here, for East Los Angeles, for
11:39Jorupa Valley, I probably pronounced this wrong, sorry for that.
11:43And then for this last one here, obviously, the values are not there, right?
11:47So these are missing here.
11:49So we could do, we have various options, we could either say, we ignore this, so it doesn't
11:54matter for us for these three.
11:56Or we could say, we could use something we have seen before, I've shown this to you.
12:01If you go to, for instance, add column, there's an option to add a conditional column.
12:05And the conditional column could be something like, if the new 2023 population is empty,
12:12then we would like also in this case is no, then we would like to have the values here,
12:16for instance, from our old population, where is it?
12:20The population, this one here, then we would like to have these values here.
12:24Otherwise, we would like to have our original values, the new values here.
12:29So maybe we could do this, why not?
12:32So we can see and learn how this works.
12:34So let's actually go in here and say, I'd like to have, where is it, the conditional
12:38column.
12:39And I say, in this case, that is mine.
12:43Let's call this new population, for instance, let's say, this is my new population, population.
12:48And then say, if, in this case, the population 2023, this case, if it's empty, so equals,
12:56and then I can say, if this is no, no, for instance, then I would like to have here,
13:02a different column, select the column.
13:04And let's say, I'd like to see them, the older population data, right.
13:09And else, if that is not the case, then select the column.
13:13And I would like to have my new data.
13:15So this one.
13:17So this just makes sure that in the new column we create, we actually get the data, the latest
13:22data from the web.
13:24But just in case we do not have data from the web, we use our older data, right, the
13:28population column.
13:30And if we do it this way, click OK, and create the column, you'll see that here we have exactly
13:35the same values as here.
13:37But for the three entries, let's go in here.
13:40And just check this one more time, where we did not find a match in here, they now contain
13:45these numbers in here, right.
13:46And these numbers here are simply, if I scroll to the right, these would be the numbers here
13:51coming from population.
13:54So, this can be done in order to fill these values in here.
14:01One thing is, currently you can see that the population here is ABC, or in this case, ABC
14:071 to 3.
14:08It is not a number.
14:10So that's something I have to change.
14:12And actually, I'd like to change it before I add this conditional column.
14:18So the reason why I want to change it before is this number here looks different to the
14:23number I can see in my population column in here.
14:27So this is a number here, and this is a number, the new one, this one coming from the web,
14:33is a number in a US format.
14:37So with a comma separator for the thousands, for instance.
14:40This is something which is not normal in my country, so we do not have a comma separator.
14:46And that is why, if I select here, before I added the conditional column, right, let
14:51me go to the expanded table here, for instance.
14:54So before that, if I would try to go in here and say I'd like to convert this into a whole
15:00number, let's use whole number here, check this option, and yes, I would like to insert
15:05the step before adding the conditional column, click on insert, I'll see that now I receive
15:10errors here.
15:11And also these numbers are clearly not correct, right?
15:14It's not 344, it's 344,000.
15:19Now for some of you, this transformation works, especially if you are located in the US, probably
15:25you don't have any issues here.
15:26You simply can convert it like that, and you get the correct number in here.
15:31However, or maybe the number type is already as a number identified correctly by Power BI.
15:38But if you are coming from a country, like my country in this case, where the comma is
15:43not a default separator for the thousands, what you could do instead is the following.
15:47Let me just remove the step here, yes, I'd like to delete it, and let's go back to the
15:52expand table here before the step.
15:55And then there's a very, really great feature in Power BI, which is you click this option
15:59here, and instead of using the ones up here, you can go to using local.
16:06Using local allows us, yes, I'd like to insert the step one more time.
16:10This allows us here to change the data type, for instance here, from this text option to
16:15number.
16:16In this case, it is, in this case, a whole number.
16:19But now I can specify what is the location.
16:23So I would see here for Germany, there is no comma separator for the thousands.
16:28So I just specify, I'd like to convert this, but I'd like to convert it and make sure that
16:34while the conversion is done, it is identified as U.S. format.
16:39And I can do that by simply searching here for, I think it's English U.S., let me just
16:42scroll up, English, and there should be the United States, there it is.
16:47So let me check this option here.
16:49And now let's try the conversion one more time.
16:52So you can see here, 343,000, and so on.
16:57And now if I click OK, you see that now I get the correct numbers.
17:01So the comma is now removed, of course, but now I have exactly a whole number, and I can
17:07see the correct numbers in here.
17:09And they're also right aligned, like in Excel, so now these numbers are correct.
17:13So this is something I'd like to really, I'd like to show you, I'd like to emphasize, because
17:18oftentimes, especially if you work, for instance, for an international company or anything like
17:21that, you have to deal with this comma and dot thousand separators and decimal separators
17:27and so on.
17:28And it's not the same, right?
17:29So you clearly get a lot of issues with that.
17:32And this can easily be handled by this great feature here by using local.
17:38And for me, that is done now.
17:39As I said, for you, maybe you don't have to have to do this because you're already located
17:43in the U.S., but if not, then this would be a way to transform it.
17:47And even if you can't use it here, be aware that this feature exists, and maybe it's helpful
17:52in the future for you.
17:53So again, then we edit this conditional column here, and then we filter the rows.
17:58And now, oh, the filter, let's get rid of that, don't need this anymore.
18:03And now if we go here, we have here our new population column in here, and we can just
18:07convert here also into, let's say, a whole number in here, you know, we also have the
18:12correct numbers in here with the field values from this population 2023.
18:18So finally, what we can do now is simply click on this 2023 population and also go
18:24inside our population from here.
18:27This one here, I hold my control key to select those two.
18:30And then I just right click on here and say, I'd like to remove columns because we only
18:35need our new population column.
18:37And this now contains the latest data from the web and also the three, well, the three
18:43data entries from our older data where the web did not give us an image.
18:48And that's basically it.
18:50This is for enriching our data.
18:52And now finally, of course, we have the latest data.
18:55We right click on a table, number one, be aware that we want to disable the loads.
18:59We don't want to see this in our report view, but we want to include it in refresh because
19:04we want to make sure here, including refresh, because we make sure that when we update the
19:09data, then Power BI is automatically extracting the latest data from this website.
19:14Of course, requirement for that is that the website then still exists and has the table
19:18included.
19:19Right.
19:20That's true.
19:21But as I said, I think that's the better option instead of downloading the data manually from
19:25the website and then importing into Power BI because then we rely on the downloaded
19:30data.
19:31Right.
19:32And that's not necessarily always the latest data.
19:34And that's it actually for updating the location.
19:37And now also our colleague is happy.
19:40So thanks to her for informing us and providing us with the website.
19:43We did the transformation.
19:45So actually you did the transformation.
19:47So congratulations.
19:48And now, of course, we could go back under home, click on close and apply.
19:53And then we apply the transformations which we've done in Power Query and we get our new
19:58data in here.
19:59And you see that now on a location, there is this column, new population, which has
20:04now the data from the website, which you can use now in order to create our report.
20:09So that's it actually for this video.
20:11So thanks a lot for following along.
20:13Hopefully you enjoyed it and see you in the next one.
20:16Until then, best guys.

Recommended