HCOM Power BI - Transforming Data

Tags HCOM

Overview

Transform data in Power BI is where you can load and alter the data without changing the source data. You are likely to transform some of the data initially and as you are building data visualizations you will likely find additional transformation needs. Transform data acts as sort of a macro, applying the same steps every time the data is refreshed/updated. Here are some helpful tips on some of the tools to use when transforming data. 

Transform Data

In the PBIX file, locate 'Transform Data' in the Queries section of the Home tab. 

Uploaded Image (Thumbnail)When you enter the transform data screen, you'll notice to the left all of the data sources you have added. In the middle is a preview of some of that data. On the right are the steps that you apply to transform the data. Just about every data source will have Source and Navigation as the initial steps. 

Uploaded Image (Thumbnail)

At the top of the screen, you'll notice different functions you can use to transform the Data

Uploaded Image (Thumbnail)

Note that if you filter out rows to view and double check things in this function it WILL add it as a step which may be unintended. This is only a preview of the data so this is not the appropriate place to filter unless you legit need it.

As you navigate through each of the functions below, some of the most common functions will be in  bold.

Home

  • Close & Apply - Each time you transform data, you will want to be sure you select this to get back to your data model
  • New Source - This where you will connect one or more data sources
  • Recent Sources - List of recent data connections made
  • Enter Data - If you want to create a very basic reference table that doesn't have a lot of columns or data, you can ad hoc create a table without loading data
  • Data Source Settings - This is where you can enter your credentials to connect or update a connection to a data source
  • Manage Parameters - This allows you to create and manage parameters that can be used to dynamically change the output of your queries
  • Refresh Preview - As you are working with data, you may need to refresh the baseline view of the data, especially if new data/fields from your source have been added. I often do this anytime I begin to transform data
  • Properties - If you need to change the Query properties, like the name, you can do that here but you can also do that on the right side in the pane or by double clicking the name of the query on the left.
  • Advanced Editor - This is more of a developer view
  • Manage - Allows you to delete, duplicate or reference a query.
  • Choose Columns - This is a great feature that allows you to navigate more quickly to a column or reduce the amount of columns in your data set. This is especially helpful if you are connecting to a large data set but you really only need x amount of fields.
  • Remove Columns - Allows you to manually select columns to remove from a data set
  • Keep Rows - This allows you keep only top rows, bottom rows, a specific range of rows, duplicates or errors
  • Remove Rows - Conversely, this allows you to remove rows; top, bottom, alternate, duplicates, blanks or errors. This is extremely helpful if your data source has some rogue rows that will always produce null values
  • Sort - This allows you to sort rows but again this is a preview of the data, not completely inclusive
  • Split Column - This functions similar to 'Text to Column' in Excel. If you need to split out data into one or more columns by a delimiter, number of characters, position, lowercase to uppercase, uppercase to lowercase, digit to non digit and non digit to digit.
  • Group By - Use this to group rows in a table by specifying columns to group by and the desired output
  • Data Type - If you need to change a column data type, highlight the column(s) and you can either click on the icon to the left of the name or select data type from the navigation pane
  • Use first row as Headers - You will use this quite often when you initially load the data, especially if it is a web link to an Excel
  • Replace Values - This is incredibly helpful if you need to translate data, make blank cells null or null cells blank, 1 to yes/0 to no, etc.
  • Merge or Append Queries - This is also incredibly helpful if you have multiple data sources in the same format that you want to combine, you'll typically want to Merge/Append queries as new. You can select two or more queries to combine. The function is pretty much the same but below is a more technical difference:
    • Merge queries: Used when you have two tables with a common column. It creates a new table or modifies an existing one based on the common column.
    • Append queries: Joins two or more queries by stacking them on top of each other. It adds additional rows of data to an existing query.
  • Combine Files - Merge all files in a given column into a single table
  • Text Analytics/Vision/Azure Maching Learning - These are all part of the AI Insights feature, which allows you to enhance your data analysis with advanced machine learning models.

Transform

Some of this can be repetitive from the Home tab:

  • Group By - Use this to group rows in a table by specifying columns to group by and the desired output
  • Use First Row as Headers - You will use this quite often when you initially load the data, especially if it is a web link to an Excel
  • Transpose - This is similar to the Excel function of transposing data from row to columns or columns to rows
  • Reverse Rows - Allows you to invert the rows of the table, switching the last rows to the beginning and visa versa
  • Count Rows - Returns number of rows in the table
  • Data Type - If you need to change a column data type, highlight the column(s) and you can either click on the icon to the left of the name or select data type from the navigation pane
  • Detect Data Type - This will automatically detect the type of data in a highlighted column
  • Rename - Rename a selected column, you can also do this by double clicking on the column to change the name
  • Replace Values - This is incredibly helpful if you need to translate data, make blank cells null or null cells blank, 1 to yes/0 to no, etc.
  • Fill - Fills cell values of neighboring empty cells in the selected columns
  • Pivot Column - Create a pivot table of the data
  • Unpivot Columns - Remove pivot table of the data
  • Move - Move columns to a different position, you can also do this by dragging and dropping columns
  • Convert to List - converts a selected column to a list
  • Split Column - This functions similar to 'Text to Column' in Excel. If you need to split out data into one or more columns by a delimiter, number of characters, position, lowercase to uppercase, uppercase to lowercase, digit to non digit and non digit to digit.
  • Format - This allows you to update the data in selected column(s) to lowercase, uppercase, capitalize each word, trim, clean, add prefix, add suffix
  • Merge Columns - Merge data in 2 or more columns
  • Extract - Allows you to extract data based on length, first characters, last characters, range, text before delimiter, text after delimiter, text between delimiters
  • Parse - Does something with XML or JSON
  • Statistics, Standard, Scientific,Trigonometry, Rounding, Information - Advanced development
  • Date - Format specific type of date
  • Time - Format specific type of time
  • Duration - Format specific type of duration
  • Run R Script and Run Python Script - Advanced development

Add Column

  • Column from Examples - If you need to replicate the steps from another column, you can use this to apply the same logic
  • Custom Column - This is often used to create an additional column with a default value, concatenation, or other formula
  • Invoke Custom Function - Create a custom function that applies to the rows of the data
  • Conditional Column - This is often used if you need to take a single columns value and translate it to something else. For example; if you need to take the Rotation Name and translate it to 'Required'
  • Index Column - This allows you to index your rows
  • Duplicate Column - Basically copy/paste of another column
  • Format - This allows you to update the data in selected column(s) to lowercase, uppercase, capitalize each word, trim, clean, add prefix, add suffix
  • Merge Columns - Merge data in 2 or more columns
  • Extract - Allows you to extract data based on length, first characters, last characters, range, text before delimiter, text after delimiter, text between delimiters
  • Parse -Does something with XML or JSON
  • Statistics, Standard, Scientific,Trigonometry, Rounding, Information - Advanced development
  • Date - Format specific type of date
  • Time - Format specific type of time
  • Duration - Format specific type of duration
  • Run R Script and Run Python Script - Advanced development

View

  • Query Settings - This adds/removes the Query setting pane on the right, you'll likely want to leave this
  • Formula Bar - This adds/removes the formula bar that is between the ribbon and the data
  • Monospaced - Displays contents in a monospaced format
  • Show Whitespace - Adds/removes whitespace, you'll want to have this checked if you want it to wrap text within the cell in this view. By default this is usually on
  • Column quality - When selected, it will evaluate your data to determine % valid, error or empty. This is helpful in checking the quality of your data.
  • Column distribution - When selected, this will evaluate your data and return the distinct vs. unique count of the data within the column
  • Column profile - When selected, this will open a chart below your data that gives you value distribution and column statistics
  • Go to column - Similar to 'Choose Columns' on the Home tab, this will allow you to locate and navigate to a field more quickly
  • Always allow - Allows parameterization in your data source
  • Advanced Editor - Advanced development
  • Query Dependencies - Returns a view of Queries and their dependencies

Tools

  • Diagnose Step - Evaluate and diagnose a query step
  • Start Diagnostic - Starts diagnostic
  • Stop Diagnostic - Stops diagnostic
  • Diagnostic Options - Allows you to set certain parameters when running a diagnostic

Help

There are a lot of resources available to learn Power BI:

  • Guided Learning
  • Documentation
  • Training Videos
  • Support
  • About
  • Power BI Blog
  • Community
  • Power BI for Developers
  • Samples
  • Community Galleries
  • Submit an idea
  • CoPilot is also a really great resource to help troubleshoot, particularly DAX and measures

Get help from HCOM IT

Deleting...