
Overview

Data profiling analyzes the data in your view and determines which actions are available to make the view more efficient.

To use this functionality, make sure the Data Profiling configuration setting is enabled. Navigate to Administration > Admin Console > Data Sources > (click on your data source) > Usage Parameters > Data Profiling.


Run analysis

A view can only be analyzed after the modeling stage. Make sure the view you're working with has already been modeled before attempting to run a data profiling analysis.

When profiling your data, you can run a basic profile by ignoring the advanced options, or you can set the advanced options according to your needs.

Option | Description
Rows to Profile | Choose how many rows of data to profile. More rows will give more relevant results, but will take longer to analyze.
Columns to Profile | Choose to run data profiling on all the fields in the view, or select a single field from the list.
Analysis Types | Select which data analyses to run on the selected field(s) so that suggestions can be tested and included. Selecting none will still run the standard data profiling.

In the steps below, we will be working with a view that is being edited in the Prepare stage.

  1. Ensure your view is in the Prepare stage.
  2. Click on the Data Profiling tool in the top toolbar to open the Data Profiling dialog box.
  3. Click on the Advanced link to expand the list of options.
  4. Use the table above to change any of the three advanced options to suit your needs.
  5. Click on the Start Profiling button to start the analysis.
    Once the analysis is complete, your view will display a histogram for each profiled field.

See the sections below for details on what you can learn from the histograms.

Histogram chart types

A histogram chart analyzes the frequency of distinct values in the data and displays it at the top of each profiled field.
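
To make the idea of "frequency of distinct values" concrete, here is a minimal Python sketch of the kind of counting a histogram is built from. It is an illustration only, not Yellowfin's implementation: the function names `value_frequencies` and `numeric_bins` are hypothetical, and the equal-width binning used for numeric values is an assumption.

```python
from collections import Counter


def value_frequencies(values):
    """Count how often each distinct non-null value occurs, most frequent first."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common()


def numeric_bins(values, bin_count):
    """Split numeric values into equal-width bins and count the values in each.

    Assumption: equal-width bins; the product may bin values differently.
    """
    data = [v for v in values if v is not None]
    low, high = min(data), max(data)
    # Fall back to a width of 1 when every value is identical.
    width = (high - low) / bin_count or 1
    counts = [0] * bin_count
    for v in data:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    return counts


text_counts = value_frequencies(["a", "b", "a", None, "c", "a", "b"])
number_counts = numeric_bins([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
```

The first function corresponds to a chart of grouped or text values, where each bar is one distinct value. The second corresponds to numeric values, where each bar covers a range.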

The chart type depends on the field: one for grouped values or text, and one for numeric, date, or time values.

Values


Statistic | Description
Sample Size | Displays a count of the records within the field.
Distinct Values | Displays a count of the distinct values within the field.
Empty/NULL Values | Displays a count of the number of empty values within the field.

Numeric Fields
Median | Displays the number separating the higher half of the sample from the lower half.
Average | Displays the mean value.
Standard Deviation | Displays the measure of the dispersion of a set of values.
Minimum | Displays the lowest value.
Maximum | Displays the highest value.

Date/Time Fields
Minimum | Displays the earliest date.
Maximum | Displays the latest date.
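
As a rough guide to how the statistics above relate to the underlying data, the following Python sketch computes each of them for a single numeric field using the standard library. It is an illustration under stated assumptions, not the product's own calculation: the function name `profile_numeric` is hypothetical, the sample size is assumed to include empty values, and the standard deviation is assumed to be the population form.

```python
import statistics


def profile_numeric(values):
    """Compute profile statistics for one numeric field.

    None stands in for an empty/NULL value. The field must contain
    at least one non-null value.
    """
    non_null = [v for v in values if v is not None]
    return {
        # Assumption: every record counts toward the sample, including nulls.
        "sample_size": len(values),
        "distinct_values": len(set(non_null)),
        "null_values": len(values) - len(non_null),
        "median": statistics.median(non_null),
        "average": statistics.mean(non_null),
        # Assumption: population standard deviation (statistics.pstdev).
        "standard_deviation": statistics.pstdev(non_null),
        "minimum": min(non_null),
        "maximum": max(non_null),
    }


profile = profile_numeric([2, 4, 4, 4, 5, 5, 7, 9, None])
```

For date or time fields, `min` and `max` applied to `datetime.date` values give the earliest and latest dates in the same way.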


Usage


Section | Description
Reports | Displays a list of reports that make use of the selected field.
Last Modified | Displays the last modified date of each listed report.
Usage | Displays the usage rate of each listed report.


Suggestions

Depending on the type of field being profiled, Yellowfin may suggest functions that could be applied to it, based on the outcome of the analysis.


Suggestion | Description
Reference Code Check | The values in the field will be analyzed and compared to existing ref codes. It may then be suggested that a ref code be applied, updated, or created.
Null Check | The values in the field will be analyzed and it may be suggested that null values be replaced or filtered out.
Number Uniqueness | The values in the field will be analyzed and it may be suggested that the fields be grouped.
Date Uniqueness | The values in the field will be analyzed and it may be suggested that the field be used as part of a date hierarchy.
Geography Check | The values in the field will be analyzed and it may be suggested that the field be linked to a GeoPack.
Date Hierarchy | The values in the field will be analyzed and it may be suggested that the field be linked in a drill down date hierarchy.
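
To give a feel for how checks such as Null Check and Number Uniqueness could turn field values into suggestions, here is a small Python sketch. Everything in it is hypothetical: the function name `suggest`, the suggestion labels, and both thresholds are assumptions made for illustration, and the rules Yellowfin actually applies are not documented here.

```python
def suggest(values, null_threshold=0.05, unique_threshold=0.9):
    """Return illustrative suggestion labels for one field.

    None stands in for a null value. Both thresholds are hypothetical.
    """
    suggestions = []
    total = len(values)
    if total == 0:
        return suggestions

    non_null = [v for v in values if v is not None]

    # Null check: flag the field when the share of nulls is noticeable.
    null_ratio = (total - len(non_null)) / total
    if null_ratio > null_threshold:
        suggestions.append("null_check")

    # Number uniqueness: a numeric field whose values are almost all
    # distinct is a candidate for grouping.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        unique_ratio = len(set(non_null)) / len(non_null)
        if unique_ratio > unique_threshold:
            suggestions.append("number_uniqueness")

    return suggestions


numeric_result = suggest([1, 2, 3, None, 5, 6, 7, 8, 9, 10])
text_result = suggest(["a", "a", "b", "a"])
```

In this sketch a field can trigger more than one suggestion, and a field that triggers none returns an empty list.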


