Generate New Profiles without a Model
To generate new profiles without a model:
1. On the Profiles list page, click the New button at the top of the profiles list.
A dialog box opens with profile options you can configure.
2. In the dialog box, enter or select values for the following options:
Profile Name – Enter a name for the new set of profiles.
Profile Description – Enter an optional description for the new set of profiles. Text length is limited to 2000 characters.
Data from Existing Dataset – Provide the following information about the dataset you want to generate profiles from:
Dataset – Select the dataset from which to generate profiles.
Goal – Select a goal variable from the selected dataset.
Filter – Optionally, select a filter to apply to the dataset. Alternatively, click the Create Filter button to define a new filter for the dataset and apply it when the profiles are generated. For more information, see Create a Data Filter.
Upload New Data – Click to upload new data for the profiles instead of using the Data from Existing Dataset option. A New Dataset dialog box opens and prompts you to upload a JSON metadata file and a CSV file containing the data.
Exclude Features – Click to select specific features of the dataset to be excluded from the profiles. A dialog box opens where you can select features to add to or remove from an exclusion list.
Maximize the Goal – Indicates whether you want to look for profiles that will maximize the goal outcome. This check box is selected by default.
Calculation Method – Choose the method used to select sub-populations to include in a profile. The default option is Z Scores (distance from the mean, adjusted for sub-population size), which is more likely to find larger sub-populations that are statistically distinct. The other option is Distance from Mean (not adjusted for sub-population size), which is more likely to find smaller sub-populations of outliers.
Allow Overlaps – Indicates whether you want to identify profiles that are mutually exclusive or overlapping. This check box is not selected by default.
Binning Strategy – Indicates which binning technique should be used to fine-tune the profiles search. Uniform = all bins are of equal width. Density = bin widths are chosen so that each bin contains an equal number of records.
Minimum Population % – A threshold that indicates what percentage of the population must exhibit a given attribute for a profile to be identified. Acceptable values must be greater than 0 but less than 100. Default = 0.25.
Max Number of Profiles to be Created – Indicates the maximum number of profiles that should be identified in the dataset. Acceptable values are from 1 to 100. Default = 10.
Max Depth – Indicates how many levels deep the search for profiles should look. In other words, this is the maximum number of features that each profile can consider concurrently. The recommended value for best viewing in the chart is 1 to 4. Default = 4.
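To make the Binning Strategy and Calculation Method options more concrete, the following Python sketch illustrates the ideas described above. It is not the product's implementation: the function names are hypothetical, and the Z score formula (distance from the mean divided by the standard error for the sub-population size) is an assumed, common way to adjust for size, since the exact formula is not given here.

```python
import math
import statistics


def uniform_bin_edges(values, n_bins):
    """Uniform binning: every bin spans the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def density_bin_edges(values, n_bins):
    """Density binning: edges sit at quantiles, so each bin
    holds roughly the same number of records."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[min(i * n // n_bins, n - 1)] for i in range(n_bins)]
    edges.append(ordered[-1])
    return edges


def distance_from_mean(sub_goal, all_goal):
    """Difference between the sub-population mean and the overall mean,
    with no adjustment for sub-population size."""
    return statistics.fmean(sub_goal) - statistics.fmean(all_goal)


def z_score(sub_goal, all_goal):
    """Assumed formula: distance from the mean divided by the
    standard error for a sub-population of this size."""
    sigma = statistics.pstdev(all_goal)
    return distance_from_mean(sub_goal, all_goal) / (sigma / math.sqrt(len(sub_goal)))


# Illustrative data only: a skewed feature and a binary goal.
feature = [0, 1, 2, 3, 4, 5, 6, 100]
print(uniform_bin_edges(feature, 4))   # wide, equal-width bins
print(density_bin_edges(feature, 4))   # two records per bin

goal = [0] * 50 + [1] * 50             # whole population
small_group = [1] * 4                  # few records, extreme outcome
large_group = [1] * 27 + [0] * 9       # many records, moderate outcome

for name, group in [("small", small_group), ("large", large_group)]:
    print(name, distance_from_mean(group, goal), z_score(group, goal))
```

In this made-up data, the small group has the larger distance from the mean while the large group has the larger Z score, which matches the behavior described for the two calculation methods.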
3. Text Data Only – Provide the following information when generating profiles on text data. These parameters are enabled only when the dataset contains free-form text data of OpType TEXT.
Max N-gram Size – The maximum size of the text units to count. A value of 1 indicates that every word is counted. A value of 2 indicates that phrases of two consecutive words are counted, in addition to individual words. Default = 1.
Max Vocabulary Size – The maximum number of words or phrases for each n-gram size that can be included in the vocabulary. Default = 1000 for each n-gram size.
Min Document Freq. – A threshold filter to count only words or phrases that appear with a minimum level of frequency across rows. Values can range from 0.0 (inclusive) to 1.0 (exclusive). A value of 0 indicates that every word or phrase is counted with no filtering. A value of 0.1 indicates that words and phrases are only counted if they appear in at least 10% of rows in the dataset. Default = 0.
For more information about working with TEXT data, see Transforming Free-Form Text for Analysis.
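As a rough illustration of how the three text parameters interact, the following Python sketch builds a vocabulary from a few rows of text. It is a simplified approximation, not the product's algorithm: the function name build_vocabulary and the sample rows are hypothetical, tokenization is a plain whitespace split, and ranking terms by document frequency when applying the vocabulary limit is an assumption.

```python
from collections import Counter


def build_vocabulary(rows, max_ngram=1, max_vocab=1000, min_doc_freq=0.0):
    """Return {n: [terms]} with at most max_vocab terms per n-gram size."""
    vocab = {}
    for n in range(1, max_ngram + 1):
        doc_counts = Counter()
        for row in rows:
            words = row.lower().split()
            # A set ensures each term is counted once per row.
            ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
            doc_counts.update(ngrams)
        # Keep terms that appear in at least min_doc_freq of the rows.
        kept = [(t, c) for t, c in doc_counts.items() if c / len(rows) >= min_doc_freq]
        # Assumed ranking: most frequent first, ties broken alphabetically.
        kept.sort(key=lambda tc: (-tc[1], tc[0]))
        vocab[n] = [t for t, _ in kept[:max_vocab]]
    return vocab


# Illustrative rows only.
rows = [
    "pump pressure low",
    "pump pressure high",
    "valve stuck",
    "pump offline",
]
print(build_vocabulary(rows, max_ngram=2, min_doc_freq=0.5))
```

With Max N-gram Size set to 2 and Min Document Freq. set to 0.5, only the words and two-word phrases that occur in at least half of the rows remain in the vocabulary.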
4. Click Submit.
The dialog box closes and the profile generation process starts. The new profiles job appears in the list at the top of the Profiles page. The State column shows the status of the profiles generation process. When the job is complete, the new profiles table is displayed in the bottom section of the page.