TransWikia.com

Classifying Data in Orange: What's the difference between 'Features', 'Target Variables', and 'Meta Attributes'

Data Science Asked on August 2, 2021

I’ve been looking for a more visual way to calculate statistics on my spectroscopy data. Python’s Orange Canvas seems like it might be a great alternative to Pipeline Pilot, but I’m having trouble getting started with simpler data analyses.

Most notably, when I import data, I’m unclear as to what the difference between a ‘Target Variable’, ‘Feature’, and ‘Meta Attribute’ is.

How do I decide what type of data I have?

2 Answers

Meta variables are metadata, data about data, not used for statistical inference. Features (also called variables or attributes) are the measured inputs of the problem domain, the independent variables. The target variable is the dependent variable, the measure we're trying to model or forecast. Not all problems can be, or need to be, formulated this way. Orange was traditionally designed to accommodate machine learning workflows, hence the naming.
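For the spectroscopy case in the question, deciding "what type of data I have" amounts to asking of each column: is it a measured input (feature), the quantity to predict (target), or bookkeeping about the sample (meta)? The sketch below records that decision in the three-row header of Orange's tab-delimited format (row 1 names, row 2 types, row 3 flags such as `class` and `meta`, with an empty flag meaning an ordinary feature). The column names, values, and the `to_orange_tab` helper are hypothetical, and the header layout follows Orange's documented .tab conventions as generally described; verify it against the Orange version you use.

```python
import csv
import io
from typing import Dict, List

# Hypothetical spectroscopy columns and the role chosen for each one.
# Flags: "" = feature (input), "class" = target variable, "meta" = metadata.
COLUMNS: List[Dict[str, str]] = [
    {"name": "sample_id", "type": "string", "flag": "meta"},
    {"name": "nm_500", "type": "continuous", "flag": ""},
    {"name": "nm_510", "type": "continuous", "flag": ""},
    {"name": "concentration", "type": "continuous", "flag": "class"},
]

# Hypothetical measurements, one row per sample.
ROWS: List[list] = [
    ["S-001", 0.12, 0.15, 1.0],
    ["S-002", 0.25, 0.31, 2.0],
]


def to_orange_tab(columns: List[Dict[str, str]], rows: List[list]) -> str:
    """Render a tab-separated table with a three-row header: names, types, flags."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([c["name"] for c in columns])  # row 1: variable names
    writer.writerow([c["type"] for c in columns])  # row 2: variable types
    writer.writerow([c["flag"] for c in columns])  # row 3: role flags
    writer.writerows(rows)                         # data rows
    return buf.getvalue()


print(to_orange_tab(COLUMNS, ROWS))
```

The sample ID ends up as a meta attribute: it identifies a row but carries nothing a model should learn from.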

Correct answer by K3---rnc on August 2, 2021

Meta Attributes and Features

Since asking this question, I've found the older Orange 2.7 documentation, which at least does a better job of describing "meta attributes" and "features":

meta attributes:

Generally, meta attributes are names given to a particular sample.

Meta attributes hold additional data attached to individual instances.

...

Meta attribute can be marked as “optional”. Non-optional meta attributes are expected to be present in all data instances from that domain. This rule is not strictly enforced. As one of the few places where the difference matters, saving to files fails if a non-optional meta value is missing; optional attributes are not written to the file at all. Also, newly constructed data instances initially have all the non-optional meta attributes.
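The optional/non-optional rule quoted above can be modeled in a few lines. This is a hypothetical illustration of the described behavior, not Orange 2.7's actual classes: `MetaSpec`, `new_instance`, and `save_row` are made-up names. New instances start with every non-optional meta, saving fails if a non-optional value is missing, and optional metas are never written.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetaSpec:
    """Hypothetical description of one meta attribute."""
    name: str
    optional: bool = False


def new_instance(metas: List[MetaSpec]) -> Dict[str, Optional[str]]:
    # New instances start with every non-optional meta; None means "value unknown".
    return {m.name: None for m in metas if not m.optional}


def save_row(instance: Dict[str, Optional[str]], metas: List[MetaSpec]) -> List[str]:
    """Return the meta values that would be written to a file."""
    written: List[str] = []
    for m in metas:
        if m.optional:
            continue  # optional metas are never written
        value = instance.get(m.name)
        if value is None:
            # Saving fails when a non-optional meta value is missing.
            raise ValueError(f"non-optional meta {m.name!r} is missing")
        written.append(value)
    return written


metas = [MetaSpec("name"), MetaSpec("comment", optional=True)]
inst = new_instance(metas)             # {'name': None}
inst["name"] = "parrot"
inst["comment"] = "seen near a lake"   # optional: kept in memory only
print(save_row(inst, metas))           # ['parrot']
```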

While the list of features and the class value are immutable, meta attributes can be added and removed at any time.

features:

Can be thought of as the properties that describe your class and/or sample.

Immutable list of domain attributes without the class variable. Read only.
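The contrast between a fixed feature list and class value on one hand, and freely editable meta attributes on the other, can be mimicked with a frozen dataclass whose metas live in a dict. This is a hypothetical stdlib sketch; `Instance` and its fields are illustrative names, not Orange's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical data instance: fixed features/class, editable metas."""
    features: Tuple[float, ...]   # a tuple: items cannot be replaced
    class_value: str              # frozen dataclass: field cannot be reassigned
    metas: Dict[str, str] = field(default_factory=dict)  # dict contents stay mutable


bird = Instance(features=(0.3, 12.0), class_value="bird")
bird.metas["name"] = "parrot"   # adding a meta attribute is allowed
del bird.metas["name"]          # ...and so is removing one

try:
    bird.features[0] = 1.0      # tuples do not support item assignment
except TypeError as err:
    print("features are fixed:", err)

try:
    bird.class_value = "fish"   # raises dataclasses.FrozenInstanceError
except AttributeError as err:   # FrozenInstanceError subclasses AttributeError
    print("class value is fixed:", type(err).__name__)
```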


Variables

While this document doesn't specify what makes something a "Target Variable", it does provide some information about variables:

variables:

List of domain attributes including the class variable. Read only.

class_var:

The class variable (Descriptor) or None. Read only.

class_vars:

A list of additional class attributes. Read only.
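The relationships between these read-only lists can be summarized in a small sketch. `DomainSketch` is a hypothetical stand-in following the Orange 2.7 descriptions quoted above: `features` excludes the class, `variables` is `features` plus `class_var` (if any), and `class_vars` holds extra class attributes kept out of both. Orange 3's `Orange.data.Domain` uses somewhat different names (for example `attributes` where 2.7 said `features`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DomainSketch:
    features: Tuple[str, ...]          # domain attributes without the class
    class_var: Optional[str] = None    # the class variable, or None
    class_vars: Tuple[str, ...] = ()   # additional class attributes

    @property
    def variables(self) -> Tuple[str, ...]:
        # Features plus the class variable, when there is one.
        extra = (self.class_var,) if self.class_var is not None else ()
        return self.features + extra


zoo = DomainSketch(features=("hair", "feathers", "fins"), class_var="type")
print(zoo.variables)           # ('hair', 'feathers', 'fins', 'type')

unsupervised = DomainSketch(features=("hair", "feathers", "fins"))
print(unsupervised.variables)  # ('hair', 'feathers', 'fins')
```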

Target Variable

My experience with Orange thus far has led me to believe that the "target variable" is just the variable you want to run an analysis on (or use as a plot axis) with respect to some features.


Class and Domain

Since we are defining the above using the words "class" and "domain", we should define those as well.

class:

A class can be thought of as describing the type of sample you have, given your data's features. For instance, 'color', 'size', 'life span', and 'habitat' might describe a "bird" class with types 'parrot', 'duck', 'seagull', and 'hawk'. However, if you were to add the boolean features 'hair' and 'fins' to your data set, your data would likely be describing an 'animal type' class with values such as 'bird', 'mammal', and 'fish'. 'Parrot', 'duck', and 'seagull' would then become "names" of animals within these classes and thus be meta attributes.
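The bird example shows that a column's role is a choice made per analysis, not a fixed property of the column. A hypothetical sketch of the two setups described above, using plain Python (the `assign_roles` helper and the column names are illustrative, not part of Orange):

```python
from typing import Dict, List


def assign_roles(columns: List[str], target: str, metas: List[str]) -> Dict[str, object]:
    """Every column that is neither the target nor a meta becomes a feature."""
    features = [c for c in columns if c != target and c not in metas]
    return {"features": features, "target": target, "metas": metas}


# Setup 1: bird-only data; the species ('parrot', 'duck', ...) is the class.
birds = assign_roles(
    ["species", "color", "size", "life_span", "habitat"],
    target="species",
    metas=[],
)

# Setup 2: add 'hair' and 'fins'; 'animal_type' ('bird', 'mammal', 'fish') is
# the class, and the species name is demoted to a meta attribute.
animals = assign_roles(
    ["species", "color", "size", "life_span", "habitat", "hair", "fins", "animal_type"],
    target="animal_type",
    metas=["species"],
)

print(birds["features"])    # ['color', 'size', 'life_span', 'habitat']
print(animals["features"])  # ['color', 'size', 'life_span', 'habitat', 'hair', 'fins']
```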

"Each data instance corresponds to an animal and is described by the animal’s properties and its type (the class)"

The class value is immutable.

A domain can have multiple additional class attributes. These are stored similarly to other features except that they are not used for learning. The list of such classes is stored in class_vars. When converting between domains, multiple classes can become ordinary features or the class, and vice versa.
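Converting between domains, as quoted above, can be pictured as building a new domain in which a variable's role has changed, leaving the original untouched. This is a hypothetical sketch; `DomainSketch`, `promote_to_class`, and `demote_to_feature` are illustrative names, not Orange functions, and the `ph` column is an assumed example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class DomainSketch:
    features: Tuple[str, ...]
    class_var: Optional[str]
    class_vars: Tuple[str, ...] = ()   # extra class attributes, not used for learning


def promote_to_class(domain: DomainSketch, name: str) -> DomainSketch:
    """New domain where the extra class `name` becomes the class.

    The previous class variable moves into the extra-class list.
    """
    if name not in domain.class_vars:
        raise KeyError(name)
    extras = tuple(v for v in domain.class_vars if v != name)
    if domain.class_var is not None:
        extras += (domain.class_var,)
    return replace(domain, class_var=name, class_vars=extras)


def demote_to_feature(domain: DomainSketch, name: str) -> DomainSketch:
    """New domain where the extra class `name` becomes an ordinary feature."""
    if name not in domain.class_vars:
        raise KeyError(name)
    extras = tuple(v for v in domain.class_vars if v != name)
    return replace(domain, features=domain.features + (name,), class_vars=extras)


d = DomainSketch(features=("nm_500", "nm_510"), class_var="concentration", class_vars=("ph",))
print(promote_to_class(d, "ph"))
print(demote_to_feature(d, "ph"))
```

Returning new objects instead of mutating `d` keeps the feature list and class fixed within any one domain.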

domain:

In Orange, the term domain denotes a set of variables and meta attributes that describe data. A domain descriptor is attached to data instances, data tables, classifiers and other objects. A descriptor is constructed, for instance, after reading data from a file.

Domains consist of ordinary features (from “hair” to “catsize” in the above example), the class attribute (“type”), and meta attributes (“name”).

...

Domains behave like lists: the length of a domain is the number of variables including the class variable. Domains can be indexed by integer indices, variable names, or instances of Orange.feature.Descriptor.
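The list-like behavior quoted above can be imitated with `__len__` and `__getitem__`. This is a hypothetical sketch of the described 2.7 semantics, not Orange's implementation: `Variable` stands in for `Orange.feature.Descriptor`, and metas are kept outside the counted variables.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for a variable descriptor."""
    name: str


class DomainSketch:
    def __init__(self, features: List[Variable], class_var: Optional[Variable] = None):
        self.features = list(features)
        self.class_var = class_var
        self.metas: List[Variable] = []  # metas are not counted by len()

    @property
    def variables(self) -> List[Variable]:
        return self.features + ([self.class_var] if self.class_var is not None else [])

    def __len__(self) -> int:
        # Number of variables, including the class variable.
        return len(self.variables)

    def __getitem__(self, key: Union[int, str, Variable]) -> Variable:
        if isinstance(key, int):
            return self.variables[key]  # positional index
        name = key.name if isinstance(key, Variable) else key
        for var in self.variables:      # lookup by name or descriptor
            if var.name == name:
                return var
        raise KeyError(name)


hair, fins, kind = Variable("hair"), Variable("fins"), Variable("type")
zoo = DomainSketch([hair, fins], class_var=kind)
print(len(zoo))             # 3
print(zoo[0].name)          # hair
print(zoo["type"] is kind)  # True
print(zoo[fins] is fins)    # True
```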

Answered by virtualxtc on August 2, 2021
