TransWikia.com

Notation for features (general notation for continuous and discrete random variables)

Data Science Asked by Yael M on January 31, 2021

I’m looking for the right notation for features from different types.
Let us say that my samples have $m$ features that can be modeled with $X_1,…,X_m$. The features don't share the same distribution (e.g. some are categorical, some numerical, etc.). Therefore, while $X_i$ might be a continuous random variable, $X_j$ could be a discrete random variable.

Now, given a data sample $x=(x_1,…,x_m)$, I want to talk about probabilities such as $P(X_k=x_k)<c$. But $X_k$ might be a continuous variable (e.g. the height of a person), in which case $P(X_k=x_k)$ is always zero. However, it can also be a discrete variable (e.g. a categorical feature or the number of kids).
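The "always zero" part follows from integrating the density over a single point. Assuming $X_k$ has a density $f_{X_k}$ when it is continuous and a probability mass function $p_{X_k}$ when it is discrete (both symbols introduced here only for illustration):

$$
P(X_k = x_k) = \int_{x_k}^{x_k} f_{X_k}(t)\,dt = 0 \quad \text{(continuous)}, \qquad P(X_k = x_k) = p_{X_k}(x_k) \ge 0 \quad \text{(discrete)}.
$$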

I’m looking for a notation that is equivalent to $P(X_k=x_k)$ but can work for both continuous and discrete random variables.

2 Answers

As far as I am concerned, there is no distinction between a continuous and a discrete variable when it comes to notation. So $P(X_k=x_k)$ is perfectly fine for either.

Answered by Valentin Calomme on January 31, 2021

Maybe relying on set notation would work?

$P(X_k \in s_k)$ where:

  • $s_k = \{ x_k \}$ if $X_k$ is discrete
  • $s_k = [ x_k-\epsilon , x_k+\epsilon]$ for some small $\epsilon > 0$ if $X_k$ is continuous
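To make the set-based idea concrete, here is a minimal Python sketch that estimates $P(X_k \in s_k)$ from the observed values of a single feature. The helper name `prob_in_set`, its `eps` parameter, the sample data, and the choice to estimate the probability as a plain sample frequency are all assumptions made for this illustration, not part of the answer.

```python
from collections import Counter
from typing import Sequence, Union

Value = Union[int, float, str]


def prob_in_set(column: Sequence[Value], x: Value, is_discrete: bool,
                eps: float = 0.5) -> float:
    """Estimate P(X_k in s_k) as a sample frequency (hypothetical helper).

    s_k = {x} if the feature is discrete,
    s_k = [x - eps, x + eps] if it is continuous.
    Estimating by frequency is an assumption of this sketch.
    """
    n = len(column)
    if n == 0:
        raise ValueError("column must contain at least one observation")
    if is_discrete:
        # s_k = {x}: share of observations exactly equal to x
        return Counter(column)[x] / n
    # s_k = [x - eps, x + eps]: share of observations inside the closed interval
    lo, hi = x - eps, x + eps
    return sum(1 for v in column if lo <= v <= hi) / n


kids = [0, 1, 2, 2, 3, 1, 2]                     # made-up discrete feature
heights = [162.0, 175.5, 180.2, 168.9, 171.3]    # made-up continuous feature (cm)

print(prob_in_set(kids, 2, is_discrete=True))                   # 3/7
print(prob_in_set(heights, 170.0, is_discrete=False, eps=2.0))  # 0.4
print(prob_in_set(heights, 170.0, is_discrete=True))            # 0.0
```

The last call treats height as if it were discrete and returns 0.0, the empirical counterpart of the problem raised in the question. The interval width `eps` is a free choice, and the estimate for a continuous feature changes with it.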

Answered by Erwan on January 31, 2021

© 2024 TransWikia.com. All rights reserved.