Stack Overflow Asked by Kirsten Greed on January 12, 2021
As a C# developer, I am curious about the formula parameter in R's lm function, as in the following code:
wine <- read.csv("wine_train.csv")
fit <- lm(formula = quality ~ ., data = wine)
In the help shown by
?lm
It says
formula
an object of class "formula" (or one that can be coerced to
that class): a symbolic description of the model to be fitted.
In the help shown by
? formula
It says
Model Formulae
The generic function formula and its
specific methods provide a way of extracting formulae which have been
included in other objects.
So I am confused as to whether formula is a class, a function, or an "expression", whatever that is.
In C# we have delegates, lambdas and actions that all seem useful to treat code snippets as memory variables.
R is strongly but dynamically typed, whereas C# is mostly statically typed.
Is there a useful parallel in C# for the formula in R?
[Update]
I get that the left and right sides of the tilde act as parameters to be used in lm.
Technically, a formula object is an unevaluated function call, along with an associated environment:
> f <- as.formula(y ~ x1 + x2)
> str(f)
Class 'formula' language y ~ x1 + x2
..- attr(*, ".Environment")=<environment: R_GlobalEnv>
> is.call(f)
[1] TRUE
Thinking of the y ~ x1 + x2 part as an unevaluated function call is a bit misleading, because we wouldn't typically use it the way we would other function calls. But you can see that it really is stored that way:
> f[[1]]
`~`
> f[[2]]
y
> f[[3]]
x1 + x2
That is, it is a function name (here ~) followed by two arguments, matching the description of a call in ?call. Of course, we don't usually think of ~ as a function in R, but it is one: if you check help("~") you'll see it's a primitive. Using an unevaluated call in this way is one of R's more distinctive features: it leverages the ability to compute on the language itself to do something nominally unrelated, namely to create a new "symbolic" object that stores the conceptual definition of a model.
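To make the "it's just a call" point concrete, here is a minimal sketch that builds and takes apart the same formula in a few ways. The names g, lhs, rhs, vars and h are only illustrative; the functions used (is.primitive, all.vars, reformulate) ship with base R and the stats package.

```r
f <- y ~ x1 + x2

# `~` is a primitive function, so the prefix spelling builds a formula too.
g <- `~`(y, x1 + x2)
is.primitive(`~`)   # TRUE
class(g)            # "formula"

# The pieces of the call can be extracted like list elements.
lhs <- f[[2]]       # the symbol y
rhs <- f[[3]]       # the unevaluated call x1 + x2

# all.vars() lists the variable names mentioned anywhere in the formula.
vars <- all.vars(f) # "y" "x1" "x2"

# reformulate() goes the other way and builds a formula from strings.
h <- reformulate(c("x1", "x2"), response = "y")
```

Note that none of y, x1 or x2 has to exist for any of this to run, which is a good hint that the formula is being stored, not evaluated.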
You can see how the formula object is actually used in this sense in model.frame.default, although that code is fairly dense. The basic idea is that R is piggybacking on its ability to create unevaluated calls with associated environments, so that the formula can be "evaluated" in the context of that environment, much as a function would be. But rather than "calling a function", the formula is more a vehicle for transporting model information that is then used to build the matrices needed for the model-fitting process.
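A small sketch of the environment part, under made-up assumptions: make_formula, multiplier and the toy data frame d are hypothetical names invented for this illustration. The formula is created inside a function, so it carries that function's environment along with it, and model.frame can still find multiplier even though it is neither a column of the data nor a global variable.

```r
make_formula <- function() {
  # `multiplier` exists only inside this function call
  multiplier <- 10
  y ~ I(x * multiplier)
}

f <- make_formula()
environment(f)  # the function's execution environment, not R_GlobalEnv

# Toy data: it has x and y, but no column called multiplier.
d <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

# Variables are looked up in `d` first, then in the formula's environment.
mf <- model.frame(f, data = d)
as.numeric(mf[[2]])  # 10 20 30

# The design matrix that a fitting function such as lm() would work with.
X <- model.matrix(f, data = d)
dim(X)               # 3 2
```

If you are looking for a C# analogy, the closest I can think of is an expression tree that also remembers the scope it was written in, though that comparison is only approximate.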
Correct answer by joran on January 12, 2021