
How to treat differentials and infinitesimals?

Physics Asked by Ovi on November 12, 2020

In my Calculus class, my math teacher said that differentials such as $dx$ are not numbers, and should not be treated as such.

In my physics class, it seems like we treat differentials exactly like numbers, and my physics teacher even said that they are in essence very small numbers.

Can someone give me an explanation which satisfies both classes, or do I just have to accept that the differentials are treated differently in different courses?

P.S. I took Calculus 2 so please try to keep the answers around that level.

P.P.S. Feel free to edit the tags if you think it is appropriate.

8 Answers

(I'm addressing this from the point of view of standard analysis)

I don't think you will have a satisfactory understanding of this until you get to multivariable calculus, because in calculus 2 it's easy to think that $\frac{d}{dx}$ is all you need and that there's no need for $\frac{\partial}{\partial x}$ (this is false, and it has to do with why derivatives do not in general behave like fractions). So that's one reason why differentials are not like numbers. There are some ways in which differentials are like numbers, however.

I think the most fundamental bit is that if you're told that $f\,dx=dy$, this means that $y$ can be approximated as $y(x)=y(x_0)+f\cdot(x-x_0)+O((x-x_0)^2)$ close to the point $x_0$ (this raises another issue*). Since this first-order term is really all that matters after one applies the limiting procedures of calculus, this gives an argument for why such loose treatment of differentials is allowable: higher-order terms don't matter. This is a consequence of Taylor's theorem, and it is what allows your physics teacher to treat differentials as very small numbers, because $x-x_0$ is like your "$dx$" and it IS a real number. What allows you to do things you can't do with a single real number is that the formula for $y(x)$ holds for all $x$, not just some particular $x$. This lets you apply all the complicated tricks of analysis.
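A quick numerical check of this claim, treating $dx$ as an ordinary small real number (a sketch of my own, using $\sin$ as an arbitrary example function): the error of the first-order formula shrinks like $dx^2$, so the ratio error$/dx^2$ settles to a constant.

```python
import math

# First-order (differential) approximation of y = sin at x0 = 1:
# y(x0 + dx) ~ y(x0) + cos(x0)*dx, with error O(dx^2) by Taylor's theorem.
x0 = 1.0
f = math.cos(x0)  # the "f" in f*dx = dy, i.e. the derivative at x0

for dx in (1e-1, 1e-2, 1e-3):
    exact = math.sin(x0 + dx)
    linear = math.sin(x0) + f * dx
    err = abs(exact - linear)
    # err / dx**2 stays roughly constant (~ sin(x0)/2), showing the
    # leftover really is a second-order term.
    print(dx, err, err / dx**2)
```

The roughly constant last column is the $O(dx^2)$ coefficient; this is exactly the "little $+O(dx^2)$ tacked off to the side" mentioned below.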

If I get particularly annoyed at improper treatment of differentials and I see someone working through an example where they write, "Now we take the differential of $x^2+x$ giving us $(2x+1)dx$", I may imagine $dx$ being a standard real number, and that there's a little $+O(dx^2)$ tacked off to the side.

Your math teacher might argue, "You don't know enough about those theorems to apply them properly, so that's why you can't think of differentials as similar to numbers", while your physics teacher might argue, "The intuition is the really important bit, and you'd have to learn complicated math to see it as $O(dx^2)$. Better to focus on the intuition."

I hope I cleared things up instead of making them seem more complicated.

*(The O notation is another can of worms and can also be used improperly. Using the linked notation I am saying "$y(x)-y(x_0)-f\cdot(x-x_0)=O((x-x_0)^2)$ as $x\to x_0$". Note that one could see this as working against my argument: it's meaningless to say "one value of $x$ satisfies this equation", so when written in this form (which your physics prof. might find more obtuse and your math prof. might find more meaningful) it's less of an equation and more of a logical statement.)

See also: https://mathoverflow.net/questions/25054/different-ways-of-thinking-about-the-derivative

Correct answer by user12029 on November 12, 2020

There is an old tradition, going back all the way to Leibniz himself and carried on a lot in physics departments, to think of differentials intuitively as "infinitesimal numbers". Through the course of history, big minds have criticized Leibniz for this (for instance the otherwise great Bertrand Russell in Chapter XXXI of "A History of Western Philosophy" (1945)) as being informal and unscientific.

But then something profound happened: William Lawvere, one of the most profound thinkers of the foundations of mathematics and of physics, taught the world about topos theory and in there about "synthetic differential geometry". Among other things, this is a fully rigorous mathematical context in which the old intuition of Leibniz and the intuition of plenty of naive physicists finds a full formal justification. In Synthetic differential geometry those differentials explicitly ("synthetically") exist as infinitesimal elements of the real line.

A basic exposition of how this works can be found on the nLab.

Notice that this is not just a big machine to produce something you already know, as some will inevitably hasten to think. On the contrary, this leads the way to the more sophisticated places of modern physics. Namely the "derived" or "higher geometric" version of synthetic differential geometry includes modern D-geometry which is at the heart for instance of modern topics such as BV-BRST formalism (see e.g. Paugam's survey) for the quantization of gauge theories, or for instance geometric Langlands correspondence, hence S-duality in string theory.

Answered by Urs Schreiber on November 12, 2020

I think your math teacher is right. One way to see that differentials are not ordinary numbers is to look at their relation to so-called 1-forms. I do not know whether you have already encountered forms in calculus 2, but they are easy to look up on the internet.

Since you chose a tag "integrals" in your question, let me give you an example based on an integral. Let's say you have a function $f(x^2+y^2)$ and want to integrate it over some area $A$:

$$\int_A f(x^2+y^2)\, dx\, dy$$

The important thing to realize here is that the $dx\,dy$ is actually just an abbreviation for $dx\wedge dy$. This $\wedge$ thingy is an operation (the wedge product, much like multiplication but with slightly different rules) that can combine forms (in this case it combines two 1-forms into a 2-form). One important rule for wedge products is anti-commutation:

$$dx\wedge dy=-dy\wedge dx$$

This makes sure that $dx\wedge dx=0$ (where a physicist could cheat by saying that he neglects everything of order $O(dx^2)$, but that is like mixing apples and pears, and frankly misleading). Why would differentials in integrals behave like this, and where is the physical meaning? Well, here you can think about the 'handedness' of a coordinate system. For instance the integration measure $dx\wedge dy\wedge dz$ is Cartesian 'right-handed'. You can make it 'left-handed' by commuting the $dx$ with $dy$ to obtain $-dy\wedge dx\wedge dz$, but then the minus sign appears in front, which makes sure that your integration in a 'left-handed' coordinate system still gives you the same result as the initial 'right-handed' one.

In any case, to come back to the above integral example, let's say you like polar coordinates better to perform your integration. So you do the following substitution (assuming you already know how to take total differentials):

$$x = r\cos\phi~~~,~~~dx = dr\,\cos\phi - d\phi\, r\sin\phi$$
$$y = r\sin\phi~~~,~~~dy = dr\,\sin\phi + d\phi\, r\cos\phi$$

Multiplying out your $dx\wedge dy$ you find what you probably already know and expect:

$$dx\wedge dy = (dr\,\cos\phi - d\phi\, r\sin\phi)\wedge(dr\,\sin\phi + d\phi\, r\cos\phi)$$
$$= \underbrace{dr\wedge dr}_{=0}\,\sin\phi\cos\phi + dr\wedge d\phi\, r\cos^2\phi - d\phi\wedge dr\, r\sin^2\phi - \underbrace{d\phi\wedge d\phi}_{=0}\, r^2\cos\phi\sin\phi$$
$$=r\,(dr\wedge d\phi\,\cos^2\phi - d\phi\wedge dr\,\sin^2\phi)$$
$$=r\,(dr\wedge d\phi\,\cos^2\phi + dr\wedge d\phi\,\sin^2\phi)$$
$$=r\, dr\wedge d\phi\,(\cos^2\phi + \sin^2\phi)$$
$$=r\, dr\wedge d\phi$$

With this the integral above expressed in polar coordinates will correctly read:

$$\int_A f(r^2)\, r\, dr\, d\phi$$

where we have suppressed the wedge product. It is important to realize that if we had not treated the differentials as 1-forms here, the transformation of the integration measure $dx\, dy$ into the one involving $dr$ and $d\phi$ would not have worked out properly!
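The antisymmetry of the wedge product is exactly what a determinant encodes, so the result above can be cross-checked: the Jacobian determinant of the coordinate change should come out as $r$. A small Python sketch of that check (the function names are mine, not from the answer):

```python
import math

def jacobian_det(r, phi, h=1e-6):
    """Numerical Jacobian determinant of (x, y) = (r cos phi, r sin phi),
    i.e. the coefficient in dx ^ dy = det(J) dr ^ dphi."""
    def x(r, phi): return r * math.cos(phi)
    def y(r, phi): return r * math.sin(phi)
    # Central finite differences for the four partial derivatives.
    dx_dr   = (x(r + h, phi) - x(r - h, phi)) / (2 * h)
    dx_dphi = (x(r, phi + h) - x(r, phi - h)) / (2 * h)
    dy_dr   = (y(r + h, phi) - y(r - h, phi)) / (2 * h)
    dy_dphi = (y(r, phi + h) - y(r, phi - h)) / (2 * h)
    return dx_dr * dy_dphi - dx_dphi * dy_dr

# At any sample point the determinant matches r, reproducing
# dx ^ dy = r dr ^ dphi from the wedge computation above.
print(jacobian_det(2.0, 0.7))  # close to 2.0
```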

Hope this example was down to earth enough and provides some feeling for how differentials are not simply very small numbers.

Answered by Kagaratsch on November 12, 2020

In mathematics the notation $dx$ actually denotes a linear form; this means that $dx$ is a linear function taking a vector and giving a scalar.

Let us take a differentiable function $f$ defined over $\mathbf R$ and consider it at a point $a$. The tangent to the curve of $f$ at the point $a$ has slope $f'(a)$. The point on this tangent with abscissa $b$ has ordinate $f_a(b)=f(a)+(b-a)f'(a)$. $f_a(b)$ is the linear approximation of $f(b)$ knowing $f$ at the point $a$.

We then define $dx(b-a)=b-a$. We have $$f_a(b)-f(a)=f'(a)\,dx(b-a),\tag{1}$$ and we write $$df_a=f'(a)\,dx,$$ which is formula (1) written for linear forms. Indeed, the linear form $df_a$ is defined by $$df_a(\epsilon)=f'(a)\,dx(\epsilon)=f'(a)\,\epsilon.$$

In physics one often confuses $dx$ (the linear form) with $\epsilon$ (the argument of $dx$). I hope you can see why when looking at the last equation.

NOTE. This may seem quite useless, but in dimension $n>1$ it becomes more interesting. You have indeed $$df_{\vec a}=\nabla f(\vec a)\cdot d\vec r=\begin{pmatrix}\frac{\partial f(\vec a)}{\partial x_1}\\ \vdots\\ \frac{\partial f(\vec a)}{\partial x_n}\end{pmatrix}\cdot \begin{pmatrix}dx_1\\ \vdots\\ dx_n\end{pmatrix},$$ which translates into, for $\vec\epsilon=(\epsilon_1,\dots,\epsilon_n)\in\mathbf R^n$, $$df_{\vec a}(\vec\epsilon)=\sum_{k=1}^n \frac{\partial f(\vec a)}{\partial x_k}\,dx_k(\vec\epsilon)=\sum_{k=1}^n\frac{\partial f(\vec a)}{\partial x_k}\,\epsilon_k,$$ because $dx_k(\vec\epsilon)=\epsilon_k$ ($dx_k$ is the $k^{\rm th}$ coordinate form).
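This "differential as a linear map on displacement vectors" picture is easy to make concrete in code. A sketch of my own, with an arbitrary choice of function $f(x_1,x_2)=x_1^2 x_2$ and point $\vec a=(1,2)$:

```python
# The differential df_a as a linear map acting on displacement vectors eps,
# for the (arbitrary, illustrative) choice f(x1, x2) = x1**2 * x2.

def grad_f(a):
    x1, x2 = a
    return (2 * x1 * x2, x1 ** 2)  # (df/dx1, df/dx2)

def df(a, eps):
    """The linear form df_a applied to the vector eps:
    sum_k (df/dx_k)(a) * eps_k, since dx_k(eps) = eps_k."""
    return sum(g * e for g, e in zip(grad_f(a), eps))

a = (1.0, 2.0)
print(df(a, (0.1, 0.0)))  # 4 * 0.1 = 0.4
print(df(a, (0.0, 0.1)))  # 1 * 0.1 = 0.1
```

Note that `df(a, ...)` really is linear in its second argument, which is the defining property of a linear form.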

Answered by Tom-Tom on November 12, 2020

As you can see from the variety of answers, there are many ways to interpret differentials in a mathematically exact manner.

One nice, simple interpretation is as coordinates of tangent vectors.

Consider an equation $$ z = f(x,y) $$ describing a curved surface in three-dimensional space ($z$ is the height).

Then the equation $$dz = \frac{\partial}{\partial x} f(x,y) \cdot dx + \frac{\partial}{\partial y} f(x,y) \cdot dy$$ describes the points $(\bar x,\bar y,\bar z)=(x+dx,y+dy,z+dz)$ of the tangent plane at the point $(x,y,z)$ on the surface. This equation is often called the tangent equation.

If you have some specific point $(x,y,z)$ given by coordinate values as numbers and would like to get a specific point on the tangent plane as well, just put numbers in for $dx$, $dy$ and $dz$. Thus the differentials can stand for numbers. Why not?

So far so good. Now, why should the numbers be small? We assume that the surface is smooth at the point $(x,y,z)$, meaning that $f$ should be continuously differentiable there. Then $$\frac{z+dz - f(x+dx,y+dy)}{|(dx,dy)|}\rightarrow 0 \quad\text{ for } |(dx,dy)|\rightarrow 0,$$ where $dz$ fulfills the above tangent equation. Here $|(dx,dy)|=\sqrt{dx^2 + dy^2}$ denotes the Euclidean norm.

The division by $|(dx,dy)|$ lets us look at a scaled picture of the surface around the point $(x,y,z)$. To keep angles as they are, we scale the picture evenly in all directions. The picture is always scaled such that the disturbance $(dx,dy)$ from the point $(x,y,z)$ is of order of magnitude 1. Even in this scaled-up picture, the height $z+dz$ of the disturbed point $(x+dx,y+dy,z+dz)$ on the tangent plane fits the corresponding height $f(x+dx,y+dy)$ on the curved surface better and better.

In summary: the tangent plane with the local coordinates $dx$, $dy$ and $dz$ fits the curved surface the better, the smaller the disturbances $dx,dy,dz$ are.


To clarify things let us consider an example. Let the curved surface be $$z=x^2-y.$$ We pick the specific point with $x=1$ and $y=2$, yielding $z=1^2-2 = -1$. The tangent equation is $$dz = 2x\cdot dx - dy,$$ and at our specific point $$dz = 2\,dx - dy.$$ To get a specific point on the tangent plane let us consider the differentials $dx=\frac14$ and $dy=1$, yielding $$dz = 2\cdot\frac14 - 1 = -\frac12.$$

The location of this point on the tangent plane in 3d-space is $(x+dx,y+dy,z+dz)=\left(1+\frac14,\,2+1,\,-1-\frac12\right)=\left(\frac54,3,-\frac32\right)$.

At the same $x$- and $y$-coordinates we get on the curved surface the height $z'$ with $$z' = f(x+dx,y+dy) = f\left(\frac54,3\right) = \left(\frac54\right)^2 - 3 = -\frac{23}{16} = -1.4375.$$ It is a little bit off the height $z+dz=-1.5$ of the corresponding point on the tangent plane.
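The worked example above can be replayed directly in code (a sketch of mine reproducing the same numbers):

```python
# Surface z = x^2 - y at (x, y) = (1, 2); tangent equation dz = 2x*dx - dy,
# evaluated with the differentials dx = 1/4 and dy = 1 from the example.

def f(x, y):
    return x**2 - y

x, y = 1.0, 2.0
z = f(x, y)                   # height on the surface: -1.0
dx, dy = 0.25, 1.0
dz = 2 * x * dx - dy          # from the tangent equation: -0.5

tangent_height = z + dz             # -1.5 (point on the tangent plane)
surface_height = f(x + dx, y + dy)  # -1.4375 (point on the curved surface)
print(tangent_height, surface_height)
```

Shrinking `dx` and `dy` makes the two heights agree ever more closely, which is the summary statement above in numerical form.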


Even though I presented a numerical example here, in practice the differentials are more often used as variables to determine relations between the differentials (with their interpretation as tangent coordinates).

In the context of tangent coordinates, the differential quotient $\frac{dy}{dx}=f'(x)$ is the ratio of the coordinates $dy$ and $dx$ of the tangent to the graph of $f$ at $x$.

As long as you avoid division by zero, you can divide by a differential $dx$ (as a tangent coordinate).

Answered by Tobias on November 12, 2020

With the objective of keeping complexity to a minimum, the best "unifying" solution is to think of differentials, infinitesimals, numbers, etc. as mathematical symbols to which certain characteristics, properties, and mathematical operations (rules) are applicable.

Since not all rules are applicable to all symbols, you need to learn which rules are applicable to a particular set of symbols.

Whether you are learning fractions, decimals, differentials, etc., just learn the symbols and their particular rules and operations and that will be sufficient for 99% of the time.

Answered by Guill on November 12, 2020

There is an old tradition going back all the way to Leibniz himself to think of differentials intuitively as "infinitesimal numbers". Through the course of history, big minds have criticized Leibniz for this. Thus, Russell accepted Cantor's claim that infinitesimals are inconsistent and even reproduced it in his book Principles of Mathematics in 1903.

But then something profound happened in 1961: Abraham Robinson, one of the most profound thinkers of the foundations of mathematics, taught the world a rigorous construction of infinitesimals in the traditional framework of the Zermelo-Fraenkel set theory, expressed in terms of the theory of types. Among other things, this is a fully rigorous mathematical context in which the old intuition of Leibniz and the intuition of plenty of naive physicists finds a full formal justification. In Robinson's framework those differentials explicitly exist as infinitesimal elements of a suitable real closed field.

A detailed exposition of how this works is in Robinson's 1966 book but simpler treatments have been developed since, such as the books by Martin Davis or by Robert Goldblatt, including exposition of differentiation via infinitesimals.

Notice that this is not just a big machine to produce something you already know, as some will inevitably hasten to think. On the contrary, this leads the way to the more sophisticated places of modern physics, as developed in detail in the book by Albeverio et al.:

Albeverio, Sergio; Høegh-Krohn, Raphael; Fenstad, Jens Erik; Lindstrøm, Tom. Nonstandard methods in stochastic analysis and mathematical physics. Pure and Applied Mathematics, 122. Academic Press, Inc., Orlando, FL, 1986. xii+514 pp.

Note 1. Lawvere's contribution in the framework of category theory dates from the 1970s.

Note 2. (In response to user Ovi's question) Robinson's framework is part of traditional analysis in the sense that it uses the traditional Zermelo-Fraenkel foundations and classical logic (as opposed to Lawvere's approach which relies on intuitionistic logic in a break with classical mathematics). Robinson's framework is an active research area today, featuring its own journal: Journal of Logic and Analysis (see http://logicandanalysis.org/) and an ever increasing number of monographs; most recently by Loeb and Wolff (see http://www.springer.com/us/book/9789401773263).

Answered by Mikhail Katz on November 12, 2020

The rigorous mathematical meaning of infinitesimals is given in analysis using the $\epsilon$-$\delta$ definition of a limit. As far as physics is concerned, $\epsilon$ generally refers to experimental precision, or margin of error. It means that if $\delta$ is a small enough number, then making it any smaller makes no practical difference to predictions.

The formal $\epsilon$-$\delta$ definition of a limit strictly does not mean the end point of an infinite process, but simply means that continuing the process further is empirically meaningless.

The reason mathematicians will not use $dx$ as a number is that $dx$ is only defined as part of an expression, not as something in itself. In physics, $dx$ can be taken to mean $\delta x$, a number sufficiently small that experimental precision is not affected.
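That physicist's reading of $dx$ as a "small enough $\delta x$" can be illustrated numerically (my own sketch, with $f(x)=x^2$ as an arbitrary example): past a certain step size, shrinking $\delta x$ further no longer changes the finite-difference slope at the working precision.

```python
# Finite-difference slope of f(x) = x**2 at x = 3 (true derivative: 6).
# Rounding to 3 decimal places stands in for "experimental precision":
# once dx is small enough, making it smaller changes nothing measurable.

def slope(f, x, dx):
    return (f(x + dx) - f(x)) / dx

f = lambda x: x**2
for dx in (1e-2, 1e-4, 1e-6):
    print(dx, round(slope(f, 3.0, dx), 3))
```

The last two step sizes give identical rounded slopes, which is exactly the sense in which continuing the limiting process further is empirically meaningless.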

Answered by Charles Francis on November 12, 2020
