8 Fisher’s information

Besides entropy, there is a second measure of disorder, called Fisher information [9]. The importance of this second type of “entropy” for the mathematical form of the laws of physics, in particular for the terms related to the kinetic energy, has been stressed in a number of publications by Frieden and coworkers [10, 11] and has been studied further by Hall [16], Reginatto [36] and others. The Fisher functional $I[\rho]$ is defined by
$$
I[\rho] = \int \mathrm{d}x \, \rho(x) \left( \frac{\rho'(x)}{\rho(x)} \right)^{2}, \tag{48}
$$

where $\rho'$ denotes $\partial\rho/\partial x$ in the present one-dimensional case, and the $n$-component vector $D\rho = (\partial\rho/\partial x_1, \ldots, \partial\rho/\partial x_n)$ if $x = (x_1, \ldots, x_n)$. Since the time variable $t$ does not play an important role, it will frequently be suppressed in this section.
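As a quick numerical illustration of the definition (48), the following sketch evaluates the Fisher functional for a Gaussian density, for which the integral is known in closed form, $I = 1/\sigma^2$. The Gaussian test density, the function names, the finite-difference derivative and the midpoint rule are all choices made here for illustration; they are not taken from the text.

```python
import math

def gaussian(x, sigma):
    """Normalized zero-mean Gaussian density (an illustrative test density)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def fisher_information(rho, a, b, n=20000):
    """Approximate I[rho] = integral of rho'(x)^2 / rho(x) over [a, b].

    rho is a callable; rho' is taken by central differences and the
    integral is evaluated with the midpoint rule on n equal cells.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        r = rho(x)
        if r <= 0.0:
            # Skip points where the density underflows to zero.
            continue
        dr = (rho(x + 0.5 * h) - rho(x - 0.5 * h)) / h
        total += dr * dr / r * h
    return total

for sigma in (0.5, 1.0, 2.0):
    approx = fisher_information(lambda x: gaussian(x, sigma), -10.0 * sigma, 10.0 * sigma)
    print(f"sigma={sigma}: I ~ {approx:.6f}, exact 1/sigma^2 = {1.0 / sigma**2:.6f}")
```

A narrow density (small $\sigma$) has large Fisher information, a broad one has small Fisher information, which is the sense in which $I$ measures (the absence of) disorder.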

The Boltzmann-Shannon entropy (44) and the Fisher information (48) have a number of crucial statistical properties in common. We mention here, for future reference, only the most important one, namely the composition law (41); a more complete list of common properties may be found in the literature [48]. Using the notation introduced in section 7 [see the text preceding Eq. (41)], it is easy to see that Eq. (48) fulfills the relation
$$
I[\rho_1 \rho_2] = I^{(1)}[\rho_1] + I^{(2)}[\rho_2], \tag{49}
$$

in analogy to Eq. (41) for the entropy $S$. The most obvious difference between (44) and (48) is the fact that (48) contains a derivative while (44) does not. As a consequence, extremizing these two functionals yields fundamentally different equations for $\rho$, namely a differential equation for the Fisher functional $I$ and an algebraic equation for the entropy functional $S$.
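The additivity of the Fisher functional for a product density can be verified in a few lines. The sketch below assumes, as the notation suggests, that $\rho_1$ and $\rho_2$ are normalized densities of two independent subsystems; the coordinate labels $x$ and $y$ and the restriction to one dimension per subsystem are simplifications introduced here.

```latex
% Assumption: \rho(x, y) = \rho_1(x)\,\rho_2(y), with x and y the coordinates
% of the two independent subsystems (labels chosen here for illustration).
% Because \partial_x(\rho_1\rho_2)/(\rho_1\rho_2) = \rho_1'/\rho_1 (and likewise
% for y), the squared gradient splits into two terms:
\begin{align*}
I[\rho_1 \rho_2]
  &= \int \mathrm{d}x \, \mathrm{d}y \; \rho_1 \rho_2
     \left[ \left( \frac{\rho_1'}{\rho_1} \right)^{2}
          + \left( \frac{\rho_2'}{\rho_2} \right)^{2} \right] \\
  &= \underbrace{\int \mathrm{d}y \, \rho_2}_{=\,1}
     \int \mathrm{d}x \, \rho_1 \left( \frac{\rho_1'}{\rho_1} \right)^{2}
   + \underbrace{\int \mathrm{d}x \, \rho_1}_{=\,1}
     \int \mathrm{d}y \, \rho_2 \left( \frac{\rho_2'}{\rho_2} \right)^{2} \\
  &= I^{(1)}[\rho_1] + I^{(2)}[\rho_2].
\end{align*}
```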

The two measures of disorder, $S$ and $I$, are related to each other. To find this relation, it is necessary to introduce a generalized version of (44), the so-called “relative entropy”. It is defined by
$$
G[\rho, \alpha] = - \int \mathrm{d}x \, \rho(x) \ln \frac{\rho(x)}{\alpha(x)}, \tag{50}
$$

where $\alpha(x)$ is a given probability density, sometimes referred to as the “prior” [the constant $k$ in Eq. (44) has been suppressed here]. It provides a reference point for the unknown $\rho$; the best choice for $\rho$ is to be determined from the requirement of maximal relative entropy $G[\rho, \alpha]$ under given constraints, where $\alpha$ represents the state of affairs (or of our knowledge of the state of affairs) prior to consideration of the constraints. The quantity $-G[\rho, \alpha]$ agrees with the “Kullback-Leibler distance” between two probability densities $\rho$ and $\alpha$ [28].
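The definition (50) can be made concrete with a small numerical sketch. Here $\rho$ and $\alpha$ are taken to be two Gaussians, a choice made purely for illustration because the Kullback-Leibler distance between Gaussians has a well-known closed form to compare against. The function names and the midpoint-rule integration are likewise assumptions of this sketch.

```python
import math

def gaussian(x, mu, sigma):
    """Normalized Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def rho(x):
    """Illustrative 'unknown' density: N(0, 1)."""
    return gaussian(x, 0.0, 1.0)

def alpha(x):
    """Illustrative prior: N(1, 2)."""
    return gaussian(x, 1.0, 2.0)

def relative_entropy(p, q, a, b, n=20000):
    """G[p, q] = - integral of p ln(p / q) over [a, b], midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        px, qx = p(x), q(x)
        if px > 0.0 and qx > 0.0:
            total -= px * math.log(px / qx) * h
    return total

# Closed-form Kullback-Leibler distance between N(mu1, s1) and N(mu2, s2):
# ln(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 s2^2) - 1/2.
kl_exact = math.log(2.0 / 1.0) + (1.0 + 1.0) / (2.0 * 4.0) - 0.5

g = relative_entropy(rho, alpha, -12.0, 12.0)
print(f"G[rho, alpha] ~ {g:.6f}, -KL exact = {-kl_exact:.6f}")
print(f"G[rho, rho]   = {relative_entropy(rho, rho, -12.0, 12.0):.6f}")
```

The relative entropy is never positive and reaches its maximal value zero only when $\rho$ coincides with the prior.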

It has been pointed out that “all entropies are relative entropies” [6]. In fact, all physical quantities need reference points in order to become observables. The Boltzmann-Shannon entropy (44) is no exception. In this case, the ‘probability density’ $\alpha(x)$ is a number of value $1$, and of the same dimension as $\rho(x)$; it describes absence of any knowledge or a completely disordered state. We mention also two other, more technical points which imply the need for relative entropies. The first is the requirement to perform invariant variable transformations in the sample space [6]; the second is the requirement to perform a smooth transition from discrete to continuous probabilities [18].

Thus, the concept of relative entropies is satisfying from a theoretical point of view. On the other hand, it seems to be useless from a practical point of view since it requires, except in the trivial limit $\alpha = 1$, knowledge of a new function $\alpha(x)$ which is in general just as unknown as the original unknown function $\rho(x)$. A way out of this dilemma is to identify $\alpha(x)$ with a function $\rho(Tx)$, which can be obtained from $\rho(x)$ by replacing the argument $x$ by a transformed argument $Tx$. In this way we obtain from (50) a quantity $G[\rho; p_T]$ which is a functional of the relevant function $\rho$ alone; in addition it is an ordinary function of the parameters $p_T$ characterizing the transformation. The physical meaning of the relative entropy remains unchanged; the requirement of maximal relative entropy $G[\rho; p_T]$ becomes a condition for the variation of $\rho$ in the sample space between the points $Tx$ and $x$.

If further consideration is restricted to translations $Tx = x + \Delta x$ (it would be interesting to investigate other transformations, in particular if the sample space agrees with the configuration space), then the relative entropy is written as

$$
G[\rho; \Delta x] = - \int \mathrm{d}x \, \rho(x) \ln \frac{\rho(x)}{\rho(x + \Delta x)}. \tag{51}
$$

Expanding the integrand on the r.h.s. of (51) up to terms of second order in $\Delta x$ and using the fact that $\rho$ and $\rho'$ have to vanish at infinity, one obtains the relation
$$
G[\rho; \Delta x] \doteq - \frac{\Delta x^{2}}{2} \, I[\rho]. \tag{52}
$$
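The expansion leading to (52) can be spelled out as follows. This is a sketch of the intermediate steps under the stated assumptions (smooth $\rho$, with $\rho$ and $\rho'$ vanishing at infinity); the auxiliary symbol $\varepsilon$ is introduced here only for bookkeeping.

```latex
% Write \rho(x + \Delta x) = \rho(x)\,(1 + \varepsilon), with
% \varepsilon = (\rho'/\rho)\,\Delta x + (\rho''/2\rho)\,\Delta x^2 + O(\Delta x^3),
% and use \ln(1 + \varepsilon) = \varepsilon - \varepsilon^2/2 + \dots
\begin{align*}
\ln \frac{\rho(x)}{\rho(x + \Delta x)}
  &= -\ln(1 + \varepsilon)
   = -\frac{\rho'}{\rho}\,\Delta x
     - \frac{\rho''}{2\rho}\,\Delta x^{2}
     + \frac{1}{2}\left(\frac{\rho'}{\rho}\right)^{2} \Delta x^{2}
     + O(\Delta x^{3}), \\
G[\rho; \Delta x]
  &= \Delta x \int \mathrm{d}x \, \rho'
   + \frac{\Delta x^{2}}{2} \int \mathrm{d}x \, \rho''
   - \frac{\Delta x^{2}}{2} \int \mathrm{d}x \, \rho \left(\frac{\rho'}{\rho}\right)^{2}
   + O(\Delta x^{3}).
\end{align*}
% The first two integrals are boundary terms, [\rho] and [\rho'] evaluated at
% \pm\infty, and vanish. What remains is -(\Delta x^2 / 2)\, I[\rho].
```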

This, then, is the required relation between the relative entropy $G$ and the Fisher information $I$; it is valid only for sufficiently small $\Delta x$. The relative entropy $G[\rho; \Delta x]$ cannot be positive. Considered as a function of $\Delta x$, it has a maximum at $\Delta x = 0$ (taking its maximal value $0$) provided $I > 0$. This means that the principle of maximal entropy implies no change at all relative to an arbitrary reference density. This provides no criterion for $\rho$ since it holds for arbitrary $\rho$. But if (52) is considered, for fixed $\Delta x$, as a functional of $\rho$, the principle of maximal entropy implies, as a criterion for the spatial variation of $\rho$, a principle of minimal Fisher information.
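The small-$\Delta x$ character of the relation between $G$ and $I$ can be checked numerically. The sketch below uses the density $\rho(x) = \tfrac{1}{2}\,\mathrm{sech}^2 x$, chosen here only because it is smooth, normalized and not Gaussian (for a Gaussian the quadratic relation happens to be exact for all shifts, so it would not show the approximation breaking down). The function names and the midpoint-rule integration are assumptions of this illustration.

```python
import math

def rho(x):
    """Illustrative density rho(x) = sech^2(x) / 2, normalized on the real line."""
    return 0.5 / math.cosh(x) ** 2

def midpoint(f, a=-30.0, b=30.0, n=60000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def fisher_information():
    """I[rho] = integral of rho'^2 / rho, with rho' from central differences."""
    eps = 1e-5
    def integrand(x):
        d = (rho(x + eps) - rho(x - eps)) / (2.0 * eps)
        return d * d / rho(x)
    return midpoint(integrand)

def shifted_relative_entropy(dx):
    """G[rho; dx] = - integral of rho(x) ln(rho(x) / rho(x + dx)), Eq. (51)."""
    return -midpoint(lambda x: rho(x) * math.log(rho(x) / rho(x + dx)))

fisher = fisher_information()
print(f"I[rho] ~ {fisher:.6f}")
for dx in (1.0, 0.5, 0.1, 0.01):
    G = shifted_relative_entropy(dx)
    quadratic = -0.5 * dx * dx * fisher   # right-hand side of Eq. (52)
    print(f"dx={dx}: G = {G:.6e}, -dx^2 I/2 = {quadratic:.6e}, ratio = {G / quadratic:.6f}")
```

For this density the Fisher information is $4/3$ analytically. The printed ratio approaches one as $\Delta x \to 0$ and deviates visibly for shifts of order one, while $G$ stays non-positive throughout.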

Thus, from this overview (see Frieden’s book [11] for more details and several other interesting aspects) we would conclude that the principle of minimal Fisher information should not be considered as a completely new and exotic matter. Rather, it should be considered as an extension or generalization of the classical principle of maximal disorder to a situation where a spatially varying probability density exists, which contributes to disorder. This requires, in particular, that this probability density is to be determined from a differential equation and not from an algebraic equation. We conclude that the principle of minimal Fisher information $I$ is very well suited for our present purpose. As a next step we have to set up proper constraints for the extremal principle.