# Shannon's information theory and foundations of mathematics

Ellerman, David david at ellerman.org
Sat Jun 25 05:47:08 EDT 2022

Disinformation! Shannon's theory gives a negative mutual information for
certain sets of three random variables, e.g., in the standard example
showing that pairwise independence of three variables is not the same as
joint independence. The joke is: "We didn't know what 'negative
information' was until Trump got elected."

That is why Shannon himself never defined mutual information for three or
more variables. Yet all of Shannon's compound notions of joint,
conditional, and mutual information satisfy the usual Venn diagrams as if
the Shannon formula were a measure on a set (which it is not). And the usual
inclusion-exclusion principle shows how Shannon information can be extended
to many-variable Venn diagrams, where the negative mutual information pops
up. The best book on the Shannon theory, by Cover and Thomas, makes
the surprisingly casual statement "There isn’t really a notion of mutual
information common to three random variables." [p. 49] without any
further explanation or analysis.

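
A minimal sketch of the standard example, assuming X and Y are independent fair bits and Z = X XOR Y. The choice of variables and the helper names `marginal`, `shannon_entropy`, and `mutual_information` are illustrative assumptions, not something taken from Shannon or from Cover and Thomas. The three-variable quantity is obtained by applying inclusion-exclusion to the Shannon entropies of all non-empty subsets of the variables.

```python
from collections import defaultdict
from itertools import combinations, product
from math import log2

# Assumed example: X, Y independent fair bits, Z = X XOR Y.
# The four possible outcomes (x, y, z) each have probability 1/4.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def marginal(dist, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def shannon_entropy(dist):
    """H = sum_i p_i * log2(1/p_i), in bits."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


def mutual_information(dist, variables):
    """Inclusion-exclusion over all non-empty subsets of `variables`:
    + single entropies, - pairwise joint entropies, + triple, ..."""
    total = 0.0
    for k in range(1, len(variables) + 1):
        for idx in combinations(variables, k):
            total += (-1) ** (k + 1) * shannon_entropy(marginal(dist, idx))
    return total


# Every pair is independent, so each pairwise mutual information is 0 bits.
pairwise = [mutual_information(joint, pair) for pair in combinations(range(3), 2)]
# The three variables are not jointly independent; the triple term is negative.
triple = mutual_information(joint, (0, 1, 2))

print("pairwise I:", pairwise)
print("triple I(X;Y;Z):", triple)
```

Under these assumptions each single entropy is 1 bit and every joint entropy is 2 bits, so the pairwise values are 0 and the three-way value is 3 - 6 + 2 = -1 bit.
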
The connection to logic and foundations requires a little background.
Normally "logic" is identified as reasoning about propositions. But the
mathematical logic underlying propositional logic is the Boolean logic of
subsets of which propositional logic is the special case where the
universe set has one element. And mathematically (category theory), the
notion of subsets is dual to the notion of partitions (or equivalence
relations or quotient sets). Hence there is a dual mathematical logic, the
logic of partitions, that is equally basic from that mathematical viewpoint
as the logic of subsets. Each logic has a quantitative version. The
quantitative version of the Boolean logic of subsets was also developed by
Boole, namely finite probability theory. The quantitative version of the
logic of partitions is the theory of logical entropy, which was recently
covered in a Special Issue of the open-access journal 4Open. Logical
entropy is a measure in the sense of measure theory, so it naturally has all
the Venn diagram notions of simple, joint, conditional, and mutual logical
information. Moreover, all those compound notions of Shannon entropy are
the result of a non-linear dit-bit transform that transforms logical
entropy into Shannon entropy, and that transform preserves Venn
diagrams--which accounts for Shannon entropy satisfying those diagrams even
though it is not a measure in the sense of measure theory. But the dit-bit
transform does not preserve the non-negativity of the logical entropy
measure--which allows the negative mutual Shannon information.

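
A small numerical sketch of that contrast. It assumes the usual formula for logical entropy, h(p) = sum_i p_i (1 - p_i) = 1 - sum_i p_i^2, and the dit-bit transform read as replacing each factor (1 - p_i) with log2(1/p_i); neither formula is spelled out in this message, so treat both as assumptions. The example variables (X, Y independent fair bits, Z = X XOR Y) and the helper names are likewise illustrative.

```python
from collections import defaultdict
from itertools import combinations, product
from math import log2

# Assumed example: X, Y independent fair bits, Z = X XOR Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def marginal(dist, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def logical_entropy(dist):
    """h = sum_i p_i * (1 - p_i): probability that two independent
    draws from dist give different outcomes (assumed definition)."""
    return sum(p * (1 - p) for p in dist.values())


def shannon_entropy(dist):
    """Dit-bit transform of h: (1 - p_i) is replaced by log2(1/p_i)."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)


def mutual_information(dist, entropy, variables):
    """Same inclusion-exclusion (Venn diagram) formula for either entropy."""
    total = 0.0
    for k in range(1, len(variables) + 1):
        for idx in combinations(variables, k):
            total += (-1) ** (k + 1) * entropy(marginal(dist, idx))
    return total


logical_triple = mutual_information(joint, logical_entropy, (0, 1, 2))
shannon_triple = mutual_information(joint, shannon_entropy, (0, 1, 2))

print("logical mutual information m(X,Y,Z):", logical_triple)
print("Shannon mutual information I(X;Y;Z):", shannon_triple)
```

With these assumptions the logical value is 1.5 - 2.25 + 0.75 = 0, which is non-negative as a measure must be, while the Shannon value computed by the same Venn-diagram formula is 3 - 6 + 2 = -1 bit.
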
Best,
David Ellerman
www.ellerman.org

On Sat, Jun 25, 2022 at 12:41 AM Vaughan Pratt <pratt at cs.stanford.edu>
wrote:

> Recently Rohit Parikh suggested to me that disinformation was not
> information.  As I've always considered disinformation about any given
> proposition to be less likely than the conventional wisdom about it, it
> seemed to me that with Shannon's information theory, a less likely message
> contains more information than a more likely one.  Hence in particular
>
> Is there a foundational way of approaching these seemingly conflicting
> notions of information that isn't too wildly ad hoc?
>
> Vaughan Pratt
>
