extending probabilistic formal frameworks to propositions more general than just arithmetic

Wed Jul 13 23:44:40 EDT 2022

X.Y. Newberry wrote:

> The Shannon information of disinformation cannot be negative. Even if the
> propositions coded by the sentences are false, they still had to be coded
> by a positive number of bits. In other words, regardless if your true
> knowledge of a subject matter increases or decreases, a finite number of
> bits had to be pushed through the communication channel.

I will explain my idea concerning the "Shannon information of
disinformation". Consider a statistical model, i.e., a family of
probability distributions Pr[x; theta] parametrized by theta. Let theta_0
be the null information (obtained from mainstream media as the main
communication channel) and theta_A the alternative information
(disinformation obtained from an alternative communication channel); I am
using this terminology in analogy with the theory of hypothesis testing. I
think that the amount of disinformation that makes agent Alice (the
audience) change her view of reality from model theta_0 to model theta_A is
the relative Shannon entropy (aka KL divergence) of Pr[x; theta_0] from
Pr[x; theta_A]. The motivation for this definition is that disinformation
can be measured as the expected excess surprise from thinking that reality
is described by Pr[x; theta_A] when, according to the official narrative,
it is actually described by Pr[x; theta_0]. The larger the amount of
disinformation, the greater the effort that agent Dimitry (the
disinformation source) must expend to change Alice's conception of reality.
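
To make the definition concrete, here is a minimal sketch in Python. It
assumes a Bernoulli model (theta is the probability of the outcome x = 1)
and measures the divergence in bits; the function names and the values
theta_0 = 0.5 and theta_A = 0.9 are my own illustrative choices, not part
of the proposal itself.

```python
import math

def kl_divergence_bits(p, q):
    """Relative entropy D(p || q) in bits: the expected excess surprise
    from believing q when outcomes are actually distributed as p."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf  # q rules out an outcome that p allows
        total += p_i * math.log2(p_i / q_i)
    return total

def bernoulli(theta):
    """Pr[x; theta] for x in {1, 0}, as a list [Pr[1], Pr[0]]."""
    return [theta, 1.0 - theta]

# Hypothetical parameter values, chosen only for illustration.
theta_0, theta_A = 0.5, 0.9

# "Amount of disinformation": divergence of Pr[x; theta_0] from Pr[x; theta_A].
disinformation = kl_divergence_bits(bernoulli(theta_0), bernoulli(theta_A))
print(f"D(theta_0 || theta_A) = {disinformation:.4f} bits")
```

Note that the quantity is not symmetric: swapping theta_0 and theta_A
generally gives a different number, so under this definition it matters
which model is taken as the reference.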

I would like to point out the parallel between disinformation and having
the wrong model of a physical theory. Here is a quote from Ed Witten (I
wrote the formulae out in words where it wasn't possible to copy the
mathematical symbols):

> One can motivate the definition of relative entropy as follows. Suppose
> that we are observing a random variable X, for example the final state in
> the decays of a radioactive nucleus. We have a theory that predicts a
> probability distribution Q_X for the final state, say the prediction is
> that the probability to observe final state X = x_i , where i runs over a
> set of possible outcomes {1, 2, · · · s}, is q_i = Q_X(x_i). But maybe our
> theory is wrong and the decay is actually described by some different
> probability distribution P_X, such that the probability of X = x_i is p_i =
> P_X(x_i). After observing the decays of N atoms, how sure could we be that
> the initial hypothesis is wrong? [...] The chance of falsely excluding a
> correct hypothesis, because of a large fluctuation that causes the data to
> be more accurately simulated by P_X than by Q_X, decays for large N as 2 to
> the power of minus the product of N and the relative entropy of P_X from
> Q_X.

Witten, Edward. "A mini-introduction to information theory." *La Rivista
del Nuovo Cimento* 43.4 (2020): 187-227.
publication: https://link.springer.com/article/10.1007/s40766-020-00004-5
arxiv: https://arxiv.org/pdf/1805.11965.pdf
lecture: https://youtu.be/XYugyhoohhY
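
The large-N behaviour in the quote can be checked numerically. The sketch
below is my own illustration, not code from Witten's paper: it computes the
exact multinomial probability, under the hypothesis Q_X, that N observations
show the frequencies of P_X, and compares the decay rate per observation
with the relative entropy. The two distributions and the function names are
assumptions made for the example.

```python
import math

def relative_entropy_bits(p, q):
    """Relative entropy sum_i p_i * log2(p_i / q_i), assuming every q_i > 0."""
    return sum(p_i * math.log2(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def log2_prob_of_frequencies(p, q, n):
    """log2 of the multinomial probability, under q, that n draws yield
    exactly n * p_i occurrences of outcome i (n * p_i must be integers)."""
    counts = [round(p_i * n) for p_i in p]
    if sum(counts) != n:
        raise ValueError("n * p_i must be integers that sum to n")
    log_prob = math.lgamma(n + 1)  # ln(n!)
    for c, q_i in zip(counts, q):
        log_prob += c * math.log(q_i) - math.lgamma(c + 1)
    return log_prob / math.log(2)  # convert natural log to log base 2

# Hypothetical distributions: Q_X is the hypothesis, P_X is what the data show.
p_x = [0.5, 0.5]
q_x = [0.25, 0.75]
entropy = relative_entropy_bits(p_x, q_x)

for n in (100, 1000, 10000):
    rate = -log2_prob_of_frequencies(p_x, q_x, n) / n
    print(f"N = {n:>5}: -log2(prob)/N = {rate:.5f}, relative entropy = {entropy:.5f}")
```

The rate approaches the relative entropy only as N grows, because the exact
probability also carries a factor that is polynomial in N; the exponential
statement in the quote is about the leading behaviour.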

Notice that in my framework I can only consider propositions of the form
theta = x. Are there any suggestions for extending this framework to more
general propositions, specifically non-arithmetic propositions?

Kind regards,
Jose M.