A foundational approach to artificial hallucinations in reinforcement learning

Patrik Eklund peklund at cs.umu.se
Thu Mar 9 01:28:17 EST 2023

Thanks, José. This is very good. Reinforcement is needed.

It may be a heating/cooling cycle; it may also be a stabilizing cycle. 
The pole-balancing problem is a typical example for some reinforcement 
techniques, in particular those where parameterized rules are "tuned" to 
balance as elegantly and precisely as possible. Stochastic modifiers may 
be involved to "shake" the system (heat it a bit) so that the 
learning/reinforcing mechanisms will cool it down.
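
A minimal sketch of the "shake, then cool down" tuning, under assumptions 
that are not in the post: a linearized, discrete-time inverted pendulum, 
a hypothetical linear rule u = -k1*theta - k2*omega whose gains are the 
tuned parameters, and simulated annealing as the stochastic modifier. It 
starts from a crude rule that keeps the pole up but wobbles, and tries to 
balance it more precisely. Constants, cost, and cooling schedule are 
illustrative choices only.

```python
import math
import random

def episode_cost(k1, k2, steps=200, dt=0.02, g_over_l=9.81):
    """Sum of squared angles for a linearized inverted pendulum under the
    hypothetical feedback rule u = -k1*theta - k2*omega."""
    theta, omega = 0.1, 0.0            # start slightly off balance (radians)
    cost = 0.0
    for _ in range(steps):
        u = -k1 * theta - k2 * omega   # the parameterized balancing rule
        alpha = g_over_l * theta + u   # linearized angular acceleration
        omega += alpha * dt            # semi-implicit Euler step
        theta += omega * dt
        cost += theta * theta
        if abs(theta) > 1.0:           # the pole has fallen: heavy penalty
            return cost + 1e6
    return cost

def anneal(k=(12.0, 0.0), t0=1.0, cooling=0.95, iters=300, seed=0):
    """Simulated annealing over the gains (k1, k2); returns the best found."""
    rng = random.Random(seed)
    current, current_cost = k, episode_cost(*k)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(iters):
        # "Shake" the gains with noise whose size shrinks with the temperature.
        cand = (current[0] + rng.gauss(0, 5 * temp),
                current[1] + rng.gauss(0, 5 * temp))
        cand_cost = episode_cost(*cand)
        # Metropolis rule: accept improvements, sometimes accept worse moves.
        if cand_cost < current_cost or rng.random() < math.exp(
                (current_cost - cand_cost) / max(temp, 1e-12)):
            current, current_cost = cand, cand_cost
            if cand_cost < best_cost:
                best, best_cost = cand, cand_cost
        temp *= cooling                # cool the system down
    return best, best_cost

if __name__ == "__main__":
    gains, cost = anneal()
    print("tuned gains:", gains, "cost:", cost)
```

Keeping the best-so-far gains means the returned rule is never worse than 
the starting one, whatever the random shaking does.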

All this is useful, but it's so numerical. It relies on differential 
calculus over the reals, so some discrete differential calculus is 
needed, and it exists, for instance in the differential lambda calculus 
(Ehrhard et al.) and in differential categories (Blute et al.).
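
The differential lambda calculus and differential categories are far more 
general than anything that fits in a few lines. As a much more elementary 
illustration of what a discrete differential calculus looks like, here is 
a sketch of the forward-difference operator (Δf)(n) = f(n+1) - f(n) on 
integer functions. Its product rule picks up an extra term Δf·Δg compared 
with the continuous Leibniz rule. The function names are hypothetical.

```python
from typing import Callable

IntFn = Callable[[int], int]

def delta(f: IntFn) -> IntFn:
    """Forward difference: (delta f)(n) = f(n + 1) - f(n)."""
    return lambda n: f(n + 1) - f(n)

def product(f: IntFn, g: IntFn) -> IntFn:
    """Pointwise product of two integer functions."""
    return lambda n: f(n) * g(n)

def leibniz_rhs(f: IntFn, g: IntFn) -> IntFn:
    """Right-hand side of the discrete product rule:
    delta(f*g) = f*delta(g) + g*delta(f) + delta(f)*delta(g)."""
    df, dg = delta(f), delta(g)
    return lambda n: f(n) * dg(n) + g(n) * df(n) + df(n) * dg(n)

if __name__ == "__main__":
    def square(n: int) -> int:
        return n * n

    def cube(n: int) -> int:
        return n ** 3

    lhs = delta(product(square, cube))
    rhs = leibniz_rhs(square, cube)
    print(all(lhs(n) == rhs(n) for n in range(-10, 11)))  # prints True
```

The identity follows from f(n+1) = f(n) + Δf(n) and g(n+1) = g(n) + Δg(n); 
the cross term Δf·Δg is what vanishes in the continuous limit.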


And, if you ask me, we need many-valued logic. Some will say we don't, 
because computationally everything boils down to binary. Maybe so, maybe 
not, but computers do use the stroke operator as a convenient and 
economical representation in computational formulas, so why not use a 
many-valued representation of truth if societal and business decision 
making calls for it? Roughly speaking, arithmetic on Likert scales is 
Pippi Longstocking math, but things like "quantaloid Likerts" might open 
up a wide spectrum of algebraic-logic aspects, precisely where it would 
be desirable to have that differential calculus, e.g. so that 
reinforcement computes "correctly" or "desirably".
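
One possible many-valued reading of a five-point Likert scale, offered 
only as an assumption and not as the "quantaloid Likerts" meant above: 
map the answers 1..5 to the truth values 0, 1/4, 1/2, 3/4, 1 and combine 
them with the Łukasiewicz operations. These stay on the scale, and the 
strong conjunction with its residuum (implication) makes this finite 
chain a commutative quantale. Restricted to {0, 1} they reduce to ordinary 
Boolean logic, so binary is a special case rather than a replacement. 
The encoding and function names are hypothetical.

```python
from fractions import Fraction

LEVELS = 5  # hypothetical five-point Likert scale

def to_truth(likert: int) -> Fraction:
    """Map a Likert answer 1..5 to a truth value 0, 1/4, 1/2, 3/4, 1."""
    if not 1 <= likert <= LEVELS:
        raise ValueError("Likert answer must be in 1..5")
    return Fraction(likert - 1, LEVELS - 1)

def to_likert(t: Fraction) -> int:
    """Inverse of to_truth for values on the chain."""
    return int(t * (LEVELS - 1)) + 1

def l_and(a: Fraction, b: Fraction) -> Fraction:
    """Lukasiewicz strong conjunction (the quantale multiplication)."""
    return max(Fraction(0), a + b - 1)

def l_implies(a: Fraction, b: Fraction) -> Fraction:
    """Lukasiewicz implication, the residuum of l_and."""
    return min(Fraction(1), 1 - a + b)

def l_not(a: Fraction) -> Fraction:
    """Lukasiewicz negation."""
    return 1 - a

if __name__ == "__main__":
    # "agree" (4) combined with "neutral" (3): 3/4 + 1/2 - 1 = 1/4.
    print(to_likert(l_and(to_truth(4), to_truth(3))))  # prints 2
```

Residuation is the defining law here: l_and(a, b) <= c holds exactly when 
a <= l_implies(b, c), for every a, b, c on the chain.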


Sorry for posting more than I usually do; I'll try to keep quiet for 
a while.


Thanks also to Yu for illuminations that surely broaden our thinking.



On 2023-03-09 01:15, José Manuel Rodríguez Caballero wrote:

> Yu Li wrote about the conversation with ChatGPT:
>> I would like to share my conversations with ChatGPT about Gödel's 
>> incompleteness theorem.
> [...]
>> Overall, ChatGPT is surprisingly performing!
> Firstly, if systems based on reinforcement learning, such as ChatGPT, 
> are to perform mathematics in real-life scenarios, such as automatically 
> controlling an industrial process, then the foundations of this new method of 
> mathematical problem-solving should be incorporated into the field of 
> foundations of mathematics. Proof assistants, which are more stable 
> than reinforcement learning systems, are already a subject of 
> discussion in the foundations of mathematics community. Each category 
> of machine learning method that performs mathematical tasks should have 
> its own foundations, as it is unlikely that there exists a single 
> theory to encompass them all. Limiting the foundations of mathematics 
> to human-made mathematics reflects the bias of an anthropocentric 
> worldview [2]. This bias can prevent us from recognizing forms of 
> non-human intelligence, such as animal cognition [3].
> Mikhail Gromov said that only mathematicians and schizophrenics may 
> trust a chain of ten consecutive arguments [1]. In life, one or two 
> arguments are sufficient because the chain tends to break down. It is 
> easy to estimate the length of the chain of consecutive arguments that 
> a given reinforcement learning system can handle at a given time when 
> performing mathematical tasks: establish a hierarchy of elementary 
> mathematical problems whose solutions involve increasingly many 
> non-trivial steps. I emphasize that these steps should be non-trivial, 
> because the neural network easily handles trivial steps.
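
Gromov's remark can be made quantitative under a strong simplifying 
assumption that is not in the original post: if each non-trivial step is 
carried out correctly with the same probability p, independently of the 
others, a chain of n steps survives with probability p^n. A sketch of 
that model follows, together with the longest chain that still succeeds 
with at least a chosen probability; p and the threshold are free 
parameters, not measured values.

```python
import math

def chain_success(p: float, n: int) -> float:
    """Probability that n independent steps, each correct with probability p, all hold."""
    return p ** n

def max_chain_length(p: float, threshold: float) -> int:
    """Largest n with p**n >= threshold (assumes 0 < p < 1, 0 < threshold <= 1)."""
    if not (0 < p < 1 and 0 < threshold <= 1):
        raise ValueError("need 0 < p < 1 and 0 < threshold <= 1")
    n = math.floor(math.log(threshold) / math.log(p))
    # Correct for floating-point rounding at the boundary.
    while chain_success(p, n + 1) >= threshold:
        n += 1
    while n > 0 and chain_success(p, n) < threshold:
        n -= 1
    return n

if __name__ == "__main__":
    print(round(chain_success(0.9, 10), 3))  # prints 0.349
    print(max_chain_length(0.9, 0.5))         # prints 6
```

Under this model even p = 0.9 leaves only about a one-in-three chance of 
getting ten steps right. Measuring the success rate at each level of the 
proposed hierarchy would replace the independence assumption with data.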
> As an experiment, which can be replicated by anyone, I challenged 
> ChatGPT to solve the following problem known as "Cows in the Meadow" 
> [4]: The grass in a meadow grew equally thick and fast. It was known 
> that 70 cows could eat it up in 24 days, while 30 cows could do it in 
> 60 days. How many cows would crop the grass of the whole meadow in 96 
> days? The solution is that 20 cows would have eaten up all the grass in 
> 96 days, but the reasoning involves several non-trivial steps and many 
> people get confused by that. After writing equations like a student on 
> the blackboard, ChatGPT concluded:
>> Therefore, 35 cows would be needed to eat all the grass in 96 days.
> I asked it:
>> Can you find an error in the above reasoning?
> and it answered "Yes"; its explanation of its own mistake included:
>> Equating the cow-days required by 70 cows and 30 cows, we get:
>> 70 x 24 = 30 x 60
>> 1680 = 1800
>> This is a contradiction, which means that there is no solution for the 
>> number of cows needed to eat the grass in 96 days with the given 
>> information.
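
For comparison, here is the calculation ChatGPT missed. The two cow-day 
totals (1680 and 1800) are not supposed to be equal; they differ because 
the grass keeps growing. Measure grass in daily cow rations and let G be 
the initial amount and g the daily growth (names introduced here for 
illustration). Then 70·24 = G + 24g and 30·60 = G + 60g, so 36g = 120, 
g = 10/3, G = 1600, and n·96 = G + 96g = 1920 gives n = 20. The sketch 
below redoes this with exact fractions and also checks a proposed answer 
against the data instead of trusting it, so 35 is rejected.

```python
from fractions import Fraction

# Data from "Cows in the Meadow": (cows, days) pairs that exhaust the meadow.
OBSERVATIONS = [(70, 24), (30, 60)]

def grass_model(obs=OBSERVATIONS):
    """Solve cows*days = G + days*g for the initial grass G and the daily
    growth g, both measured in daily cow rations."""
    (c1, d1), (c2, d2) = obs
    g = Fraction(c2 * d2 - c1 * d1, d2 - d1)  # growth per day
    G = c1 * d1 - d1 * g                       # initial amount of grass
    return G, g

def cows_needed(days: int) -> Fraction:
    """Number of cows that eat the whole meadow in exactly `days` days."""
    G, g = grass_model()
    return (G + days * g) / days

def is_consistent(cows: int, days: int) -> bool:
    """Check a proposed answer against the model."""
    G, g = grass_model()
    return cows * days == G + days * g

if __name__ == "__main__":
    print(cows_needed(96))        # prints 20
    print(is_consistent(35, 96))  # prints False
    print(is_consistent(20, 96))  # prints True
```
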
> The task for foundations of mathematics concerning reinforcement 
> learning systems is not to increase their creativity, which is the data 
> scientist's job. Foundations of mathematics may be interested in 
> developing a method to prevent nonsensical solutions, also known as 
> artificial hallucinations. In humans, this issue was solved to some 
> extent by the axiomatic method. Therefore, the problem could be to 
> develop an axiomatic system that matches the way reinforcement 
> learning works. This is a particular case of the goal of subaxiomatic 
> foundations: to develop a foundation of mathematics that is more 
> computationally efficient than the human-centered axiomatic 
> foundations.
> From a statistical mechanics point of view, using reinforcement 
> learning is like cooling the system, and the user interacting with the 
> system is like heating it. When the temperature is too high, there will 
> be a phase transition, which the user perceives as an artificial 
> hallucination. The main theoretical problem is to find a device that 
> will shut the system down before it becomes too hot. Maybe proof 
> assistants are that device.
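
The statistical-mechanics picture above is an analogy, but "temperature" 
has a precise meaning in the Boltzmann distribution p_i ∝ exp(-E_i / T), 
which has the same form as the softmax-with-temperature used when sampling 
from many neural language models. The sketch below, with hypothetical 
energy levels, shows the two regimes the analogy relies on: at low T the 
probability concentrates on the lowest-energy state, and as T grows the 
distribution flattens and its entropy rises. It illustrates only the word 
"temperature"; it is not a model of hallucination or of the phase 
transition described.

```python
import math

def boltzmann(energies, temperature):
    """Probabilities p_i proportional to exp(-E_i / T)."""
    e_min = min(energies)  # shift for numerical stability; p_i are unchanged
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

if __name__ == "__main__":
    energies = [0.0, 1.0, 2.0, 5.0]  # hypothetical energy levels
    for t in (0.1, 1.0, 10.0):
        probs = boltzmann(energies, t)
        print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```
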
> Kind regards,
> Jose M.
> [1] Gromov, M. "1/2 Probability by Homology" (video). 
> https://youtu.be/buThBDcUYZI?t=1180
> [2] Boslaugh, S. E. "Anthropocentrism." Encyclopedia Britannica, 
> 11 January 2016. https://www.britannica.com/topic/anthropocentrism
> [3] Pepperberg, I. M. "Grey parrot numerical competence: a review." 
> Animal Cognition 9 (2006): 377-391.
> [4] Perelman, Y. "Algebra Can Be Fun." Mir, 1979.

More information about the FOM mailing list