This paper introduces the first spoken dialogue system developed for the Persian language: a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module, and real on-line training data are used during the learning process. As a second contribution, the effect of varying the discount factor (γ) on the speed of on-line learning is investigated. The optimal values of γ were found, and the variation pattern of the action-value function (Q) during learning was obtained. A probabilistic policy for selecting actions is used here for the first time, instead of the greedy policies employed in previous work.
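To make the roles of the discount factor and the probabilistic policy concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a Boltzmann (softmax) distribution over action values and a one-step Q-learning update; the abstract specifies neither, and the names `softmax_probs`, `select_action`, `q_update`, `temperature`, and `alpha` are hypothetical.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn action values into selection probabilities (assumed Boltzmann form)."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index at random, weighted by the softmax probabilities.

    A greedy policy would instead always return the index of the largest value.
    """
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step (assumed update rule); gamma is the discount factor."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

In this sketch, a larger `gamma` gives more weight to the estimated value of the next dialogue state, which is one way the discount factor could influence how quickly Q changes during on-line learning.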