The reliability of power distribution systems is a critical concern because of the large-scale power outages that occur worldwide. When an unscheduled outage occurs in a power grid, service restoration is triggered to return the system to normal conditions quickly and to minimize the severity of the consequences. This paper proposes a self-healing restoration technique for power distribution grids based on a decentralized multi-agent system with reinforcement learning. The system architecture uses two types of zone agents, the Inactive Zone Agent (IZA) and the Active Zone Agent (AZA), where an IZA is activated when it lies within the out-of-service area. The study advances service restoration by endowing the agents with learning ability. The proposed reward computation is based on the load priority factor and also ensures that the operating constraints remain within their limits. Case studies compare service restoration outcomes with the load priority factor and with distributed generators (DGs) incorporated into the network. All simulations are implemented in the PowerWorld simulator for an 11 kV medium-voltage network with 29 buses. The results show that embedding the Q-learning algorithm into service restoration significantly improves the performance metrics and thus increases the reliability of distribution grids.
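To make the learning component concrete, the following is a minimal sketch of a priority-weighted reward combined with the standard tabular Q-learning update. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the voltage and line-loading limits, the penalty value, and the learning parameters `alpha` and `gamma` are all hypothetical.

```python
def restoration_reward(load_kw, priority, voltage_pu, line_loading,
                       v_min=0.95, v_max=1.05, max_loading=1.0, penalty=100.0):
    """Reward for restoring one load (hypothetical form).

    Returns the priority-weighted load served, or a fixed negative
    penalty if an assumed voltage or line-loading limit is violated.
    """
    violated = not (v_min <= voltage_pu <= v_max) or line_loading > max_loading
    if violated:
        return -penalty
    return priority * load_kw


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict keyed by (state, action)."""
    # Best estimated value reachable from the next state (0 if terminal).
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Usage: an agent closes a (hypothetical) tie switch and restores a
# 100 kW load with priority factor 2 while all limits are respected.
q_table = {}
r = restoration_reward(load_kw=100, priority=2, voltage_pu=1.0, line_loading=0.8)
q_update(q_table, state="zone_3_out", action="close_tie_switch",
         reward=r, next_state="zone_3_restored", next_actions=[])
```

In this sketch a constraint violation overrides the load-based reward entirely, so an agent is steered away from switching actions that restore high-priority load at the cost of breaching limits. Other formulations, such as a graded penalty, would be equally compatible with the description above.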