Local feature extraction is one of the most important tasks in building robust video representations for human action recognition. Recent advances in computing visual features, especially deep-learned features, have achieved excellent performance on a variety of action datasets. However, the extraction process is computationally intensive and extremely time-consuming when applied to large-scale video data. Consequently, most existing methods that run on a single machine become inefficient for extracting video features over big data, owing to limited computing power and memory capacity. In this paper, we propose elastic solutions for feature extraction based on the Spark framework. In particular, exploiting the in-memory computing capability of Spark, we parallelize feature computation by partitioning video data into videos or frames and placing them into resilient distributed datasets (RDDs) for subsequent processing. We then present parallel algorithms to extract state-of-the-art deep-learned features on a Spark cluster. Subsequently, using distributed encoding, the extracted features are aggregated into a global representation, which is fed into a learned classifier to recognize actions in videos. Experimental results on a benchmark dataset demonstrate that our proposed methods significantly speed up the extraction process and achieve promising scalability.
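
The partition, extract, and aggregate pattern summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a local thread pool as a stand-in for a Spark cluster, a toy descriptor (mean and maximum intensity) in place of deep-learned features, and average pooling in place of the distributed encoding. All names (`partition`, `extract_features`, `extract_partition`, `encode`, `video_representation`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# Hypothetical frame type: a flat list of pixel intensities.
Frame = List[float]


def partition(items: Sequence, num_partitions: int) -> List[list]:
    """Split items into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def extract_features(frame: Frame) -> List[float]:
    """Toy descriptor (mean and max intensity); a placeholder, not a deep feature."""
    return [sum(frame) / len(frame), max(frame)]


def extract_partition(frames: List[Frame]) -> List[List[float]]:
    """Per-partition work, analogous to what one worker would run on its chunk."""
    return [extract_features(f) for f in frames]


def encode(features: List[List[float]]) -> List[float]:
    """Aggregate local features into one global vector by average pooling."""
    dims = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dims)]


def video_representation(frames: List[Frame], num_partitions: int = 4) -> List[float]:
    """Partition frames, extract features per partition in parallel, then pool."""
    if not frames:
        raise ValueError("frames must be non-empty")
    # Drop empty chunks, which occur when there are fewer frames than partitions.
    chunks = [c for c in partition(frames, num_partitions) if c]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        per_chunk = list(pool.map(extract_partition, chunks))
    local = [feat for chunk in per_chunk for feat in chunk]
    return encode(local)
```

Under these assumptions, `video_representation(frames, num_partitions=2)` returns a single fixed-length vector for a video regardless of its frame count. On a cluster, the per-partition step would correspond to a map over RDD partitions and the pooling step to a reduction over the extracted features.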