Abstract: With the explosive growth of online video in multimedia social networks, using stand-alone crawlers to extract new video pages has become inefficient. A parallel algorithm based on Map/Reduce is proposed, which greatly improves crawler efficiency. To further address data redundancy and reduce the number of outdated pages, an improved accuracy-aware incremental updating algorithm is also proposed. A monitoring technique tracks web page changes and analyzes page update patterns; freshness assessment and dimensionality reduction are added; and an improved Mixed Integer Quadratic Programming (MIQP) formulation is used to derive the optimal refresh strategy. Experiments show that, compared with the frequent refresh strategy in stand-alone mode, the parallel incremental method achieves 79% of the information accuracy with 36.7% of the original refresh rate, and improves crawler efficiency 167-fold.