The proliferation of street view images (SVIs) and continuing advances in deep learning have enabled urban analysts to extract and evaluate urban perceptions from large-scale urban streetscapes. However, many existing analytical frameworks lack interpretability owing to their end-to-end, "black-box" structure, which limits their value as planning support tools. In this context, we propose a five-step machine learning framework for extracting neighborhood-level urban perceptions from panoramic SVIs, with a specific emphasis on the interpretability of both features and results. Trained on the MIT Place Pulse dataset, the framework systematically extracts six dimensions of urban perception from a given panorama: wealth, boredom, depression, beauty, safety, and liveliness. We demonstrate the practical utility of the framework by deploying it in Inner London, where it is used to visualize urban perceptions at the Output Area (OA) level and is validated against real-world crime rates.
Artificial Intelligence, Environmental Sciences