Comparative Study of CNN and Transformer Models for Image Classification
DOI: https://doi.org/10.63856/shncer44

Keywords: CNN, Vision Transformer, Image Classification, Deep Learning, Comparative Analysis

Abstract
The introduction of deep learning architectures has transformed image classification. Convolutional Neural Networks (CNNs) have seen widespread use over the last decade because of their strong feature-extraction performance, but in recent years transformer-based models have matched or surpassed them on a variety of benchmarks. The main aim of this paper is to compare CNN and transformer models for image classification in terms of accuracy, computational cost, interpretability, and robustness. Experiments are performed on CIFAR-10 and a small subset of ImageNet, evaluating a classical ResNet-50 against a Vision Transformer (ViT-Base). The findings indicate that CNNs are highly efficient, train in less time, and generalize well from limited data, whereas transformer models perform better on larger datasets and capture global image dependencies more effectively. The results show that transformers are promising for large-scale classification, while CNNs retain practical advantages in resource-limited settings.
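The computational-cost contrast between the two families can be made concrete with standard parameter-count formulas. The sketch below is illustrative only, not taken from the paper's experiments: it assumes the textbook counts for a k×k convolutional layer (k²·C_in·C_out plus biases) and for a multi-head self-attention block of width d (four d×d projections for query, key, value, and output, plus biases); the example channel and width values (256 for a ResNet-50 bottleneck stage, 768 for ViT-Base) are the commonly published architecture dimensions.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Parameter count of a k x k convolution mapping c_in -> c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def mhsa_params(d, bias=True):
    """Parameter count of a multi-head self-attention block with model width d:
    four d x d projections (query, key, value, output), independent of head count."""
    return 4 * d * d + (4 * d if bias else 0)

# A single 3x3 convolution in a ResNet-50 bottleneck stage (256 -> 256 channels)
print(conv2d_params(3, 256, 256))   # 590080
# One self-attention block in ViT-Base (d = 768)
print(mhsa_params(768))             # 2362368
```

Note that these counts capture only per-layer parameters; the attention block's runtime cost additionally scales quadratically with the number of image patches, which is one reason transformers benefit from larger datasets and hardware budgets.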
License
Copyright (c) 2025 International Journal of Integrative Studies (IJIS)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



