Comparative Study of CNN and Transformer Models for Image Classification

Authors

  • Dr. Mandeep Kaur

DOI:

https://doi.org/10.63856/shncer44

Keywords:

CNN, Vision Transformer, Image Classification, Deep Learning, Comparative Analysis

Abstract

The introduction of deep learning architectures has transformed image classification. Convolutional Neural Networks (CNNs) have seen widespread use over the past decade because they exhibit superior feature-extraction performance, but in recent years transformer-based models have matched or surpassed them on a variety of benchmarks. The main aim of this paper is to compare CNN and transformer models for image classification, evaluating their performance in terms of accuracy, computational cost, interpretability, and robustness. Experiments are performed on CIFAR-10 and a small subset of ImageNet, comparing a classical ResNet-50 with a Vision Transformer (ViT-Base). Findings indicate that CNNs are highly efficient, training in less time and generalizing well, whereas transformer models perform better on larger datasets and capture global image dependencies more effectively. The results show that transformers have potential for large-scale classification, whereas CNNs offer practical benefits in resource-limited settings.

Published

2026-01-27

How to Cite

Comparative Study of CNN and Transformer Models for Image Classification. (2026). International Journal of Integrative Studies (IJIS), 1(11), 30-35. https://doi.org/10.63856/shncer44
