Image representation via machine learning is an approach to quantitatively represent histopathological images of head and neck tumors for future applications of artificial intelligence-assisted pathological diagnosis systems. Objective: This study compares image representations produced by a pre-trained convolutional neural network (VGG16) with those produced by a vision transformer (ViT-L/14) in terms of classification performance for head and neck tumors. Methods: Whole-slide images of five oral tumor categories (n = 319 cases) were analyzed. Image patches were created from manually annotated regions at 4096, 2048, and 1024 pixels and rescaled to 256 pixels. Image representations were classified by logistic regression for binary classification or by multiclass support vector machines for multiclass classification. Results: VGG16 with 1024-pixel patches performed best for benign and malignant salivary gland tumors (BSGTs and MSGTs) (F1 = 0.703 and 0.803, respectively). VGG16 outperformed ViT for BSGTs and MSGTs at all magnification levels. However, ViT outperformed VGG16 for maxillofacial bone tumors (MBTs), odontogenic cysts (OCs), and odontogenic tumors (OTs) at all magnification levels (F1 = 0.780, 0.874, and 0.751, respectively). Conclusion: Being more texture-biased, VGG16 performs better in representing BSGTs and MSGTs at high magnification, while the more shape-biased ViT-L/14 performs better in representing MBTs, OCs, and OTs.
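The patch geometry and the F1 metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the names `Patch`, `tile_region`, and `f1_score` are hypothetical, and the sketch assumes that an annotated region is approximated by a rectangle and tiled without overlap, which the abstract does not state. Only the patch sizes (4096, 2048, 1024) and the 256-pixel target come from the text.

```python
from dataclasses import dataclass

TARGET_SIZE = 256                  # edge length after rescaling (from the abstract)
PATCH_SIZES = (4096, 2048, 1024)   # patch edge lengths in pixels (from the abstract)


@dataclass(frozen=True)
class Patch:
    x: int        # left edge in slide pixel coordinates
    y: int        # top edge in slide pixel coordinates
    size: int     # edge length before rescaling
    scale: float  # factor that maps `size` to TARGET_SIZE


def tile_region(x0, y0, width, height, patch_size):
    """Return non-overlapping patches that fit entirely inside a rectangle.

    Assumption: the annotated region is treated as a bounding box and
    tiled on a regular grid; the actual sampling scheme may differ.
    """
    scale = TARGET_SIZE / patch_size
    return [
        Patch(x, y, patch_size, scale)
        for y in range(y0, y0 + height - patch_size + 1, patch_size)
        for x in range(x0, x0 + width - patch_size + 1, patch_size)
    ]


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 8192 x 8192 annotated region, tiled at each patch size.
    for size in PATCH_SIZES:
        patches = tile_region(0, 0, 8192, 8192, size)
        print(size, len(patches), patches[0].scale)
```

Under these assumptions, a smaller patch size yields more patches per region and a milder downscaling factor (0.25 for 1024 pixels versus 0.0625 for 4096 pixels), which is why the 1024-pixel setting corresponds to the highest effective magnification after rescaling.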




