From Diffusion Models to Instruction-Tuned LLMs: A Unified Taxonomy and Trend Analysis of Generative AI
DOI: https://doi.org/10.70112/ajcst-2026.15.1.4443

Keywords: Generative Artificial Intelligence, Large Language Models, Diffusion Models, Instruction Tuning, Taxonomy and Trends

Abstract
Generative AI has rapidly and profoundly reshaped the computing landscape, with machines now capable of creating high-quality text, images, audio, video, code, and multimodal data. This paper presents a detailed taxonomy and trend overview of generative AI models and products, tracing their historical development from early diffusion-based models to instruction-tuned large language models (LLMs). The study systematically classifies models by architecture, modality, training paradigm, deployment type, and scale, and maps research innovations to commercial products such as chat assistants, image and video generators, code assistants, and multimodal systems. The analysis highlights common patterns in model scaling, benchmark performance, and adoption, as well as the growing convergence of the vision and language modalities. The study also identifies key gaps and challenges, including alignment, hallucination, ethical concerns, and evaluation limitations, and suggests directions for future research and industrial deployment. By integrating technical, commercial, and societal perspectives, this work offers a forward-looking reference for scholars, practitioners, and policymakers seeking to understand the evolution, capabilities, and potential of generative AI.
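
To make the classification scheme concrete, the sketch below represents the paper's five taxonomy axes (architecture, modality, training paradigm, deployment type, and scale) as a small Python data structure. This is a minimal illustrative sketch: the specific axis values and the two catalog entries are assumptions chosen for exposition, not the paper's exhaustive category lists.

    # Minimal sketch: the paper's five taxonomy axes as a data structure.
    # Axis values and example entries are illustrative assumptions, not
    # the paper's exhaustive category lists.
    from dataclasses import dataclass
    from enum import Enum

    class Architecture(Enum):
        GAN = "GAN"
        VAE = "VAE"
        DIFFUSION = "diffusion"
        TRANSFORMER = "transformer"

    class Modality(Enum):
        TEXT = "text"
        IMAGE = "image"
        AUDIO = "audio"
        VIDEO = "video"
        CODE = "code"
        MULTIMODAL = "multimodal"

    class TrainingParadigm(Enum):
        SELF_SUPERVISED = "self-supervised pretraining"
        INSTRUCTION_TUNED = "instruction tuning"
        RLHF = "RL from human feedback"

    class Deployment(Enum):
        OPEN_WEIGHTS = "open weights"
        HOSTED_API = "hosted API"

    @dataclass
    class GenerativeModel:
        name: str
        architecture: Architecture
        modality: Modality
        training: TrainingParadigm
        deployment: Deployment
        parameters_billions: float  # the scale axis

    # Two hypothetical entries classified along all five axes.
    catalog = [
        GenerativeModel("latent-diffusion-image", Architecture.DIFFUSION,
                        Modality.IMAGE, TrainingParadigm.SELF_SUPERVISED,
                        Deployment.OPEN_WEIGHTS, 0.9),
        GenerativeModel("instruction-tuned-chat-llm", Architecture.TRANSFORMER,
                        Modality.TEXT, TrainingParadigm.INSTRUCTION_TUNED,
                        Deployment.HOSTED_API, 70.0),
    ]

    for m in catalog:
        print(f"{m.name}: {m.architecture.value}, {m.modality.value}, "
              f"{m.training.value}, {m.deployment.value}, "
              f"{m.parameters_billions}B params")

One practical benefit of such a structured catalog is that taxonomy queries reduce to simple filters, e.g. [m for m in catalog if m.architecture is Architecture.DIFFUSION] recovers all diffusion-based entries.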