Researchers from IIIT Allahabad propose T2CI GAN: a deep learning model that generates compressed images from text

In recent years, generating textual descriptions of visual data has become an important research problem. The reverse problem, producing visual data from written descriptions, is considerably harder because it requires combining natural language processing and computer vision techniques. Available techniques create uncompressed images from textual descriptions using generative adversarial networks (GANs). GANs are a type of machine learning framework that can produce text, photos, videos, and voice recordings. They have previously been used to produce image datasets for training other deep learning algorithms, to produce videos or animations for special purposes, and to produce appropriate captions for pictures.

In practice, most visual data is processed and transmitted in compressed form. To achieve storage and computational efficiency, the proposed work aims to output visual data directly as a compressed representation using deep convolutional GANs (DCGANs). Researchers from the Computer Vision and Biometrics Laboratory of IIIT Allahabad and Vignan University in India recently created a new GAN-based model, T2CI-GAN, which can produce compressed images from text descriptions. This approach could serve as a starting point for exploring new options for storing images and sharing content between smart devices.

In previous work, researchers have used GANs and other deep learning models to handle various tasks, such as feature extraction from data, text and image data segmentation, detecting words in long text snippets, and creating compressed JPEG images. The new model builds on these earlier efforts to tackle a computational problem that has so far received little attention in the literature. Only a few of the deep learning-based techniques that other research teams use to create images from textual descriptions produce compressed images. Additionally, most existing systems treat image generation and image compression as separate steps, which increases computational workload and processing time.

The proposed T2CI-GAN is a deep learning-based model that produces compressed images from input text descriptions. This is a significant departure from traditional approaches, which generate images from textual descriptions and then compress those images in a separate step. The model's main selling point is its ability to map text descriptions directly to compressed images.

The research team created two GAN-based models to produce compressed images from textual descriptions. The first model was trained on a dataset of JPEG-compressed DCT (discrete cosine transform) images. After training, this model could produce compressed images from textual descriptions. The second GAN-based model, by contrast, was trained on a set of RGB images. This model learned to produce the DCT representation of JPEG-compressed images; a DCT expresses a sequence of data points as a sum of cosine functions at different frequencies. The proposed models were evaluated using RGB and JPEG-compressed versions of the well-known open-source Oxford-102 Flower benchmark dataset. In the JPEG-compressed domain, the model achieved very encouraging performance.
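To make the DCT representation mentioned above more concrete, here is a minimal sketch of the 8x8 block transform that JPEG applies to image data. It is not the researchers' code: the function names `dct_2d`, `idct_2d`, and `quantize` are hypothetical, and the single uniform quantization step is a simplifying assumption (real JPEG encoders use per-frequency quantization tables, followed by entropy coding).

```python
import math

N = 8  # JPEG transforms images in 8x8 pixel blocks


def _alpha(k):
    # Normalization factor that makes the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis function of frequency k sampled at position i.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Return the orthonormal 2D DCT-II coefficients of an N x N block."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * _basis(x, u) * _basis(y, v)
            coeffs[u][v] = _alpha(u) * _alpha(v) * total
    return coeffs


def idct_2d(coeffs):
    """Invert dct_2d, recovering the N x N block of pixel values."""
    block = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (_alpha(u) * _alpha(v) * coeffs[u][v]
                              * _basis(x, u) * _basis(y, v))
            block[x][y] = total
    return block


def quantize(coeffs, step):
    """Assumption: one uniform step instead of a JPEG quantization table."""
    return [[round(c / step) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat = [[10.0] * N for _ in range(N)]  # a block with no detail
    q = quantize(dct_2d(flat), step=16)
    nonzero = sum(1 for row in q for c in row if c != 0)
    print("non-zero quantized coefficients:", nonzero)
```

For a smooth block, almost all of the energy ends up in a few low-frequency coefficients, and the rest quantize to zero. That sparsity is what makes the DCT domain compact, and it is this kind of coefficient representation, rather than raw RGB pixels, that a compressed-domain generator has to output.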

When the provided photos are intended to be easily shared with smartphones or other smart devices, the T2CI-GAN model can be used to enhance automated image retrieval systems. Additionally, it can be a valuable tool for media and communications experts, allowing them to find lighter versions of particular photographs to post online.

Due to recent advancements in technology, our world is moving towards machine-to-machine and human-to-machine communication. T2CI-GAN could be valuable in this setting because machines typically need data in compressed form to read or process it. The model currently creates images only in JPEG-compressed form. The researchers' long-term goal is therefore to extend it to produce images in any compressed form, without restriction on the compression algorithm. After the publication of the team's research paper, the source code of the model will also be made available to the general public.

This article was written as a research summary by Marktechpost staff, based on the research paper 'T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network'. All credit for this research goes to the researchers on this project.


Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by participating in challenges.
