Facial image noise classification and denoising using neural networks

Image denoising is an important aspect of image processing. Noisy images are produced as a result of technical and environmental flaws. It is therefore reasonable to consider image denoising an important research topic, especially since it also helps solve other image processing problems. The challenge, however, is that traditional techniques are time-consuming and inflexible. This article proposes a system for classifying and denoising noisy images. A CNN-based model and a UNET-based model are designed, implemented, and evaluated. A facial image dataset is preprocessed and then used to train, validate, and test the models. During preprocessing, the images are resized to 48×48, normalized, and corrupted with various types of noise; the preprocessing differs slightly for each model. The CNN model achieves training and validation accuracies of 99.87% and 99.92%, respectively. The UNET model also achieves high PSNR and SSIM values for the different noise types.


Introduction
Image denoising is a crucial topic in image processing, and a lot of work is currently being done on it. However, very little attention has been paid to automating the classification of noisy images. Few researchers have worked in this area, and many papers still focus only on the denoising step. In this work, we design a convolutional neural network (CNN) that classifies noisy images into classes by noise type, and a UNET-based model that denoises them. Manually sorting images by noise type takes a great deal of time, so automating both classification and denoising saves considerable time and effort.

Literature review
Image denoising is a crucial task in image processing and deep learning [11][12][13][14][15]. The work in [1] explains both classical techniques and modern developments. The classical techniques include spatial-domain filtering and transform-domain filtering, and the modern ones include CNN-based denoising methods.

Ali Awad [6] proposed a method to remove noise from images corrupted by impulse noise, Gaussian noise, or a mixture of both. The method is divided into two parts. The first removes the small noise component, and the subsequent steps are based on principal component analysis (PCA). The process is assumed to remove the majority of the noise in the first stage and the smaller remaining components later.

Olaf Ronneberger [2] proposed the UNET structure, originally for the segmentation of biomedical images. That paper introduced a U-shaped model with skip connections between the encoding and decoding layers, which earlier models lacked. These connections let information flow directly across the network and help produce better output images.

Irfan Ali [3] proposed an autoencoder model for denoising color images. The work performs denoising on an RGB dataset: Gaussian noise with a factor of 0.2 is added to all images, and an autoencoder is used to remove it. Latha H N [4] proposed a locally modified UNET architecture for image denoising. The work evaluates the UNET model [2] for noise removal and compares it with the locally modified UNET architecture. The models are trained on three types of noise: Gaussian, salt-and-pepper, and camera shake.

D. Sil [6] proposed convolutional neural networks for the classification and denoising of images. VGG-16 and Inception-v3 were used to classify the noisy images, while the CNN-based method FFDNet was used for denoising. J. Gurrola-Ramos [7] proposed a residual dense U-Net neural network for image denoising.
A key feature of that model is that it does not require prior knowledge of the noise, and it achieves high PSNR and SSIM values. S. Ghose [8] proposed a CNN model that removes noise from an image and restores it to a high-quality image. The analysis covers only Gaussian white noise at different percentages, and the results are compared with traditional methods. O. Sheremet [9] proposed a CNN-based model for denoising images in infocommunication systems.

Hyun Park [16] presented a PCA-reconstruction-based denoising approach for removing complex color noise components on human faces that are difficult to remove with vectorial color filters. The method consists of six steps:
(1) training a canonical eigenface space using PCA;
(2) automatically extracting facial features using an active appearance model and aligning the input face to the mean shape;
(3) reconstructing an initial noise-free face;
(4) relighting the reconstructed face using a bilateral filter;
(5) extracting noise regions using the color variances of the training data;
(6) reconstructing the face using partial information from the input image and blending the reconstructed image with the original image.
These papers discuss possible solutions for image denoising, but none of them provides a complete solution to the problem. This paper aims to bridge that gap with a complete automated system that first classifies images by noise type and then denoises them. The models are deployed in a web application to give users an interactive, easy-to-use tool for image denoising.

System overview
The aim of this work is to classify noisy images by noise type and then denoise them. A general overview of the system is presented in Figure.1. During denoising, the predicted noise type determines which UNET model is activated.
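The routing step, where the predicted noise type selects which UNET to run, can be sketched as a simple dispatch table. This is a minimal sketch, not the authors' implementation. The function name `classify_and_denoise` and the noise labels (`"gaussian"`, `"poisson"`, `"salt_and_pepper"`) are hypothetical. The trained CNN and UNET models are replaced by plain callables so the control flow can be shown on its own.

```python
from typing import Callable, Dict, Tuple

import numpy as np

# Hypothetical signatures: a classifier maps an image to a noise label,
# and each denoiser maps a noisy image to a cleaned image of the same shape.
Classifier = Callable[[np.ndarray], str]
Denoiser = Callable[[np.ndarray], np.ndarray]


def classify_and_denoise(
    image: np.ndarray,
    classifier: Classifier,
    denoisers: Dict[str, Denoiser],
) -> Tuple[str, np.ndarray]:
    """Predict the noise type, then route the image to the UNET trained for it."""
    noise_type = classifier(image)
    if noise_type not in denoisers:
        raise KeyError(f"no denoiser registered for noise type {noise_type!r}")
    return noise_type, denoisers[noise_type](image)


if __name__ == "__main__":
    # Stand-ins for the trained models; the labels are illustrative only.
    stub_classifier = lambda img: "gaussian"
    stub_denoisers = {
        "gaussian": lambda img: np.clip(img, 0.0, 1.0),
        "poisson": lambda img: img,
        "salt_and_pepper": lambda img: img,
    }
    label, clean = classify_and_denoise(np.random.rand(48, 48), stub_classifier, stub_denoisers)
    print(label, clean.shape)  # gaussian (48, 48)
```

Keeping one denoiser per noise type in a dictionary means a new noise class only needs a new entry, not a change to the routing logic.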

Deep learning model
Two models are used: a custom CNN model is designed and implemented to classify the images, and a UNET-based model is used to denoise them.

CNN
A custom CNN model is designed and implemented. To avoid overfitting, the model is built bottom-up: the architecture is grown step by step, and the configuration that gives the best results is kept. The final model is shown in Figure.2.

UNET
Autoencoders are commonly used for image manipulation tasks such as deblurring, denoising, and encoding. An autoencoder preserves the dimensionality of the image, but it compresses the input through a bottleneck, so not all of the features are passed on to the decoder. UNET overcomes this constraint with skip connections that let feature representations pass directly from the encoder to the decoder. UNET was developed for biomedical image segmentation [2], but it can also be used for image denoising and other image processing tasks. Figure.3 depicts the architecture of the UNET model used in this experiment. Certain changes were made to the original architecture [2] as required during experimentation.
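The role of the skip connection can be illustrated with a toy, weight-free UNET level in NumPy. This sketch makes several simplifications that differ from a real UNET:
- it uses 2×2 max pooling for the contracting path;
- it uses nearest-neighbour upsampling in place of learned transposed convolutions;
- it has no convolution layers at all;
- it skips the cropping step of the original design [2].

The only point is the data flow: encoder features are concatenated with the upsampled decoder features along the channel axis.

```python
import numpy as np


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Downsample an (H, W, C) feature map by taking the max of each 2x2 block."""
    h, w, c = x.shape  # assumes H and W are even
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample_2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling: repeat every pixel into a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def unet_level(x: np.ndarray) -> np.ndarray:
    """One encoder/decoder level with a skip connection (no learned weights)."""
    skip = x                            # encoder features kept for the skip path
    bottleneck = max_pool_2x2(x)        # contracting path halves H and W
    decoded = upsample_2x(bottleneck)   # expanding path restores H and W
    # Skip connection: concatenate encoder and decoder features on the channel
    # axis, so detail discarded by pooling is still available to later layers.
    return np.concatenate([skip, decoded], axis=-1)


if __name__ == "__main__":
    features = np.random.rand(48, 48, 4)
    print(unet_level(features).shape)  # (48, 48, 8)
```

Without the `skip` term, the decoder would only see the pooled `bottleneck`, which is the autoencoder limitation described above.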

Basic components of CNN and UNET
The basic components required to build the models are described below.

Convolution
A convolution is the integral of the product of two functions, one reversed and shifted, and it expresses how one function modifies the other. Equations (1) and (2) give the mathematical representation of the operation. The operation involves three main items: the input image, the feature detector (kernel), and the feature map. The feature detector slides over the matrix representation of the input image. At each position, the overlapping values are multiplied element-wise and summed to produce one entry of the feature map. The stride is the number of pixels by which the feature detector shifts over the input image at each step. Figure.4 shows an example of how convolution works.

Dropout is a regularization approach that reduces overfitting and improves generalization error in deep neural networks of all kinds. By randomly dropping out nodes during training, a single model can simulate a large number of different network architectures. It is computationally cheap and remarkably effective.
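Both operations can be sketched in a few lines of NumPy. This is not the paper's code.

`conv2d` slides a kernel over a single-channel image with a given stride and no padding. Like most deep-learning libraries, it does not flip the kernel, so strictly it computes a cross-correlation.

`dropout` uses the common "inverted" variant. Each unit is kept with probability `1 - rate`, and the survivors are scaled by `1 / (1 - rate)`, so no rescaling is needed at inference time.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Valid (no-padding) 2D convolution as used in CNN layers.

    At each position the overlapping patch is multiplied element-wise with the
    kernel and summed to give one feature-map entry.
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(image[r : r + kh, c : c + kw] * kernel)
    return out


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout for training: zero each unit with probability `rate`
    and scale the kept units so the expected activation is unchanged."""
    if rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


if __name__ == "__main__":
    img = np.arange(16, dtype=float).reshape(4, 4)
    print(conv2d(img, np.ones((2, 2)), stride=2))  # [[10. 18.] [42. 50.]]
```

With a 4×4 input and a 2×2 kernel, stride 1 produces a 3×3 feature map, while stride 2 produces a 2×2 map.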

Activation function
The activation function is a critical component of a neural network that introduces non-linearity. This enables the network to learn complex, non-linear mappings between inputs and outputs. There are many types of activation functions; the ones used in this network are sigmoid, ReLU, and softmax.

ReLU
ReLU is the abbreviation for Rectified Linear Unit. If x is positive, it outputs x; otherwise, it outputs zero. It can be summarized mathematically as in equation (3).

Softmax
It converts a vector of real-valued scores into a probability distribution over a set of possible outcomes. The mathematical representation of the softmax is given in equation (4).

Sigmoid
It compresses each element of a vector into the range (0, 1). The mathematical representation of the sigmoid is given in equation (5).
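The three activation functions can be written directly from their standard definitions:
- ReLU(x) = max(0, x);
- sigmoid(x) = 1 / (1 + e^(-x));
- softmax(z)_i = e^(z_i) / Σ_j e^(z_j).

This NumPy sketch assumes the paper's equations (3) to (5) follow these standard forms. Subtracting the maximum inside softmax is a common numerical-stability trick that does not change the result.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Sigmoid: 1 / (1 + e^-x), mapping each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over the last axis; shifting by the max avoids overflow in exp."""
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


if __name__ == "__main__":
    scores = np.array([1.0, 2.0, 3.0])
    print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
    print(softmax(scores).sum())             # 1.0 (up to rounding)
```

In a classifier like the one described here, ReLU would typically be used in hidden layers and softmax in the output layer, so each output is a class probability.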

Optimizer
Optimizers are algorithms that update a neural network's parameters, such as its weights, to minimize the loss. Adaptive optimizers also adjust the effective learning rate for each parameter. Common optimizers include gradient descent, momentum, Adagrad, and RMSProp. In this paper, the Adam optimizer is used.
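A single Adam update can be sketched in NumPy. The defaults below (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used values from the original Adam formulation. They are an assumption here, since the paper does not state its hyperparameters. The example minimizes a toy quadratic rather than a network loss.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: large, consistent gradients are normalized by sqrt(v_hat).
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Toy problem: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, 501):
        x, m, v = adam_step(x, 2.0 * (x - 3.0), m, v, t, lr=0.1)
    print(round(x, 2))
```

The division by `sqrt(v_hat)` is what makes Adam adaptive: each parameter effectively gets its own step size.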

Results
The two models require different data processing and training procedures, so the explanation is divided into two parts.

Generation of noisy images for the CNN model
Our dataset consists of about 34,034 images collected from a website [10]. The images are facial images showing different facial expressions. To perform the intended operations, these images must first be preprocessed. Figure.6 shows a sample of the dataset.
The preprocessing steps are shown in Figure.7, and Figure.8 visualizes the dataset after preprocessing. Noise with a noise factor of 0.1 is added to each image, which makes the images quite unclear. This is useful for training, because a model trained on strongly corrupted images can then recognize images with lower noise levels efficiently.

Figure.9 and Figure.10 show the model's training and validation accuracy and loss, respectively. The testing accuracy of the model is shown as a confusion matrix in Figure.11, since a confusion matrix conveys more detailed information. The model achieves training and validation accuracies of 99.87% and 99.92%, respectively.

Figure 9. Training accuracy vs. validation accuracy
Figure 10. Training loss vs. validation loss
Figure 11. Confusion matrix for test data

Visualizing output
Finally, the actual and predicted classes of a test image are shown in Figure.12.

Figure 12. Test image actual and predicted class
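The preprocessing described for the CNN data can be sketched in NumPy. It has three steps: resizing to 48×48, normalization to [0, 1], and adding noise with a factor of 0.1. Several details are assumptions, since the paper does not specify them:
- nearest-neighbour resizing;
- dividing by 255 for normalization;
- the salt-and-pepper `amount` of 0.05;
- the Poisson `peak` of 255.

Only the Gaussian noise factor of 0.1 comes from the text.

```python
import numpy as np


def resize_nearest(img: np.ndarray, size: int = 48) -> np.ndarray:
    """Nearest-neighbour resize of a 2D grayscale image to size x size."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols]


def normalize(img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values into [0, 1]."""
    return img.astype(np.float32) / 255.0


def add_gaussian(img, rng, noise_factor=0.1):
    """Additive Gaussian noise scaled by noise_factor, clipped to [0, 1]."""
    return np.clip(img + noise_factor * rng.standard_normal(img.shape), 0.0, 1.0)


def add_salt_and_pepper(img, rng, amount=0.05):
    """Set a fraction `amount` of pixels to 0 (pepper) or 1 (salt)."""
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0.0        # pepper
    noisy[u > 1 - amount / 2] = 1.0    # salt
    return noisy


def add_poisson(img, rng, peak=255.0):
    """Signal-dependent Poisson (shot) noise, clipped to [0, 1]."""
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.integers(0, 256, size=(96, 96), dtype=np.uint8)  # stand-in image
    x = normalize(resize_nearest(raw))
    noisy = {"gaussian": add_gaussian(x, rng),
             "salt_and_pepper": add_salt_and_pepper(x, rng),
             "poisson": add_poisson(x, rng)}
    print({k: v.shape for k, v in noisy.items()})
```

Each noisy image would then be labelled with its noise type to form the CNN training set.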

Generation of noisy images for the UNET model
Initially, the images are obtained as pixel values, since that is how they are represented. To perform the intended operations, these images must be preprocessed. Figure.6 shows a sample of the dataset, and the preprocessing steps are shown in Figure.13.
Table 2.
MSE stands for mean squared error; its mathematical representation is shown in (7). Here, I is a noise-free m×n monochrome image, K is its noisy approximation, and MAX_I is the maximum possible pixel value of the image. SSIM stands for structural similarity. PSNR is not highly indicative of the perceived similarity of images, so SSIM is used to address this shortcoming by taking luminance, contrast, and structure into account. Equation (8) is the mathematical representation of SSIM.
SSIM can be decomposed into three parts, represented in (9). The first part measures the loss of correlation, the second the luminance distortion, and the third the contrast distortion.
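For reference, the standard forms of these metrics are given below, assuming the paper's (7) to (9) follow them. The symbols are:
- I and K: the clean and noisy m×n images;
- MAX_I: the maximum pixel value;
- μ_x and μ_y: the means of images x and y;
- σ_x² and σ_y²: their variances;
- σ_xy: their covariance;
- c_1 and c_2: small stabilizing constants.

The three-factor product is the special case c_1 = c_2 = 0. Its factors appear in the order described above: correlation, luminance, contrast.

```latex
\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl[I(i,j) - K(i,j)\bigr]^2,
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right)

\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}
                          {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

\mathrm{SSIM}\big|_{c_1=c_2=0}
  = \underbrace{\frac{\sigma_{xy}}{\sigma_x\sigma_y}}_{\text{correlation}}
    \cdot \underbrace{\frac{2\mu_x\mu_y}{\mu_x^2 + \mu_y^2}}_{\text{luminance}}
    \cdot \underbrace{\frac{2\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}}_{\text{contrast}}
```

Multiplying the three factors gives (2μ_xμ_y)(2σ_xy) / ((μ_x² + μ_y²)(σ_x² + σ_y²)), which is exactly the SSIM expression with c_1 = c_2 = 0.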
Evaluation of the trained model
Next, the trained model is used to generate a clean image from each noisy test image, and the PSNR and SSIM between the original image and the generated image are calculated. The PSNR and SSIM values of the UNET model are shown in Table 3, and they are high for all noise types. The model generates noise-free images most effectively for Poisson and salt-and-pepper noise. For Gaussian noise, it also achieves good values, though lower than for the other noise types.

Finally, the models are deployed in a web application built with HTML, CSS, and Flask. Flask, a Python web framework, serves the backend, while HTML and CSS are used for the frontend. The output obtained after passing an image through the web application is shown in Figure. 13.

Figure 13. Web application
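The evaluation step can be sketched in NumPy: compute PSNR and SSIM between an original image and the model's output.

`psnr` follows the standard MSE-based definition. `global_ssim` is a simplification that computes SSIM once over the whole image. The usual SSIM instead averages over local (typically Gaussian-weighted 11×11) windows, so scores from this sketch will differ from library implementations such as scikit-image's `structural_similarity`. The constants `c1 = (0.01·MAX)²` and `c2 = (0.03·MAX)²` are the conventional choices, assumed here. Images are assumed to be normalized to [0, 1].

```python
import numpy as np


def psnr(original: np.ndarray, generated: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    diff = original.astype(np.float64) - generated.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def global_ssim(x: np.ndarray, y: np.ndarray, max_value: float = 1.0) -> float:
    """Single-window SSIM over the whole image (simplified, no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((48, 48))
    noisy = np.clip(clean + 0.1 * rng.standard_normal(clean.shape), 0.0, 1.0)
    print(psnr(clean, noisy), global_ssim(clean, noisy))
```

A higher PSNR and an SSIM closer to 1 both indicate that the generated image is closer to the original.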

Conclusion
The experiment shows that the proposed CNN model can classify images by the type of noise they contain, with high training and validation accuracy, and it also performs well on the test data. The proposed UNET model can denoise images with high PSNR and SSIM values. Thus, the proposed system provides a complete solution for denoising images and can also be adapted to other image processing tasks.