[LapSRN Review] Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution (CVPR 17)


yun_s 2021. 7. 9. 10:37


My Summary & Opinion

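Before deep learning was applied to computer vision, the field relied on a variety of hand-designed algorithmic methods.

The Laplacian pyramid is one of them.

Whereas earlier CNN models sought performance gains at the network level, for example by simply stacking more layers, this paper designs its network architecture after the structure of the Laplacian pyramid.

Having majored in image processing, I was glad to see a familiar structure worked into a network, and I expect more papers to adopt other classical vision algorithms in this way.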
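For readers who have not met the classical Laplacian pyramid, here is a minimal 1D sketch of the idea: keep a coarse version of the signal plus the residual lost at each downsampling step, then rebuild the signal by upsampling and adding the residuals back. This is a simplification I wrote for illustration, not code from the paper: it uses a box filter (pair averaging) where the real pyramid uses a Gaussian filter, and the function names are my own.

```python
def downsample(signal):
    # Halve the length by averaging neighbouring pairs (box filter, not Gaussian).
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    # Double the length by repeating every sample.
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    # Returns the coarsest signal and the residuals ordered from coarse to fine.
    residuals = []
    current = signal
    for _ in range(levels):
        coarse = downsample(current)
        residuals.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return current, residuals[::-1]


def reconstruct(coarse, residuals):
    # Upsample step by step, adding back the residual stored for each level.
    current = coarse
    for residual in residuals:
        current = [a + b for a, b in zip(upsample(current), residual)]
    return current


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
coarse, residuals = build_pyramid(signal, levels=2)
restored = reconstruct(coarse, residuals)
```

LapSRN follows the `reconstruct` half of this picture: the residuals are not stored in advance but predicted by the network at each level.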

Introduction

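SRCNN introduced deep learning to super-resolution (SR), and later work improved on it mainly by stacking deeper networks.

However, these methods have three main problems.

First, existing methods upsample the input in advance, using bicubic interpolation or similar.

This step adds unnecessary computation and produces reconstruction artifacts.

Second, existing methods use the $l_2$ loss.

With the $l_2$ loss, the reconstructed HR images come out overly smooth, and the results are far from human visual perception.

Finally, most methods upsample only once, which makes training difficult for large scaling factors such as $8 \times$.

To address these three drawbacks, the paper proposes the Laplacian Pyramid Super-Resolution Network (LapSRN).

The proposed method outperforms previous methods in three respects: accuracy, speed, and progressive reconstruction.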

Deep Laplacian Pyramid Network for SR

1. Network Architecture

The proposed model takes the LR image itself as input, not an upscaled version of it, and progressively predicts residuals over $\log_2 S$ levels, where $S$ is the scale factor.
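To make the $\log_2 S$ levels concrete, the sketch below lists the output resolution at each level for a given scale factor. The helper names and the 32×32 input size are illustrative assumptions of mine, not values from the paper.

```python
import math


def pyramid_levels(scale):
    # Number of 2x steps needed to reach the target scale factor.
    levels = int(math.log2(scale))
    if 2 ** levels != scale:
        raise ValueError("scale must be a power of two")
    return levels


def level_sizes(lr_height, lr_width, scale):
    # Output resolution after each level; every level doubles height and width.
    return [
        (lr_height * 2 ** s, lr_width * 2 ** s)
        for s in range(1, pyramid_levels(scale) + 1)
    ]


sizes = level_sizes(32, 32, scale=8)
```

For $8 \times$ SR this gives three levels, and the intermediate $2 \times$ and $4 \times$ outputs come out along the way.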

Feature Extraction

At level $s$, feature extraction consists of $d$ conv layers and one transposed conv layer for upscaling.

The output of each transposed conv layer feeds two layers: 1) a conv layer that reconstructs the residual image at level $s$, and 2) a conv layer that extracts features at the finer level $s+1$.

The key point is that feature extraction is performed at the coarse resolution, and the feature maps at the fine resolution are generated from it by the transposed conv layer.

This contrasts with existing networks, which perform feature extraction and generate feature maps at the fine resolution.

As a result, the computational cost is reduced.
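A rough back-of-the-envelope count shows where the saving comes from. The sketch below counts multiply-adds for a stack of conv layers run at the LR resolution versus the same stack run on a pre-upsampled image. The depth of 10 and the 32×32 input are hypothetical values chosen for illustration, and the count ignores the transposed conv layer and biases.

```python
def conv_madds(height, width, kernel=3, c_in=64, c_out=64):
    # Multiply-adds of one conv layer whose output has the given spatial size.
    return height * width * kernel * kernel * c_in * c_out


def stack_madds(height, width, depth):
    # Cost of `depth` identical conv layers at a fixed resolution.
    return depth * conv_madds(height, width)


lr_h, lr_w, scale, depth = 32, 32, 2, 10  # hypothetical sizes, not from the paper

coarse = stack_madds(lr_h, lr_w, depth)                # features extracted at LR resolution
fine = stack_madds(lr_h * scale, lr_w * scale, depth)  # same stack on a pre-upsampled input
ratio = fine / coarse
```

Under these assumptions the pre-upsampled variant costs `scale ** 2` times as much, so 4 times at $2 \times$.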

Image reconstruction

At level $s$, the input image is upsampled to twice its size by a transposed conv layer.

The upscaled image is then summed with the residual image predicted by the feature extraction branch.
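The reconstruction step can be sketched in plain Python as a stride-2 transposed convolution followed by an element-wise sum. This is only an illustration for a single-channel image: the fixed bilinear weights `[0.25, 0.75, 0.75, 0.25]` are an assumption standing in for the learned $4 \times 4$ filter, and the function names are mine.

```python
BILINEAR_1D = [0.25, 0.75, 0.75, 0.25]  # assumed weights; the real filter is learned


def transposed_conv2x(image, kernel_1d=BILINEAR_1D):
    # Stride-2 transposed convolution with a separable 4x4 kernel and padding 1:
    # an H x W input becomes a 2H x 2W output.
    h, w = len(image), len(image[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for a, ka in enumerate(kernel_1d):
                for b, kb in enumerate(kernel_1d):
                    y, x = 2 * i + a - 1, 2 * j + b - 1
                    if 0 <= y < 2 * h and 0 <= x < 2 * w:
                        out[y][x] += image[i][j] * ka * kb
    return out


def reconstruct_level(image, residual):
    # Upsample the input image by 2 and add the predicted residual element-wise.
    up = transposed_conv2x(image)
    return [[u + r for u, r in zip(up_row, res_row)]
            for up_row, res_row in zip(up, residual)]


lr = [[1.0] * 4 for _ in range(4)]
residual = [[0.0] * 8 for _ in range(8)]
residual[3][3] = 0.5  # pretend the network predicted one high-frequency detail
hr = reconstruct_level(lr, residual)
```

Away from the border, a constant image stays constant after upsampling, so `hr[3][3]` is the upsampled value plus the residual. Border pixels are attenuated because part of the kernel falls outside the image.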

2. Loss function

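The proposed network has a separate loss at every level $s$, and the overall loss sums them.

$L(\hat{y}, y; \theta) = {1 \over N} \displaystyle \sum ^N_{i=1} \sum ^L_{s=1} \rho (\hat{y}_s^{(i)}-y_s^{(i)}) = {1 \over N} \displaystyle \sum ^N_{i=1} \sum ^L_{s=1} \rho ((\hat{y}_s^{(i)} - x_s^{(i)}) - r_s^{(i)})$

$x_s$: upscaled LR image at level $s$

$r_s$: residual image at level $s$

$y_s = x_s + r_s$: network output at level $s$

$\hat{y}_s$: ground-truth HR image resized to level $s$

$N$: number of training samples in a batch, $L$: number of pyramid levels

$\rho(x) = \sqrt{x^2 + \varepsilon^2}$: Charbonnier penalty, a robust alternative to the $l_2$ loss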
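The loss above can be sketched as follows, assuming $\rho$ is the Charbonnier penalty $\rho(x) = \sqrt{x^2 + \varepsilon^2}$ and that it is summed over the pixels of each level. The default `eps=1e-3` and the flat-list data layout are my assumptions for the example.

```python
import math


def charbonnier(diff, eps=1e-3):
    # Smooth, robust penalty: close to |diff| for large errors.
    return math.sqrt(diff * diff + eps * eps)


def pyramid_loss(predictions, targets, eps=1e-3):
    # predictions[i][s] and targets[i][s] are flat lists of pixel values
    # for sample i at pyramid level s (levels have different sizes).
    total = 0.0
    for pred_levels, target_levels in zip(predictions, targets):
        for pred, target in zip(pred_levels, target_levels):
            total += sum(charbonnier(p - t, eps) for p, t in zip(pred, target))
    return total / len(predictions)


# One sample, two levels (1 pixel and 2 pixels), perfect prediction.
targets = [[[0.0], [1.0, 2.0]]]
perfect = pyramid_loss(targets, targets)
```

Because every level has its own term, the intermediate outputs are supervised as well as the final one.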

3. Implementation and training details

Every conv layer uses 64 filters of size $3 \times 3$.

The transposed conv filters are $4 \times 4$.

Leaky ReLU (LReLU) is used as the activation.
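As a quick sanity check on these settings, the sketch below counts the parameters of one 64-channel layer of each kind and defines a leaky ReLU. It assumes 64 input channels, one bias per filter, and a negative slope of 0.2. None of these is stated in this review, so treat them as assumptions.

```python
def conv_params(kernel, c_in=64, c_out=64, bias=True):
    # Weights plus optional biases of a single (transposed) conv layer.
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def leaky_relu(x, slope=0.2):
    # Passes positive values through; scales negative values by `slope` (assumed 0.2).
    return x if x >= 0 else slope * x


conv3x3 = conv_params(3)    # one 3x3 conv layer, 64 -> 64 channels
deconv4x4 = conv_params(4)  # one 4x4 transposed conv layer, 64 -> 64 channels
```

Under these assumptions a $3 \times 3$ layer has 36,928 parameters and a $4 \times 4$ transposed layer has 65,600.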

Experiment Results

LapSRN is both faster and more accurate than the existing methods it is compared against.

Conclusions

The paper proposes a CNN built on the Laplacian pyramid framework.

The model is both fast and accurate.

It progressively predicts high-frequency residuals.

Reference

Lai, Wei-Sheng, et al. "Deep laplacian pyramid networks for fast and accurate super-resolution." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
