Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
445 views
in Technique[技术] by (71.8m points)

python - Convolving image with kernel in Fourier domain

I'm using zero padding around my image and convolution kernel, converting them to the Fourier domain, and inverting them back to get the convolved image, see code below. The result, however, is wrong. I was expecting a blurred image, but the output is four shifted quarters. Why is the output wrong, and how can I fix the code?

Input image:

input image

Result of convolution:

output image

from PIL import Image,ImageDraw,ImageOps,ImageFilter
import numpy as np 
from scipy import fftpack
from copy import deepcopy
import imageio
## STEP 1 ##
im1=Image.open("pika.jpeg")
im1=ImageOps.grayscale(im1)
im1.show()
print("s",im1.size)
## working on this image array
im_W=np.array(im1).T
print("before",im_W.shape)
if(im_W.shape[0]%2==0):
im_W=np.pad(im_W, ((1,0),(0,0)), 'constant')
if(im_W.shape[1]%2==0):
im_W=np.pad(im_W, ((0,0),(1,0)), 'constant')
print("after",im_W.shape)
Boxblur=np.array([[1/9,1/9,1/9],[1/9,1/9,1/9],[1/9,1/9,1/9]])
dim=Boxblur.shape[0]

##padding before frequency domain multipication
pad_size=(Boxblur.shape[0]-1)/2
pad_size=int(pad_size)
##padded the image(starts here)

p_im=np.pad(im_W, ((pad_size,pad_size),(pad_size,pad_size)), 'constant')
t_b=(p_im.shape[0]-dim)/2
l_r=(p_im.shape[1]-dim)/2
t_b=int(t_b)
l_r=int(l_r)

##padded the image(ends here)

## padded the kernel(starts here)
k_im=np.pad(Boxblur, ((t_b,t_b),(l_r,l_r)), 'constant')
print("hjhj",k_im)
print("kernel",k_im.shape)

##fourier transforms image and kernel
fft_im = fftpack.fftshift(fftpack.fft2(p_im))
fft_k  = fftpack.fftshift(fftpack.fft2(k_im))
con_in_f=fft_im*fft_k
ifft2 = abs(fftpack.ifft2(fftpack.ifftshift(con_in_f)))
convolved=(np.log(abs(ifft2))* 255 / np.amax(np.log(abs(ifft2)))).astype(np.uint8)
final=Image.fromarray(convolved.T)
final.show()
u=im1.filter(ImageFilter.Kernel((3,3), [1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9,1/9], scale=None, offset=0))
u.show()
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The Discrete Fourier transform (DFT) and, by extension, the FFT (which computes the DFT) have the origin in the first element (for an image, the top-left pixel) for both the input and the output. This is the reason we often use the fftshift function on the output, so as to shift the origin to a location more familiar to us (the middle of the image).

This means that we need to transform a 3x3 uniform weighted blurring kernel to look like this before passing it to the FFT function:

1/9  1/9  0  0  ... 0  1/9
1/9  1/9  0  0  ... 0  1/9
  0    0  0  0  ... 0    0
...  ...               ...
  0    0  0  0  ... 0    0
1/9  1/9  0  0  ... 0  1/9

That is, the middle of the kernel is at the top-left corner of the image, with the pixels above and to the left of the middle wrapping around and appearing at the right and bottom ends of the image.

We can do this using the ifftshift function, applied to the kernel after padding. When padding the kernel, we need to take care that the origin (middle of the kernel) is at location k_im.shape // 2 (integer division), within the kernel image k_im. Initially the origin is at [3,3]//2 == [1,1]. Usually, the image whose size we're matching is even in size, for example [256,256]. The origin there will be at [256,256]//2 == [128,128]. This means that we need to pad a different amount to the left and to the right (and bottom and top). We need to be careful computing this padding:

sz = img.shape  # the sizes we're matching
kernel = np.ones((3,3)) / 9
sz = (sz[0] - kernel.shape[0], sz[1] - kernel.shape[1])  # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)

Note that the input image, img, does not need to be padded (though you can do this if you want to enforce a size for which the FFT is cheaper). There is also no need to apply fftshift to the result of the FFT before multiplication, and then reverse this shift right after, these shifts are redundant. You should use fftshift only if you want to display the Fourier domain image. Finally, applying logarithmic scaling to the filtered image is wrong.

The resulting code is (I'm using pyplot for display, not using PIL at all):

import numpy as np
from scipy import misc
from scipy import fftpack
import matplotlib.pyplot as plt

img = misc.face()[:,:,0]

kernel = np.ones((3,3)) / 9
sz = (img.shape[0] - kernel.shape[0], img.shape[1] - kernel.shape[1])  # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)

filtered = np.real(fftpack.ifft2(fftpack.fft2(img) * fftpack.fft2(kernel)))
plt.imshow(filtered, vmin=0, vmax=255)
plt.show()

Note that I am taking the real part of the inverse FFT. The imaginary part should contain only values very close to zero, which are the result of rounding errors in the computations. Taking the absolute value, though common, is incorrect. For example, you might want to apply a filter to an image that contains negative values, or apply a filter that produces negative values. Taking the absolute value here would create artefacts. If the output of the inverse FFT contains imaginary values significantly different from zero, then there is an error in the way that the filtering kernel was padded.

Also note that the kernel here is tiny, and consequently the blurring effect is tiny too. To better see the effect of the blurring, make a larger kernel, for example np.ones((7,7)) / 49.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...