python - resize a 2D numpy array excluding NaN - (2024)


I'm trying to resize a 2D numpy array by a given factor, obtaining a smaller array in output.

The array is read from an image file, and some of the values should be NaN (Not a Number, np.nan in numpy): it is the result of remote sensing measurements from a satellite, and some pixels simply weren't measured.

The most suitable package I found is scipy.misc.imresize, but each pixel in the output array that covers a NaN is set to NaN, even if there is some valid data in the original pixels interpolated together.

My solution is appended here. What I've done is:

  • create a new array based on the original array shape and the desired reduction factor
  • create an index array to address all the pixels of the original array to be averaged for each pixel in the new one
  • cycle through all the new array pixels and average all the not-NaN pixels to obtain the new array pixel value; if there are only NaN, the output will be NaN.

I'm planning to add a keyword to choose between different outputs (average, median, standard deviation of the input pixels, and so on).
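A minimal sketch of how such a keyword might look, assuming NumPy's NaN-aware reducers (np.nanmean, np.nanmedian, np.nanstd) are acceptable. The names REDUCERS, reduce_window and stat are hypothetical and are not part of the code posted below.

```python
import numpy as np

# Hypothetical lookup table: keyword -> NaN-aware NumPy reducer.
REDUCERS = {
    'mean': np.nanmean,
    'median': np.nanmedian,
    'std': np.nanstd,
}

def reduce_window(window, stat='mean'):
    """Reduce one window of input pixels to a single value, skipping NaN."""
    if stat not in REDUCERS:
        raise ValueError('stat must be one of %s' % sorted(REDUCERS))
    window = np.asarray(window, dtype=float)
    if np.isnan(window).all():
        # Nothing valid to reduce: the output pixel stays NaN.
        return np.nan
    return float(REDUCERS[stat](window))

window = [[1.0, np.nan], [3.0, 5.0]]
print(reduce_window(window))                  # mean of 1, 3, 5 -> 3.0
print(reduce_window(window, stat='median'))   # median of 1, 3, 5 -> 3.0
```

Checking for an all-NaN window first avoids the RuntimeWarning that the NaN-aware reducers emit on empty input.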

It is working as expected, but on a ~1 Mpx image it takes around 3 seconds. Due to my lack of experience in Python, I'm searching for improvements.

Do you have any suggestion on how to do it better and more efficiently?

Do you know of any library that already implements this?

Thanks.

Here is an example output for some random pixel input, generated with the code below:

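[Figure: three panels titled 'original', 'scipy.misc' and 'my.func', showing the input array, the scipy.misc.imresize result and the resize_2d_nonan result]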

    import numpy as np
    import pylab as plt
    from scipy import misc  # note: scipy.misc.imresize was removed in SciPy 1.3

    def resize_2d_nonan(array, factor):
        """
        Resize a 2D array by different factors on the two axes, skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN.

        Parameters
        ----------
        array : 2D np array
        factor : int or tuple. If int, the x and y factors will be the same

        Returns
        -------
        array : 2D np array scaled by factor

        Created on Mon Jan 27 15:21:25 2014

        @author: damo_ma
        """
        xsize, ysize = array.shape

        if isinstance(factor, int):
            factor_x = factor
            factor_y = factor
        elif isinstance(factor, tuple):
            factor_x, factor_y = factor[0], factor[1]
        else:
            raise NameError('Factor must be a tuple (x,y) or an integer')

        # both axes must be exact multiples of their factor
        if xsize % factor_x or ysize % factor_y:
            raise NameError('Factors must be integer multiples of the array shape')

        new_xsize, new_ysize = xsize // factor_x, ysize // factor_y

        new_array = np.empty([new_xsize, new_ysize])
        new_array[:] = np.nan  # this saves an assignment in the loop below

        # submatrix indexes: the averaging box on the original matrix
        subrow, subcol = np.indices((factor_x, factor_y))

        # new matrix indexes
        row, col = np.indices((new_xsize, new_ysize))

        # some output for testing
        #for i, j, ind in zip(row.reshape(-1), col.reshape(-1), range(row.size)):
        #    print('----------------------------------------------')
        #    print('i: %i, j: %i, ind: %i ' % (i, j, ind))
        #    print('subrow+i*new_ysize, subcol+j*new_xsize :')
        #    print(i, '*', new_xsize, '=', i*factor_x)
        #    print(j, '*', new_ysize, '=', j*factor_y)
        #    print(subrow+i*factor_x, subcol+j*factor_y)
        #    print('---')
        #    print('array[subrow+i*factor_x,subcol+j*factor_y] : ')
        #    print(array[subrow+i*factor_x, subcol+j*factor_y])

        for i, j, ind in zip(row.reshape(-1), col.reshape(-1), range(row.size)):
            # define the small sub_matrix as a subset of the input matrix
            sub_matrix = array[subrow+i*factor_x, subcol+j*factor_y]
            # modified from any(a) and all(a) to a.any() and a.all()
            # see https://stackoverflow.com/a/10063039/1435167
            if not (np.isnan(sub_matrix)).all():  # if we don't have only NaN
                if (np.isnan(sub_matrix)).any():  # if we have some NaN
                    msub_matrix = np.ma.masked_array(sub_matrix, np.isnan(sub_matrix))
                    (new_array.reshape(-1))[ind] = np.mean(msub_matrix)
                else:  # if we have no NaN at all
                    (new_array.reshape(-1))[ind] = np.mean(sub_matrix)
            # the all-NaN case needs no assignment because
            # new_array is already initialised to NaN

        return new_array

    row, cols = 6, 4

    a = 10*np.random.random_sample((row, cols))
    a[0:3, 0:2] = np.nan
    a[0, 2] = np.nan

    factor_x = 2
    factor_y = 2
    a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
    a_2d_nonan = resize_2d_nonan(a, (factor_x, factor_y))

    print(a)
    print(a_misc)
    print(a_2d_nonan)

    plt.subplot(131)
    plt.imshow(a, interpolation='nearest')
    plt.title('original')
    plt.xticks(np.arange(a.shape[1]))
    plt.yticks(np.arange(a.shape[0]))
    plt.subplot(132)
    plt.imshow(a_misc, interpolation='nearest')
    plt.title('scipy.misc')
    plt.xticks(np.arange(a_misc.shape[1]))
    plt.yticks(np.arange(a_misc.shape[0]))
    plt.subplot(133)
    plt.imshow(a_2d_nonan, interpolation='nearest')
    plt.title('my.func')
    plt.xticks(np.arange(a_2d_nonan.shape[1]))
    plt.yticks(np.arange(a_2d_nonan.shape[0]))

Edit

I add a modification to address ChrisProsser's comment.

If I substitute the NaN with some other value, let's say the average of the not-NaN pixels, it will affect all the subsequent calculations: the difference between the resampled original array and the resampled array with NaN substituted shows that 2 pixels changed their values.
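A tiny worked example of why substitution changes the result. This is only a sketch with made-up values (not the random array used above): the first 2x2 block holds one NaN, and filling it with the global mean pulls the block average away from the mean of its valid pixels.

```python
import numpy as np

# Made-up 2x4 array: one NaN in the left 2x2 block, large values on the right.
a = np.array([[np.nan, 2.0, 10.0, 10.0],
              [4.0,    6.0, 10.0, 10.0]])

# Skipping NaN: the left block averages only its valid pixels, (2 + 4 + 6) / 3.
skip = np.nanmean(a[:, :2])

# Substituting NaN with the global mean of the valid pixels (52 / 7) first.
filled = np.where(np.isnan(a), np.nanmean(a), a)
subst = filled[:, :2].mean()

print(skip)    # 4.0
print(subst)   # about 4.857, biased towards the global mean
```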

My goal is simply to skip all the NaN pixels.

    # substitute NaN with the average value
    ind_nonan, ind_nan = np.where(np.isnan(a) == False), np.where(np.isnan(a) == True)
    a_substitute = np.copy(a)
    a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan])  # substitute the NaN with the average of the not-NaN

    a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
    a_substitute_2d_nonan = resize_2d_nonan(a_substitute, (factor_x, factor_y))

    print(a_2d_nonan - a_substitute_2d_nonan)

    [[        nan -0.02296697]
     [ 0.23143208  0.        ]
     [ 0.          0.        ]]


2nd edit

To address Hooked's answer I put some additional code here. It is an interesting idea, but sadly it interpolates new values over pixels that should be "empty" (NaN), and for my small example it generates more NaN than good values.

    x, y = np.indices((row, cols))
    x_new, y_new = np.indices((row // factor_x, cols // factor_y))

    from scipy.interpolate import CloughTocher2DInterpolator as intp
    c = intp((x[ind_nonan], y[ind_nonan]), a[ind_nonan])

    a_interp = c(x_new, y_new)

    print(a)
    print(a_interp)

    [[ nan, nan],
     [ nan, nan],
     [ nan, 6.32826577]]


You are operating on small windows of the array. Instead of looping through the array to make the windows, the array can be efficiently restructured by manipulating its strides. The numpy library provides the as_strided() function to help with that. An example is provided in the SciPy CookBook: Stride tricks for the Game of Life.
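To make the stride idea concrete, here is a minimal sketch that builds non-overlapping 2x2 windows directly with as_strided(). The sample array and the names fx, fy, row_stride and col_stride are only illustrative assumptions. The result is a view on the original data, so it should be treated as read-only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(24, dtype=float).reshape(6, 4)
fx, fy = 2, 2  # window size along rows and columns

row_stride, col_stride = a.strides  # bytes to step one row / one column

# The first two axes pick the window, the last two index inside it.
windows = as_strided(
    a,
    shape=(a.shape[0] // fx, a.shape[1] // fy, fx, fy),
    strides=(row_stride * fx, col_stride * fy, row_stride, col_stride),
)

print(windows.shape)   # (3, 2, 2, 2)
print(windows[0, 0])   # rows 0-1, columns 0-1 of a
```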

The following will use a generalized sliding window function that I found at Efficient Overlapping Windows with Numpy - I will include it at the end.

Determine the shape of the new array:

    rows, cols = a.shape
    new_shape = rows // 2, cols // 2

Restructure the array into the windows you need, and create an indexing array identifying the NaNs:

    # 2x2 windows of the original array
    windows = sliding_window(a, (2,2))
    # make a windowed boolean array for indexing
    notnan = sliding_window(np.logical_not(np.isnan(a)), (2,2))

The new array can be made using a list comprehension or a generator expression.

    # using a list comprehension
    # make a list of the means of the windows, disregarding the NaN's
    means = [window[index].mean() for window, index in zip(windows, notnan)]
    new_array = np.array(means).reshape(new_shape)

    # using a generator expression
    # produces the means of the windows, disregarding the NaN's
    means = (window[index].mean() for window, index in zip(windows, notnan))
    new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)

The generator expression should conserve memory. Using itertools.izip() instead of zip should help if memory is a problem (this applies to Python 2; in Python 3 zip() is already lazy). I just used the list comprehension for your solution.
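As a small self-contained illustration of the generator variant, the sketch below feeds window means into np.fromiter(). It assumes a reshape-based stand-in for sliding_window() (the blocks array is hypothetical and only valid for non-overlapping windows) and uses float64 rather than float32 to avoid losing precision. Passing count lets NumPy allocate the output once.

```python
import numpy as np

a = np.array([[1.0, np.nan, 2.0, 4.0],
              [3.0, 5.0, np.nan, np.nan]])

# Stand-in for sliding_window(a, (2, 2)): a flat sequence of 2x2 blocks.
blocks = a.reshape(1, 2, 2, 2).swapaxes(1, 2).reshape(-1, 2, 2)

# Lazily compute the mean of the valid pixels of each block.
means = (blk[~np.isnan(blk)].mean() for blk in blocks)

new_array = np.fromiter(means, dtype=np.float64, count=len(blocks)).reshape(1, 2)
print(new_array)   # both blocks average to 3.0
```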

Your function:

    def resize_2d_nonan(array, factor):
        """
        Resize a 2D array by different factors on the two axes, skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN.

        Parameters
        ----------
        array : 2D np array
        factor : int or tuple. If int, the x and y factors will be the same

        Returns
        -------
        array : 2D np array scaled by factor

        Created on Mon Jan 27 15:21:25 2014

        @author: damo_ma
        """
        xsize, ysize = array.shape

        if isinstance(factor, int):
            factor_x = factor
            factor_y = factor
            window_size = factor, factor
        elif isinstance(factor, tuple):
            factor_x, factor_y = factor
            window_size = factor
        else:
            raise NameError('Factor must be a tuple (x,y) or an integer')

        if (xsize % factor_x or ysize % factor_y):
            raise NameError('Factors must be integer multiples of the array shape')

        new_shape = xsize // factor_x, ysize // factor_y

        # non-overlapping windows of the original array
        # (uses the array argument, not a global variable)
        windows = sliding_window(array, window_size)
        # windowed boolean array for indexing
        notnan = sliding_window(np.logical_not(np.isnan(array)), window_size)

        # list of the means of the windows, disregarding the NaN's
        means = [window[index].mean() for window, index in zip(windows, notnan)]
        # new array
        new_array = np.array(means).reshape(new_shape)

        return new_array

I haven't done any time comparisons with your original function, but it should be faster.

Many solutions I've seen here on SO vectorize the operations to increase speed/efficiency - I don't quite have a handle on that and don't know if it can be applied to your problem. Searching SO for window, array, moving average, vectorize, and numpy should produce similar questions and answers for reference.
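For reference, one possible vectorized formulation is sketched below. It is an assumption-laden alternative, not the method used above: it only handles non-overlapping windows whose sizes divide the array shape exactly, it relies on np.nanmean accepting a tuple of axes, and the name block_nanmean is hypothetical.

```python
import warnings
import numpy as np

def block_nanmean(array, factor):
    """Average non-overlapping blocks of a 2D array, ignoring NaN."""
    fx, fy = (factor, factor) if isinstance(factor, int) else factor
    nx, ny = array.shape
    if nx % fx or ny % fy:
        raise ValueError('factors must divide the array shape exactly')
    # Split each axis into (number of blocks, block size).
    blocks = array.reshape(nx // fx, fx, ny // fy, fy)
    with warnings.catch_warnings():
        # Blocks holding only NaN give NaN plus a RuntimeWarning; silence it.
        warnings.simplefilter('ignore', category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))

a = np.arange(24, dtype=float).reshape(6, 4)
a[0:3, 0:2] = np.nan
a[0, 2] = np.nan
print(block_nanmean(a, (2, 2)))
```

No Python-level loop runs over the output pixels, which is where most of the time goes in the loop-based versions.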

sliding_window() from Efficient Overlapping Windows with Numpy:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided as ast

    def norm_shape(shape):
        '''
        Normalize numpy array shapes so they're always expressed as a tuple,
        even for one-dimensional shapes.

        Parameters
            shape - an int, or a tuple of ints

        Returns
            a shape tuple
        '''
        try:
            i = int(shape)
            return (i,)
        except TypeError:
            # shape was not a number
            pass

        try:
            t = tuple(shape)
            return t
        except TypeError:
            # shape was not iterable
            pass

        raise TypeError('shape must be an int, or a tuple of ints')

    def sliding_window(a, ws, ss = None, flatten = True):
        '''
        Return a sliding window over a in any number of dimensions

        Parameters:
            a  - an n-dimensional numpy array
            ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
                 of each dimension of the window
            ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
                 amount to slide the window in each dimension. If not specified, it
                 defaults to ws.
            flatten - if True, all slices are flattened, otherwise, there is an
                      extra dimension for each dimension of the input.

        Returns
            an array containing each n-dimensional window from a
        '''
        if None is ss:
            # ss was not provided. the windows will not overlap in any direction.
            ss = ws
        ws = norm_shape(ws)
        ss = norm_shape(ss)

        # convert ws, ss, and a.shape to numpy arrays so that we can do math in every
        # dimension at once.
        ws = np.array(ws)
        ss = np.array(ss)
        shape = np.array(a.shape)

        # ensure that ws, ss, and a.shape all have the same number of dimensions
        ls = [len(shape), len(ws), len(ss)]
        if 1 != len(set(ls)):
            raise ValueError(
                'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

        # ensure that ws is smaller than a in every dimension
        if np.any(ws > shape):
            raise ValueError(
                'ws cannot be larger than a in any dimension. '
                'a.shape was %s and ws was %s' % (str(a.shape), str(ws)))

        # how many slices will there be in each dimension?
        newshape = norm_shape(((shape - ws) // ss) + 1)
        # the shape of the strided array will be the number of slices in each dimension
        # plus the shape of the window (tuple addition)
        newshape += norm_shape(ws)
        # the strides tuple will be the array's strides multiplied by step size, plus
        # the array's strides (tuple addition)
        newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
        strided = ast(a, shape = newshape, strides = newstrides)
        if not flatten:
            return strided

        # Collapse strided so that it has one more dimension than the window. I.e.,
        # the new array is a flat list of slices.
        meat = len(ws) if ws.shape else 0
        # np.prod replaces np.product, which was removed in NumPy 2.0
        firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
        dim = firstdim + (newshape[-meat:])
        # remove any dimensions with size 1
        # (a tuple is needed because filter() is lazy in Python 3)
        dim = tuple(i for i in dim if i != 1)
        return strided.reshape(dim)
