python - pandas iloc vs ix vs loc explanation?


Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses is clearer?

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead (ix was removed entirely in pandas 1.0). I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


First, a recap:

  • loc works on labels in the index.
  • iloc works on positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.

It's important to note some subtleties that can make ix tricky to use:

  • If the index is of integer type, ix will only use label-based indexing and will not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • If the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
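The three calls in the question can look interchangeable because a DataFrame's default index is the integers 0, 1, 2, ..., so labels and positions happen to coincide. A minimal sketch, assuming a hypothetical frame with that default index (the question does not show its data), shows that loc and iloc still differ at the endpoint:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9 (an assumption;
# the question does not show its data).
df = pd.DataFrame({"value": range(10)})

by_label = df.loc[:5]      # label slice: includes the row labelled 5
by_position = df.iloc[:5]  # positional slice: stops before position 5

print(len(by_label), len(by_position))  # prints: 6 5
```

Label slices with loc include the end label, whereas positional slices with iloc follow the usual Python convention and exclude the end position.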


To illustrate the differences between the three methods, consider the following Series:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice that s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than going by position (and the index is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, it doesn't fall back to behaving like iloc.
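The loc and iloc half of this behaviour can be checked on current pandas, where ix no longer exists. The sketch below reuses the Series s from above. The sorted copy at the end is an addition that is not part of the original example: as far as I know, the KeyError for a missing slice bound only occurs when the index is unsorted, as it is here.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Positional slicing never depends on the labels.
first_six = s.iloc[:6]

# Label slicing needs to locate the bound 6; the index is unsorted and
# does not contain 6, so pandas raises KeyError.
try:
    s.loc[:6]
    raised = False
except KeyError:
    raised = True

# Addition for illustration: on a sorted index the same slice is allowed
# and selects every label up to 6.
sorted_slice = s.sort_index().loc[:6]

print(len(first_six), raised, list(sorted_slice.index))
```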

If, however, our index was of mixed type, then given an integer ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is a mix of types
True
>>> s2.ix[:6] # behaves like iloc given an integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given a non-integer
a   NaN
b   NaN
c   NaN
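On pandas versions without ix, the two s2 results above can be reproduced by stating the intent explicitly. This is a sketch of the equivalent calls as I understand them, not output taken from the original answer:

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.nan, index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

# What s2.ix[:6] did on a mixed index: slice by position.
by_position = s2.iloc[:6]

# What s2.ix[:'c'] did: slice by label, end label included.
by_label = s2.loc[:'c']

print(list(by_position.index))
print(list(by_label.index))
```

Choosing iloc or loc up front removes the guessing that ix did based on the index type.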

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.


Combining position-based and label-based indexing

Sometimes, given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first 4 columns?

>>> df = pd.DataFrame(np.nan,
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since the label 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
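As a runnable sketch of the get_loc() approach, the snippet below rebuilds the same DataFrame and compares it with two alternatives that are not in the original answer and are offered only as options: translating the column positions into labels for loc, and chaining loc with iloc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan,
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Approach from the answer: turn the row label into a position, then use iloc.
stop = df.index.get_loc('c') + 1   # +1 because iloc excludes its endpoint
by_position = df.iloc[:stop, :4]

# Alternative (an assumption, not from the answer): turn the column
# positions into labels, then use loc.
by_label = df.loc[:'c', df.columns[:4]]

# Alternative (also an assumption): chain a label slice and a positional slice.
chained = df.loc[:'c'].iloc[:, :4]

print(by_position.shape, by_label.shape, chained.shape)
```

All three select rows 'a' to 'c' and columns 'x', 'y', 'z' and 8. The chained form is the shortest to read but builds an intermediate object, so it is better suited to reading data than to assigning into the frame.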

There are further examples in pandas' documentation here.

