Python - pandas iloc vs ix vs loc explanation?

- February 15, 2014

Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses is clearer?

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.

First, a recap:

loc works on labels in the index.
iloc works on the positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.
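
As a minimal sketch of the first two rules, the slices from the question already differ on a default integer index: loc[:5] treats 5 as a label and includes it, while iloc[:5] treats 5 as a position and stops before it. The frame and its column name "value" are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the default index 0..9 (an assumption for this sketch).
df = pd.DataFrame({"value": np.arange(10)})

by_label = df.loc[:5]      # label slice: the endpoint 5 is included
by_position = df.iloc[:5]  # position slice: the endpoint 5 is excluded

print(len(by_label))     # 6
print(len(by_position))  # 5
```

On a default index the labels and positions happen to coincide, which is why the two calls look interchangeable until you count the rows.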

It's important to note some subtleties that can make ix slightly tricky to use:

if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.

To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than going by position (and the index is of integer type).

What if we try with an integer label that isn't in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type it doesn't fall back to behaving like iloc.
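
The loc and iloc half of this can be reproduced with the following sketch, which rebuilds the same Series. The final step, sorting the index before slicing, is not from the original answer. It is added on the assumption that you may want a label slice that tolerates a missing endpoint, which loc allows once the index is sorted.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Position-based: 6 is just a count of rows, so this always works.
first_six = s.iloc[:6]

# Label-based: 6 is not a label and the index is unsorted, so loc raises.
try:
    s.loc[:6]
    raised = False
except KeyError:
    raised = True

# Assumption for illustration: sort the index first. With a sorted index,
# loc can slice up to a missing label and keeps every label <= 6.
up_to_six = s.sort_index().loc[:6]

print(len(first_six))         # 6
print(raised)                 # True
print(list(up_to_six.index))  # [1, 2, 3, 4, 5]
```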

If, however, our index was of mixed type, given an integer ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is a mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given an integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given a non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.
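
As a rough sketch of that advice on a mixed-type index, the two ix calls can be written explicitly with iloc and loc, so the code itself states whether a position or a label is meant. The Series below uses the same index as s2, but the values 0 to 9 are an assumption added here to make the rows distinguishable.

```python
import pandas as pd

# Same mixed index as s2; the values 0..9 are an assumption for readability.
s2 = pd.Series(range(10), index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

by_position = s2.iloc[:6]  # first six rows, regardless of their labels
by_label = s2.loc[:'c']    # every row up to and including label 'c'

print(list(by_position.index))  # ['a', 'b', 'c', 'd', 'e', 1]
print(list(by_label.index))     # ['a', 'b', 'c']
```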

Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

>>> df = pd.DataFrame(np.nan,
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since the label 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
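
The sketch below spells out the get_loc() step and compares it with one possible alternative that is not in the original answer: turning the column positions into labels with df.columns[:4] and using loc for both axes. The integer cell values are an assumption, used in place of NaN so the result is easier to inspect.

```python
import numpy as np
import pandas as pd

# Same labels as the DataFrame above; the values 0..24 are an assumption.
df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Option 1: convert the row label to a position, then use iloc on both axes.
stop = df.index.get_loc('c') + 1  # get_loc('c') is 2; add 1 to include 'c'
via_iloc = df.iloc[:stop, :4]

# Option 2 (alternative, not from the original answer): convert the column
# positions to labels, then use loc on both axes.
via_loc = df.loc[:'c', df.columns[:4]]

print(stop)                      # 3
print(via_iloc.equals(via_loc))  # True
```

Which form reads better probably depends on whether the rest of the code mostly works with positions or with labels.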

There are further examples in pandas' documentation.
