python - pandas iloc vs ix vs loc explanation?
Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.
For example, say we want the first five rows of a DataFrame. How is it that all three of these work?
df.loc[:5]
df.ix[:5]
df.iloc[:5]
Can someone present three cases where the distinction in uses is clearer?
Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.
First, a recap:
loc works on labels in the index.
iloc works on positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.
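As a quick check of the first two rules, here is a minimal sketch on a DataFrame with the default integer index (the frame and its column name v are made up for illustration). It suggests why slices like the ones in the question can look interchangeable without being so: loc[:5] slices by label and includes the endpoint, while iloc[:5] slices by position and excludes it.

```python
import pandas as pd

# Illustrative frame (not from the question): default index with labels 0..9.
df = pd.DataFrame({"v": range(10)})

by_label = df.loc[:5]      # labels 0 through 5, endpoint included
by_position = df.iloc[:5]  # positions 0 through 4, endpoint excluded

print(len(by_label), len(by_position))  # prints: 6 5
```

On a default index the two results differ only by that one row, which is easy to miss.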
It's important to note some subtleties that can make ix slightly tricky to use:
If the index is of integer type, ix will only use label-based indexing and will not fall back to position-based indexing. If the label is not in the index, an error is raised.
If the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
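The rules above can be condensed into a small decision function. The sketch below uses a hypothetical helper, ix_like, which is not part of pandas and only approximates the behaviour described in this answer (it is not how ix was actually implemented). It relies only on loc and iloc, so it runs on pandas versions where ix is no longer available.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype


def ix_like(obj, key):
    """Hypothetical helper: approximate the ix rules described above."""
    # The values that decide the lookup mode: slice bounds, or the key itself.
    bounds = [key.start, key.stop] if isinstance(key, slice) else [key]
    bounds = [b for b in bounds if b is not None]

    if is_integer_dtype(obj.index.dtype):
        # Integer index: labels only, no positional fallback.
        return obj.loc[key]
    if bounds and all(is_integer(b) for b in bounds):
        # Index is not purely integer and the key is an integer: positional.
        return obj.iloc[key]
    # Any other key type (e.g. a string): label-based.
    return obj.loc[key]


int_index = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
mixed_index = pd.Series(np.nan, index=["a", "b", "c", "d", "e", 1, 2, 3, 4, 5])

print(len(ix_like(int_index, slice(None, 3))))       # label-based: 8 rows
print(len(ix_like(mixed_index, slice(None, 6))))     # positional: 6 rows
print(list(ix_like(mixed_index, slice(None, "c")).index))  # labels a, b, c
```

Writing the branches out this way makes the source of confusion visible: the same integer key is treated as a label or as a position depending on the index's type.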
To illustrate the differences between the three methods, consider the following Series:
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):
>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN
>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
Notice that s.ix[:3] returns the same Series as s.loc[:3], since it looks for the label first rather than going by position (and the index is of integer type).
What if we try an integer label that isn't in the index (say 6)?
Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.
>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
>>> s.loc[:6]
KeyError: 6
>>> s.ix[:6]
KeyError: 6
As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, it doesn't fall back to behaving like iloc.
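The KeyError from loc can be reproduced on current pandas without ix. The sketch below also adds one step that is not part of the original example and is only an illustration: sorting the index first. On a sorted index, loc can place a slice bound that is not itself a label, so the same slice succeeds.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

try:
    s.loc[:6]
    raised = False
except KeyError:
    # 6 is not a label, and the unsorted index gives loc no way to place it.
    raised = True

# Assumption for illustration: sort the index before slicing by label.
sorted_part = s.sort_index().loc[:6]

print(raised, list(sorted_part.index))  # prints: True [1, 2, 3, 4, 5]
```

So the error depends on both the missing label and the ordering of the index, not on the missing label alone.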
If, however, our index was of mixed type, then given an integer ix would behave like iloc immediately instead of raising a KeyError:
>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is a mix of types
True
>>> s2.ix[:6] # behaves like iloc given an integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN
Keep in mind that ix can still accept non-integers and behave like loc:
>>> s2.ix[:'c'] # behaves like loc given a non-integer
a   NaN
b   NaN
c   NaN
As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.
Combining position-based and label-based indexing
Sometimes, given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.
For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?
>>> df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN
In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since the label 4 is not a column name):
>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
In later versions of pandas, we can achieve this result using iloc and the help of another method:
>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
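The get_loc() approach is not the only way to mix the two kinds of indexing without ix. As a sketch, assuming the same DataFrame as above, here are two other spellings that should give the same selection: converting the column positions to labels so that loc can be used throughout, and chaining a loc row slice with an iloc column slice. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list("abcde"), columns=["x", "y", "z", 8, 9])

# The approach shown above: translate the row label into a position.
via_iloc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Alternative 1: translate the column positions into labels and use loc.
via_loc = df.loc[:"c", df.columns[:4]]

# Alternative 2: chain a label-based row slice with a positional column slice.
via_chain = df.loc[:"c"].iloc[:, :4]

for result in (via_iloc, via_loc, via_chain):
    print(result.shape, list(result.index), list(result.columns))
```

The loc version avoids the "+ 1" adjustment because label slices already include their endpoint. The chained version is fine for reading data but is best avoided when assigning values.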
There are further examples in pandas' documentation.