python - pandas iloc vs ix vs loc explanation?
Can someone explain how these three methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.
For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?
    df.loc[:5]
    df.ix[:5]
    df.iloc[:5]
Can someone present three cases where the distinction in uses is clearer?
Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.
First, a recap:
loc works on labels in the index.
iloc works on positions in the index (so it only takes integers).
ix usually tries to behave like loc but falls back to behaving like iloc if the label is not in the index.
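To make the recap concrete for the "first five rows" case from the question, here is a minimal sketch. It assumes a hypothetical 10-row DataFrame with the default integer index (the `value` column name is made up for illustration). With that index, the label slice and the position slice do not return the same number of rows, because loc includes its endpoint and iloc does not.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 10 rows, default integer index 0..9.
df = pd.DataFrame({"value": np.arange(10)})

by_label = df.loc[:5]      # labels 0 through 5, endpoint included
by_position = df.iloc[:5]  # positions 0 through 4, endpoint excluded

print(len(by_label))     # 6
print(len(by_position))  # 5
```

So on a default index, df.iloc[:5] is the one that gives exactly the first five rows.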
It's important to note some subtleties that can make ix tricky to use:
If the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.
If the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If, however, ix is given another type (e.g. a string), it can use label-based indexing.
To illustrate the differences between the three methods, consider the following Series:
    >>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
    >>> s
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
    4    NaN
    5    NaN
Then s.iloc[:3] returns the first 3 rows (since it looks at the position) and s.loc[:3] returns the first 8 rows (since it looks at the labels):
    >>> s.iloc[:3] # slice the first 3 rows
    49   NaN
    48   NaN
    47   NaN
    >>> s.loc[:3] # slice up to and including label 3
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
    >>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    2    NaN
    3    NaN
Notice that s.ix[:3] returns the same Series as s.loc[:3], since it looks for the label first rather than going by position (and the index is of integer type).
What if we try with an integer label that isn't in the index (say 6)?
Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.
    >>> s.iloc[:6]
    49   NaN
    48   NaN
    47   NaN
    46   NaN
    45   NaN
    1    NaN
    >>> s.loc[:6]
    KeyError: 6
    >>> s.ix[:6]
    KeyError: 6
As per the subtleties noted above, s.ix[:6] raises a KeyError because it tries to work like loc but can't find a 6 in the index. Because our index is of integer type, it doesn't fall back to behaving like iloc.
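The loc half of this can be checked without ix. The sketch below rebuilds the same Series and records whether the KeyError occurs. As an addition not covered in the answer, it also shows that the error depends on the index being unsorted: after sort_index() (used here only for contrast), a slice bound that is missing from the index is accepted and the slice simply stops at the last label not greater than 6.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Unsorted index: the missing label 6 cannot be placed, so loc raises.
try:
    s.loc[:6]
    raised = False
except KeyError:
    raised = True

# Sorted index (1, 2, 3, 4, 5, 45, ...): the missing bound is allowed.
s_sorted = s.sort_index()
sorted_slice = s_sorted.loc[:6]

print(raised)                     # True
print(list(sorted_slice.index))   # [1, 2, 3, 4, 5]
```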
If, however, our index were of mixed type, then given an integer, ix would behave like iloc immediately instead of raising a KeyError:
    >>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
    >>> s2.index.is_mixed() # index is a mix of types
    True
    >>> s2.ix[:6] # behaves like iloc given an integer
    a   NaN
    b   NaN
    c   NaN
    d   NaN
    e   NaN
    1   NaN
Keep in mind that ix can still accept non-integers and behave like loc:
    >>> s2.ix[:'c'] # behaves like loc given a non-integer
    a   NaN
    b   NaN
    c   NaN
As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results - try not to use ix.
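On pandas versions where ix is no longer available, the two s2 lookups above can be written with the explicit indexers. This is a minimal sketch of that substitution, not something taken from the answer itself: iloc stands in for the integer case and loc for the string case, so the intent is stated rather than inferred from the index type.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.nan, index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

# Positional: the first six entries, whatever their labels are.
first_six = s2.iloc[:6]

# Label-based: everything up to and including the label 'c'.
up_to_c = s2.loc[:'c']

print(list(first_six.index))  # ['a', 'b', 'c', 'd', 'e', 1]
print(list(up_to_c.index))    # ['a', 'b', 'c']
```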
Combining position-based and label-based indexing
Sometimes, given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.
For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?
    >>> df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x','y','z', 8, 9])
    >>> df
        x   y   z   8   9
    a NaN NaN NaN NaN NaN
    b NaN NaN NaN NaN NaN
    c NaN NaN NaN NaN NaN
    d NaN NaN NaN NaN NaN
    e NaN NaN NaN NaN NaN
In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since the label 4 is not a column name):
    >>> df.ix[:'c', :4]
        x   y   z   8
    a NaN NaN NaN NaN
    b NaN NaN NaN NaN
    c NaN NaN NaN NaN
In later versions of pandas, we can achieve this result using iloc and the help of another method:
    >>> df.iloc[:df.index.get_loc('c') + 1, :4]
        x   y   z   8
    a NaN NaN NaN NaN
    b NaN NaN NaN NaN
    c NaN NaN NaN NaN
get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
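As a runnable version of the above, the sketch below repeats the get_loc() approach and, as an alternative not mentioned in the answer, converts in the other direction: it turns the first four column positions into labels with df.columns[:4] and then uses loc for both axes. It assumes the row index is unique, since get_loc() only returns a single integer in that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x', 'y', 'z', 8, 9])

# Approach from the answer: translate the row label to a position.
stop = df.index.get_loc('c') + 1   # +1 because iloc excludes its endpoint
via_iloc = df.iloc[:stop, :4]

# Alternative: translate the column positions to labels instead.
via_loc = df.loc[:'c', df.columns[:4]]

print(via_iloc.shape)            # (3, 4)
print(via_iloc.equals(via_loc))  # True
```

Which direction reads better is a matter of taste; both avoid relying on ix to guess whether a value is a label or a position.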
There are further examples in pandas' documentation here.