python - How to get a dataframe index based on subselection
I have a dataframe (szen_df), select a portion of it, and assign that portion to another dataframe (orb_df). When I try this, the index of the subselected dataframe still contains the whole index of the original dataframe. I want only the level 0 index of the new dataframe. E.g.:
from datetime import datetime

start = datetime(2007, 1, 25, 12, 49, 0)
end = datetime(2007, 1, 25, 14, 30, 0)
orb_df = szen_df.loc[start:end]
orb_df
shows:
But if I query the index of the new dataframe, it still has the dates of the old dataframe.
orb_df.index.levels[0]
shows:
DatetimeIndex(['2007-01-25 00:00:00', '2007-01-25 00:10:00',
               '2007-01-25 00:20:00', '2007-01-25 00:30:00',
               '2007-01-25 00:40:00', '2007-01-25 00:50:00',
               '2007-01-25 01:00:00', '2007-01-25 01:10:00',
               '2007-01-25 01:20:00', '2007-01-25 01:30:00',
               ...
               '2007-01-25 22:20:00', '2007-01-25 22:30:00',
               '2007-01-25 22:40:00', '2007-01-25 22:50:00',
               '2007-01-25 23:00:00', '2007-01-25 23:10:00',
               '2007-01-25 23:20:00', '2007-01-25 23:30:00',
               '2007-01-25 23:40:00', '2007-01-25 23:50:00'],
              dtype='datetime64[ns]', name=u'time', length=144, freq=None, tz=None)
It has 144 elements. According to my subselection it should have 11 elements. I need an index that starts at 2007-01-25 12:50:00 and ends at 2007-01-25 14:30:00; in other words, the level 0 index of the new subselection.
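For reference, the behaviour described in the question can be reproduced with a small sketch. The frame `demo_df`, its single `szen` column, and the `pos` labels are made up for illustration and are not the asker's data; the slice here starts exactly at 12:50:00. The point is only that slicing keeps the stored level values, while the rows actually present cover just the selected times.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for szen_df: one day of 10-minute timestamps x three positions.
times = pd.date_range("2007-01-25 00:00:00", periods=144, freq="10min")
index = pd.MultiIndex.from_product(
    [times, ["center", "left", "right"]], names=["time", "pos"]
)
demo_df = pd.DataFrame({"szen": np.arange(len(index), dtype=float)}, index=index)

start = pd.Timestamp("2007-01-25 12:50:00")
end = pd.Timestamp("2007-01-25 14:30:00")
sub_df = demo_df.loc[start:end]

# The level still stores every timestamp of the original frame ...
n_level_entries = len(sub_df.index.levels[0])
# ... although only the selected timestamps occur in the sliced rows.
n_times_present = sub_df.index.get_level_values("time").nunique()

print(n_level_entries, n_times_present)
```

So `index.levels[0]` reports what the level can hold, not what the sliced rows actually contain.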
Here is one way to do it. First use reset_index(level='pos') to break down the multi-level index, slice by time, and then use set_index('pos', append=True) to rebuild the multi-level index.
import pandas as pd
import numpy as np

# simulate data
np.random.seed(0)
multi_index = pd.MultiIndex.from_product([pd.date_range('2007-02-01 00:00:00', periods=100, freq='10min'), ['left', 'center', 'right']], names=['time', 'pos'])
szen_df = pd.DataFrame(np.random.randn(300, 3), index=multi_index, columns=['lat', 'lon', 'szen'])

szen_df

Out[48]:
                                 lat      lon     szen
time                pos
2007-02-01 00:00:00 left      1.7641   0.4002   0.9787
                    center    2.2409   1.8676  -0.9773
                    right     0.9501  -0.1514  -0.1032
2007-02-01 00:10:00 left      0.4106   0.1440   1.4543
                    center    0.7610   0.1217   0.4439
                    right     0.3337   1.4941  -0.2052
2007-02-01 00:20:00 left      0.3131  -0.8541  -2.5530
                    center    0.6536   0.8644  -0.7422
                    right     2.2698  -1.4544   0.0458
2007-02-01 00:30:00 left     -0.1872   1.5328   1.4694
                    center    0.1549   0.3782  -0.8878
                    right    -1.9808  -0.3479   0.1563
2007-02-01 00:40:00 left      1.2303   1.2024  -0.3873
                    center   -0.3023  -1.0486  -1.4200
                    right    -1.7063   1.9508  -0.5097
...                              ...      ...      ...
2007-02-01 15:50:00 left     -0.4367  -1.6430  -0.4061
                    center   -0.5353   0.0254   1.1542
                    right     0.1725   0.0211   0.0995
2007-02-01 16:00:00 left      0.2274  -1.0167  -0.1148
                    center    0.3088  -1.3708   0.8657
                    right     1.0814  -0.6314  -0.2413
2007-02-01 16:10:00 left     -0.8782   0.6994  -1.0612
                    center   -0.2225  -0.8589   0.0510
                    right    -1.7942   1.3265  -0.9646
2007-02-01 16:20:00 left      0.0599  -0.2125  -0.7621
                    center   -0.8878   0.9364  -0.5256
                    right     0.2712  -0.8015  -0.6472
2007-02-01 16:30:00 left      0.4722   0.9304  -0.1753
                    center   -1.4219   1.9980  -0.8565
                    right    -1.5416   2.5944  -0.4040

[300 rows x 3 columns]

start_time = '2007-02-01 12:50:00'
end_time = '2007-02-01 14:30:00'
# move 'pos' out of the index, slice on the remaining time index, then put 'pos' back
orb_df = szen_df.reset_index(level='pos').loc[start_time:end_time].set_index('pos', append=True)

orb_df.index

Out[50]:
MultiIndex(levels=[[2007-02-01 12:50:00, 2007-02-01 13:00:00, 2007-02-01 13:10:00, 2007-02-01 13:20:00, 2007-02-01 13:30:00, 2007-02-01 13:40:00, 2007-02-01 13:50:00, 2007-02-01 14:00:00, 2007-02-01 14:10:00, 2007-02-01 14:20:00, 2007-02-01 14:30:00], ['center', 'left', 'right']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10], [1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0, 2]],
           names=['time', 'pos'])

orb_df.index.levels[0]

Out[59]:
DatetimeIndex(['2007-02-01 12:50:00', '2007-02-01 13:00:00',
               '2007-02-01 13:10:00', '2007-02-01 13:20:00',
               '2007-02-01 13:30:00', '2007-02-01 13:40:00',
               '2007-02-01 13:50:00', '2007-02-01 14:00:00',
               '2007-02-01 14:10:00', '2007-02-01 14:20:00',
               '2007-02-01 14:30:00'],
              dtype='datetime64[ns]', name='time', freq=None, tz=None)
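As a possible alternative to the reset/set round trip, newer pandas versions provide `MultiIndex.remove_unused_levels()`, which drops level values that no longer occur in the index. This is a different technique from the one in the answer, shown only as a minimal sketch. It assumes a pandas version that has this method, uses the same simulated layout as above, lists the `pos` labels in sorted order, and passes the slice bounds as `pd.Timestamp` objects.

```python
import numpy as np
import pandas as pd

# Simulated data in the same layout as above (assumption: 'pos' labels given in sorted order).
np.random.seed(0)
multi_index = pd.MultiIndex.from_product(
    [pd.date_range("2007-02-01 00:00:00", periods=100, freq="10min"),
     ["center", "left", "right"]],
    names=["time", "pos"],
)
szen_df = pd.DataFrame(
    np.random.randn(300, 3), index=multi_index, columns=["lat", "lon", "szen"]
)

start_time = pd.Timestamp("2007-02-01 12:50:00")
end_time = pd.Timestamp("2007-02-01 14:30:00")

# Slice on the time level; copy so the new index is assigned to an independent frame.
orb_df = szen_df.loc[start_time:end_time].copy()

# Drop the timestamps that are stored in the level but not used by any remaining row.
orb_df.index = orb_df.index.remove_unused_levels()

time_level = orb_df.index.levels[0]
print(len(time_level), time_level[0], time_level[-1])
```

If only the list of selected timestamps is needed and the stored levels do not matter, `orb_df.index.get_level_values('time').unique()` on the plain slice should give the same 11 values without touching the index.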