Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - DASK - AttributeError: 'DataFrame' object has no attribute 'sort_values'

I am just trying to sort a Dask DataFrame by a specific column.

CODE 1 - If I evaluate it, it is indeed shown as a Dask DataFrame (ddf)

my_ddf

OUTPUT 1

npartitions=1   
headers .....

CODE 2

my_ddf.sort_values('id', ascending=False)

OUTPUT 2

AttributeError                            Traceback (most recent call last)
<ipython-input-374-35ce4bd06557> in <module>
----> 1 my_ddf.sort_values('id', ascending=False) #.head(20)
      2 # df.sort_values(columns, ascending=True)

~/anaconda3/envs/rapids/lib/python3.7/site-packages/dask/dataframe/core.py in __getattr__(self, key)
   3619             return self[key]
   3620         else:
-> 3621             raise AttributeError("'DataFrame' object has no attribute %r" % key)
   3622 
   3623     def __dir__(self):

AttributeError: 'DataFrame' object has no attribute 'sort_values'

Tried Solutions

  • The official Dask documentation contains this example: df.sort_values(columns, ascending=False).head(n)
  • pandas only - DataFrame object has no attribute 'sort_values'
  • pandas only - 'DataFrame' object has no attribute 'sort'
  • DASK answer - https://stackoverflow.com/a/40378896/10270590
    • I don't want to set it as the index because I want to keep my current index values.
    • The following answer seems a bit strange: my_ddf.nlargest(1000000000, 'id').compute(). I am not sure it would still work with more partitions (currently I have only 1 because of a previous groupby on the data), how to avoid an arbitrary big number like "1000000000", or how to make the order increasing from top to bottom in the Dask DataFrame.
Question from: https://stackoverflow.com/questions/65924692/dask-attributeerror-dataframe-object-has-no-attribute-sort-values

He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss will gaze back…

1 Reply


AFAIK, sorting across partitions is not implemented (yet?). If the dataset is small enough to fit in memory, you can run df = ddf.compute() and then sort the resulting pandas DataFrame, e.g. df.sort_values('id', ascending=False).



...