Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
508 views
in Technique[技术] by (71.8m points)

.net - Parallel.Foreach spawning way too many threads

The problem

Although the code about which I will talk here I wrote in F#, it is based on the .NET 4 framework, not specifically depending on any particularity of F# (at least it seems so!).

I have some pieces of data on my disk that I should update from the network, saving the latest version to the disk:

type MyData =
    { field1 : int;
      field2 : float }

type MyDataGroup =
    { Data : MyData[];
      Id : int }

// load : int -> MyDataGroup
let load dataId =
    let data = ... // reads from disk
    { Data = data;
      Id = dataId }

// update : MyDataGroup -> MyDataGroup
let update dg =
    let newData = ... // reads from the network and process
                      // newData : MyData[]

    { dg with Data = dg.Data
                     |> Seq.ofArray
                     |> Seq.append newData
                     |> processDataSomehow
                     |> Seq.toArray }

// save : MyDataGroup -> unit
let save dg = ... // writes to the disk

let loadAndSaveAndUpdate = load >> update >> save

The problem is that to loadAndSaveAndUpdate all my data, I would have to execute the function many times:

{1 .. 5000} |> loadAndSaveAndUpdate

Each step would do

  • some disk IO,
  • some data crunching,
  • some network IO (with possibility of lots of latency),
  • more data crunching,
  • and some disk IO.

Wouldn't it be nice to have this done in parallel, to some degree? Unfortunately, none of my reading and parsing functions are "async-workflows-ready".

The first (not very good) solutions I came up with

Tasks

The first thing I've done was to set up a Task[] and start them all:

let createTask id = new Task(fun _ -> loadAndUpdateAndSave id)
let tasks = {1 .. 5000}
            |> Seq.map createTask
            |> Seq.toArray

tasks |> Array.iter (fun x -> x.Start())
Task.WaitAll(tasks)

Then I hit CTRL+ESC just to see how many threads it was using. 15, 17, ..., 35, ..., 170, ... until killed the application! Something was going wrong.

Parallel

I did almost the same thing but using Parallel.ForEach(...) and the results were the same: lots and lots and lots of threads.

A solution that works... kind of

Then I decided to start only n threads, Task.WaitAll(of them), then other n, until there were no more tasks available.

This works, but the problem is that when it has finished processing, say, n-1 tasks, it will wait, wait, wait for the damn last Task that insist on blocking due to lots of network latency. This is not good!

So, how would you attack this problem? I'd appreciate to view different solutions, involving either Async Workflows (and in this case how to adapt my non-async functions), Parallel Extensions, weird parallel patterns, etc.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

ParallelOptions.MaxDegreeOfParallelism limits the number of concurrent operations run by Parallel method calls


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...