
Speed up BigQuery query job to import from Cloud SQL

I am performing a query to generate a new BigQuery table of size ~1 TB (a few billion rows), as part of migrating a Cloud SQL table to BigQuery using a federated query. I use the BigQuery Python client to submit the query job; the query selects everything from the Cloud SQL table via EXTERNAL_QUERY.
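Roughly, the job looks like this (a sketch; the connection ID, dataset, and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder connection ID and table names.
sql = """
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT * FROM my_table;'
)
"""

job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.my_table"
)
job = client.query(sql, job_config=job_config)
job.result()  # blocks until the job finishes; this is where the timeout surfaces
```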

I find that the query can take 6+ hours and fails with "Operation timed out after 6.0 hour"! Even if it didn't fail, I would like to speed it up, as I may need to perform this migration again.

I see that the PostgreSQL egress is about 20 MB/s, consistent with a job that would take half a day. Would it help to use something more distributed, such as Dataflow? Or, more simply, to extend my Python code so that the BigQuery client submits multiple queries, which BigQuery can run concurrently?
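For the second option, this is the kind of sharded approach I have in mind (a sketch; it assumes an integer primary key `id`, and the shard count, bounds, and names are made up):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical shard bounds over an assumed integer primary key `id`.
N_SHARDS = 8
MAX_ID = 4_000_000_000
step = MAX_ID // N_SHARDS + 1

jobs = []
for i in range(N_SHARDS):
    lo, hi = i * step, (i + 1) * step
    sql = f"""
    SELECT *
    FROM EXTERNAL_QUERY(
      'my-project.us.my-cloudsql-connection',
      'SELECT * FROM my_table WHERE id >= {lo} AND id < {hi};'
    )
    """
    job_config = bigquery.QueryJobConfig(
        destination="my-project.my_dataset.my_table",
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    jobs.append(client.query(sql, job_config=job_config))

for job in jobs:
    job.result()  # each shard runs as its own BigQuery job
```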

Or is it possible to keep the single query but increase the egress throughput (via database configuration)?


1 Reply

I think it is more suitable to use a dump export.

Running a single query over such a large table is inefficient.

I recommend exporting the Cloud SQL data to CSV files instead.

BigQuery can import CSV files, so you can use them to create your new BigQuery table.
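For example, after exporting the table to a Cloud Storage bucket with `gcloud sql export csv`, the load step could look like this (a sketch; the bucket path and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,  # or supply an explicit schema
)

# Placeholder GCS path and destination table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/my_table_*.csv",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete
```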

I'm not sure how long this job will take, but at least it will not fail.

Refer to the Cloud SQL documentation on exporting data to a CSV file for more detail.

