Moving Excel Data To SQL Server - Can This Be More Efficient? (C#)

This is going to be a rather long post, since I want to explain why I implemented certain parts of the code the way I did. The main goal is to see if I can make this program run faster with help from you guys!

Overview

Starting off, I have 23 different Excel Sheets that sit in a single file. Some sheets are bigger than others, but I have to import ALL of the data within each of them. Each of these Excel Sheets has a corresponding C# class whose properties match the sheet's column headers - there are 23 of these classes. To grab the data, there is a front-end that accepts an Excel file, and then ExcelDataReader is used to read each sheet so that it can become a DataTable object.
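
For context, the reading step looks roughly like this - a minimal sketch assuming the ExcelDataReader and ExcelDataReader.DataSet NuGet packages, with the file path and sheet name as placeholders:

// Minimal sketch of the reading step described above. Assumes the ExcelDataReader and
// ExcelDataReader.DataSet packages; the path and sheet name below are placeholders.
using System.Data;
using System.IO;
using System.Text;
using ExcelDataReader;

public static class WorkbookLoader
{
    public static DataSet LoadWorkbook(string path)
    {
        // May be required on .NET Core so ExcelDataReader can resolve legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            // UseHeaderRow = true makes the first sheet row the DataTable column names,
            // which is what allows lookups like dr["VM"] later on.
            return reader.AsDataSet(new ExcelDataSetConfiguration
            {
                ConfigureDataTable = _ => new ExcelDataTableConfiguration { UseHeaderRow = true }
            });
        }
    }
}

// Usage: each worksheet becomes one DataTable, keyed by sheet name, e.g.
// DataTable vCPU_Table = WorkbookLoader.LoadWorkbook(@"C:\exports\rvtools.xlsx").Tables["vCPU"];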

There is a general style that importing each Excel Sheet follows. I will use my vCPU class as an example. First, the program cleans up the DataTable object that is being looked at by removing unnecessary rows. Then a List<RvtoolsVCpu> object is made. Each row in the vCPU Excel Sheet makes up one RvtoolsVCpu object, so there is a foreach loop that goes through each DataRow in the vCPU_Table. That looks something like this:

public class vCPU
    {
        public static int Insert(DataTable vCPU_Table, int assessment_id, int objectCount)
        {
            //Cleans up the DataTable that is being accessed by removing unnecessary rows
            for (int i = vCPU_Table.Rows.Count - 1; i >= 0; i--)
            {
                if (vCPU_Table.Rows[i][0].ToString() == String.Empty)
                {
                    vCPU_Table.Rows.RemoveAt(i);
                }
            }

            List<RvtoolsVCpu> vCPUs = new List<RvtoolsVCpu>();
            using (var context = new DataWarehouseContext())
            {
                foreach (DataRow dr in vCPU_Table.Rows)
                {
                    if (dr["VM"].ToString() == "") //In case too many rows are selected, the name is always filled for corresponding data
                    {
                        context.RvtoolsVCpu.AddRange(vCPUs);
                        context.SaveChanges();
                        return objectCount;
                    }
                    int VmId = vInfo.GetID(dr["VM"].ToString(), dr["Host"].ToString(), dr["Datacenter"].ToString(), dr["Cluster"].ToString(), assessment_id);
                    int? Amount = (dr.Table.Columns.Contains("CPUs") && dr["CPUs"] != DBNull.Value) ? Convert.ToInt32(dr["CPUs"]) : (int?)null; //Amount is NumCPUs
                    int? Sockets = (dr.Table.Columns.Contains("Sockets") && dr["Sockets"] != DBNull.Value) ? Convert.ToInt32(dr["Sockets"]) : (int?)null;
                    int? CoresPerSocket = (dr.Table.Columns.Contains("Cores p/s") && dr["Cores p/s"] != DBNull.Value) ? Convert.ToInt32(dr["Cores p/s"]) : (int?)null;
                    int? Max = (dr.Table.Columns.Contains("Max") && dr["Max"] != DBNull.Value) ? Convert.ToInt32(dr["Max"]) : (int?)null;
                    int? Overall = (dr.Table.Columns.Contains("Overall") && dr["Overall"] != DBNull.Value) ? Convert.ToInt32(dr["Overall"]) : (int?)null; 
                    string Level = (dr.Table.Columns.Contains("Level") && dr["Level"] != DBNull.Value) ? dr["Level"].ToString() : "";
                    int? Shares = (dr.Table.Columns.Contains("Shares") && dr["Shares"] != DBNull.Value) ? Convert.ToInt32(dr["Shares"]) : (int?)null; 
                    int? Reservation = (dr.Table.Columns.Contains("Reservation") && dr["Reservation"] != DBNull.Value) ? Convert.ToInt32(dr["Reservation"]) : (int?)null; 
                    int? Entitlement = (dr.Table.Columns.Contains("Entitlement") && dr["Entitlement"] != DBNull.Value) ? Convert.ToInt32(dr["Entitlement"]) : (int?)null; 
                    int? DrsEntitlement = (dr.Table.Columns.Contains("DRS Entitlement") && dr["DRS Entitlement"] != DBNull.Value) ? Convert.ToInt32(dr["DRS Entitlement"]) : (int?)null;
                    int? Limit = (dr.Table.Columns.Contains("Limit") && dr["Limit"] != DBNull.Value) ? Convert.ToInt32(dr["Limit"]) : (int?)null; 
                    bool? HotAdd = (dr.Table.Columns.Contains("Hot Add") && dr["Hot Add"] != DBNull.Value) ? bool.Parse(dr["Hot Add"].ToString()) : (bool?)null; 
                    bool? HotRemove = (dr.Table.Columns.Contains("Hot Remove") && dr["Hot Remove"] != DBNull.Value) ? bool.Parse(dr["Hot Remove"].ToString()) : (bool?)null;
                    string Annotation = (dr.Table.Columns.Contains("Annotation") && dr["Annotation"] != DBNull.Value) ? dr["Annotation"].ToString() : "";

                    vCPUs.Add(new RvtoolsVCpu()
                    {
                        VmId = VmId, //Get VM_ID
                        Amount = Amount,
                        Sockets = Sockets,
                        CoresPerSocket = CoresPerSocket,
                        Max = Max,
                        Overall = Overall,
                        Level = Level,
                        Shares = Shares,
                        Reservation = Reservation,
                        Entitlement = Entitlement,
                        DrsEntitlement = DrsEntitlement,
                        Limit = Limit,
                        HotAdd = HotAdd,
                        HotRemove = HotRemove,
                        Annotation = Annotation,
                        AssessmentId = assessment_id
                    });
                    objectCount += 16;
                }
                context.RvtoolsVCpu.AddRange(vCPUs);
                context.SaveChanges();
                return objectCount;
            }
        }
    }

This vCPU class is rather small compared to the classes for some of the other Excel Sheets that have to be imported... you might be able to see now why I'm searching for optimization.

Within the foreach loop, you can ignore the vInfo.GetID(...) call, since that is already optimized. Looking at the values within each row, however, is where things get bulky. Here's why I have what I have in there, but any suggestions for better solutions would be greatly appreciated:

  1. dr.Table.Columns.Contains("CPUs") -- This is used since there are different versions of these Excel files. Some of them don't have the same columns, so when the program tries to access a column that isn't there, it breaks and the import fails.

  2. dr["CPUs"] != DBNull.Value -- This is used since sometimes there is nothing in a specific Excel cell, and that tends to run into an error when you try to convert nothing to an integer.

Thus, if both of these conditionals return true, the data cell is converted to the correct object type. Otherwise, it is set to the null value of that object type (the pattern is sketched below). This works really well for avoiding errors! Unfortunately, it doesn't let the program chug along fast.
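
Boiled down, that per-cell pattern looks like the sketch below. GetNullableInt and GetString are hypothetical helper names used only to illustrate; the actual program inlines these checks on every line:

// Minimal sketch of the per-cell guard pattern described above.
// GetNullableInt / GetString are hypothetical helpers, not part of the actual program.
using System;
using System.Data;

public static class CellHelpers
{
    public static int? GetNullableInt(DataRow dr, string column)
    {
        // Guard 1: the column may not exist in older versions of the export.
        // Guard 2: the cell may be empty, in which case conversion would throw.
        return (dr.Table.Columns.Contains(column) && dr[column] != DBNull.Value)
            ? Convert.ToInt32(dr[column])
            : (int?)null;
    }

    public static string GetString(DataRow dr, string column)
    {
        return (dr.Table.Columns.Contains(column) && dr[column] != DBNull.Value)
            ? dr[column].ToString()
            : "";
    }
}

// Usage inside the row loop, e.g.:
// int? Sockets = CellHelpers.GetNullableInt(dr, "Sockets");
// string Level = CellHelpers.GetString(dr, "Level");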

After each value in a row is either converted or set to null, Entity Framework Core comes into play. The List<RvtoolsVCpu> keeps gaining objects until the end of the Excel Sheet has been reached. All of the data is then pushed to a SQL Server database with AddRange and SaveChanges, and the program moves on to the next Excel Sheet.

Where can I improve on this program?

Question from: https://stackoverflow.com/questions/65830909/moving-excel-data-to-sql-server-can-this-be-more-efficient-c-sharp
