Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
380 views
in Technique[技术] by (71.8m points)

parsing - Parse a unique CSV format Snowflake COPY INTO

I have a csv file formatted like below and I am trying parse into snowflake. Currently my file format looks like this:

file_format = (type = 'csv'
                  field_delimiter = ','
                  skip_header = 1
                  null_if = ('NULL', 'null')
                  empty_field_as_null = true
                  skip_blank_lines = TRUE
                  trim_space = TRUE
                  FIELD_OPTIONALLY_ENCLOSED_BY='"'

My csv example is below, I can handle the "Supervisor" column by the FIELD_OPTIONALLY_ENCLOSED_BY='"' parameter. However, this throws off the "Customer_name" column since there is " " usually around the middle name and sometimes only one ".

"Supervisor","ID","Customer_name","Deleted"
"Abrhama, Thomas","280397212","Alibaba "Trey" Jones",""
"Susan, Smith","290940940","Sarah Rose" Miller",""

This is what my ideal table would look like.

 Supervisor          ID       Customer_name            Deleted
Abrhama, Thomas  280397212   Alibaba "Trey" Jones
Susan, Smith     290940940    Sarah Rose" Miller

Any help is greatly appreciated!

question from:https://stackoverflow.com/questions/65924714/parse-a-unique-csv-format-snowflake-copy-into

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It looks as if you'll need a custom parser. You can do that inside of Snowflake.

The first thing you want to do is bring the whole line into Snowflake as a single column. To do that, specify a table with only one column. For the column delimiter in the file format, you can use a symbol that you'll never encounter including a multi-char delimiter.

Then run the resulting table through a custom parser, such as this user defined table function:

create or replace temp table csv(line string);
create or replace temp table people(supervisor string, ID number, customer_name string, deleted string);

insert into csv select $$"Abrhama, Thomas","280397212","Alibaba "Trey" Jones",""$$;
insert into csv select $$"Susan, Smith","290940940","Sarah Rose" Miller",""$$;

-- Start with a table like this before running through the UDTF:
select * from csv;

CREATE OR REPLACE FUNCTION PARSE_CSV(LINE string)
    RETURNS TABLE (supervisor string, id string, customer_name string, deleted string)
    LANGUAGE JAVASCRIPT
    AS 
$${
        processRow: function f(row, rowWriter, context){
        
           var s = parseRow(row.LINE);
        
           rowWriter.writeRow({SUPERVISOR: s[0], ID: s[1], CUSTOMER_NAME: s[2], DELETED: s[3]});

        function parseRow(s) {
            var c = 0;
            var out = new Array(4);
            var p = s.split(",");
            for (var i = 0; i < p.length; i++) {
                if (p[i].charAt(p[i].length - 1) != '"') {
                    out[c++] = p[i].substring(1) + p[i+1].substring(0, p[i+1].length - 1);
                    i++;
                } else {
                    out[c++] = p[i].substring(1, p[i].length - 1);
                }
            }
            return out;
        }
    }
}
$$;

select   SUPERVISOR
        ,ID
        ,CUSTOMER_NAME
        ,DELETED
from csv, lateral(table(parse_csv(LINE)));

-- You can then insert from that select:
insert into people
select   SUPERVISOR
        ,ID::int
        ,CUSTOMER_NAME
        ,DELETED
from csv, lateral(table(parse_csv(LINE)));

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...