Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
224 views
in Technique[技术] by (71.8m points)

sql - How do I apply count and distinct on multiple columns from table1 before inserting into table2

Situation:

Our company just purchased 20 million rows from a vendor. The vendor inserted the data from Excel into TABLE1 (RDBMS is postgresql) and unfortunately TABLE1 doesn't have a PK. My manager wants me to insert all 20 million rows into TABLE2 but with distinct records only.

EMPCODE is the only column with unique values. TABLE2 is empty prior to insertion. TABLE1 doesn't have a primary key and last insert date or date_updated column. I created TABLE2 with a primary key, int NOT NULL identity primary key.

What is the correct sql script to get the correct count to get the desired results?

insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select distinct(EMPCODE), FN, LN, STATE, TYPE from table1

or

insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select distinct(EMPCODE, FN, LN, STATE, TYPE) from table1

or

insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
select EMPCODE, FN, LN, LAST(STATE), TYPE from table1
group by EMPCODE, FN, LN, STATE, TYPE

TABLE1

EMPCODE  FirstName  LastName        STATE   TYPE

ID111    Fred       Perry           CO      CONSULTANT
ID111    Fred       Perry           UT      CONSULTANT
ID111    Fred       Perry           AZ      CONSULTANT
ID215    Tommy      Hilfiger        IL      INTERN
ID215    Tommy      Hilfiger        IN      INTERN
ID215    Tommy      Hilfiger        OH      INTERN
ID215    Tommy      Hilfiger        NY      INTERN
ID467    David      Abercrombie     TX      REGULAR
ID467    David      Abercrombie     CA      REGULAR
ID875    Ezra       Fitch           NV      TERMINATED
ID875    Ezra       Fitch           OR      TERMINATED
ID875    Ezra       Fitch           WA      TERMINATED
ID875    Ezra       Fitch           UT      TERMINATED
ID875    Ezra       Fitch           AZ      TERMINATED

Desired result in TABLE2

EMPCODE  FirstName  LastName    STATE   TYPE

ID111    Fred      Perry        CO      CONSULTANT
ID215    Tommy     Hilfiger     IL      INTERN
ID467    David     Abercrombie  TX      REGULAR
ID875    Ezra      Fitch        NV      TERMINATED

The problem is that I'm unable to see all data (RAM issues and pgadmin4 very slow) to decide which sql script works best.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you are using Postgres and you want one arbitrary row per empcode, then you can use distinct on

insert into TABLE2 (EMPCODE, FN, LN, STATE, TYPE)
    select distinct on (EMPCODE) EMPCODE, FN, LN, STATE, TYPE
    from table1
    order by EMPCODE;

This will not prevent duplicates that are already in table2. If you want to avoid such duplicates, then create a unique index or constraint on table2(empcode) and use an on conflict clause.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.8k users

...