Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

sql - Delete all records except the most recent one?

I have two DB tables in a one-to-many relationship. The data looks like this:

select * from student, application

Resultset:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20001         | 12 April 2011       |
| 1         | 20002         | 15 May 2011         |
| 2         | 20003         | 02 Feb 2011         |
| 2         | 20004         | 13 March 2011       |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

I want to delete all applications except for the most recent one. In other words, each student must only have one application linked to it. Using the above example, the data should look like this:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20002         | 15 May 2011         |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

How would I go about constructing my DELETE statement to filter out the correct records?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
DELETE FROM student
WHERE ApplicationDateTime <> (SELECT max(ApplicationDateTime) 
                              FROM student s2
                              WHERE s2.StudentID  = student.StudentID)

Given the long discussion in the comments, please note the following:

The above statement will work on any database that properly implements statement level read consistency regardless of any changes to the table while the statement is running.

Databases where I definitely know that this works correctly even with concurrent modifications to the table: Oracle (the one which this question is about), Postgres, SAP HANA, Firebird (and most probably MySQL using InnoDB). Because they all guarantee a consistent view of the data at the point in time when the statement started. Changing the <> to < will not change anything for them (including Oracle which this question is about)

For the above mentioned databases, the statement is not subject to the isolation level because phantom reads or non-repeatable reads can only happen between multiple statements - not within a single statement.

For database that do not implement MVCC properly and rely on locking to manage concurrency (thus blocking concurrent write access) this might actually yield wrong results if the table is updated concurrently. For those the workaround using < is probably needed.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...