Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
783 views
in Technique[技术] by (71.8m points)

postgresql - Questions about Postgres track_commit_timestamp (pg_xact_commit_timestamp)

I'm working on a design for a concurrency-safe incremental aggregate rollup system,and track_commit_timestamp (pg_xact_commit_timestamp) sounds perfect. But I've found very little commentary on it generally, and couldn't figure out how it works in detail from the source code.

Hopefully, someone knows the answers to one or more of my questions:

  • Is it possible for the commit timestamp feature to produce times out of order? What I'm after is a way to identify records that have been changed since a specific time so that I can get any later changes for processing. If there are identical timestamps, I don't need them in perfect commit sequence.

  • How many bytes are added to each row in the final implementation? The discussions I saw seemed to be ranging from 12-24 bytes. There was discussion of adding in extra bytes for "just in case." This is pre 9.5, so a world ago.

  • Are the timestamps indexed internally? With a B-tree? I ask for capacity-planning reasons.

  • I've seen on StackOverflow and the design discussions that the timestamps are not kept indefinitely, but can't find the details on exactly how long they are stored.

  • Any rules of thumb on the performance impact of enabling track_commit_timestamp? I don't need the data on all tables but, where I do, it sounds like it might work perfectly.

  • Any gotchas? I tried running VACUUM FULL on a test table and none of the pg_xact_commit_timestamp changed. It seems like a physical operation like VACUUM shouldn't change anything, but there could easily be something I've not thought of. And, honestly, my quick VACUUM test might not even mean anything.

Many thanks for any assistance!


I've edited my question to clarify what I'm trying to accomplish, I'm looking to track processed and unprocessed data based on update stamps.

select max(pg_xact_commit_timestamp(xmin)) from scan;--   2019-07-07 20:46:14.694288+10

update scan set quantity = 5 where quantity = 1; --       Change some data.

select max(pg_xact_commit_timestamp(xmin)) from scan; --  2019-07-10 09:38:17.920294+10

-- Find the changed row(s):
select * 
  from scan 
 where pg_xact_commit_timestamp(xmin) > '2019-07-07 20:46:14.694288+10'; 

The idea is to do a rollup on rows incrementally and regularly. So,

-- Track the last rolled up timestamp. -- Wait for 5 minutes (or whatever.) -- Find the current max commit timestamp. -- Search for rows where the commit timestamp is between the last processed timestamp and the max. -- Roll them up.

Transaction IDs alone can't work because they can commit out of order very easily. And this timestamp system doesn't have to be 100% perfect, but I'm aiming for something very close to perfect. So, a bit of clock wiggle and even a bit of confusion around overlapping start/end times is likely tolerable.

Is there a glaring flaw in this plan?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As this subject doesn't seem to show up in the archives very much, I want to add a bit of detail before moving on. I asked related questions on several lists, forums, and by direct communication. Several people were kind enough to review the source code, provide historical background, and clear this up for me. Hopefully, leaving some detail here will help someone else down the track. Errors are all mine, obviously, corrections and enhancements more than welcome.

  • Commit timestamps are assigned when the transaction's work is completed, but that's not the same was when it is committed. The WAL writer doesn't update the stamps to keep them in chronological sequence.

  • Therefore, commit timestamps are definitely not a reliable mechanism for finding changes rows in order.

  • Multiple clocks. Self-adjusting clocks. Oh the humanity!

  • If you do want an in order-change sequence, logical decoding or replication are options. (I tried out logical replication a couple of weeks ago experimentally. Coolest. Thing. Ever.)

  • The cost of timestamp tracking is 12 bytes per transaction, not per row. So, not so bad. (Timestamps are 8 bytes, transaction IDs are 4 bytes.)

  • This is all part of the existing transaction system, so the realities of transaction ID rollaround apply here too. (Not scary in my case.) See:

    https://www.postgresql.org/docs/current/routine-vacuuming.html

  • For the record, you can enable this option on RDS via a parameter group setting. Just set track_commit_timestamp to 1 and restart. (The setting is 'on' in an postgres.conf.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...