Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Regular expression with multiline matching (subtitles strings)

I need some help with a regexp matching pattern. The text looks like this (it's subtitles for a video):

...

223
00:20:47,920 --> 00:20:57,520
- Hello! This is good subtitle text. 
- Yes! How are you, stackoverflow?

224
00:20:57,520 --> 00:21:11,120
Wow, seems amazing.
- We're good, thanks. 
Like, you know, everyone is happy around here with their laptops.

225
00:21:11,120 --> 00:21:14,440
- Understood. Some dumb text 

...

I need a set of groups: startTime, endTime, text

So far my results are not very good. I can get startTime, endTime and some of the text, but not all of it, only the last sentence. I've attached a screenshot.

As you can see, group 3 is capturing text, but only the last sentence.

Please explain what I'm doing wrong.

Thank you.

question from:https://stackoverflow.com/questions/65845271/regular-expression-with-multiline-matching-subtitles-strings

1 Reply

Accounting for the possibility that there is no newline character after the final text of your string, would the following work for you:

(\d\d:\d\d:\d\d,\d\d\d)[ >-]*?((?1))\n(.*?(?=\n\n|\Z))

See the online demo


  • (\d\d:\d\d:\d\d,\d\d\d) - The same pattern as you used to capture the starting time in the 1st capture group.
  • [ >-]*? - 0+ (but lazy) characters from the character class, up to:
  • ((?1)) - A 2nd capture group which matches the same pattern as the 1st group.
  • \n - A newline character.
  • (.*?(?=\n\n|\Z)) - A 3rd capture group that captures anything (including newlines, thanks to the s-flag) up to a positive lookahead for either two newline characters or the end of the whole string.
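To see what the s-flag does for that 3rd group, here is a small sketch in Python (my choice of language, the question doesn't say which engine is used). In Python the s-flag is spelled re.DOTALL, and the sample text is made up for the illustration:

```python
import re

# Hypothetical two-line cue text followed by a blank line and more input.
text = "line one\nline two\n\nnext"

# The 3rd group from the pattern above, tried on its own.
group3 = r"(.*?(?=\n\n|\Z))"

# Without the s-flag "." cannot cross a newline, so the first match
# can only start on the last line before the blank line.
without_s = re.search(group3, text).group(1)

# With the s-flag "." also matches newlines, so both lines are captured.
with_s = re.search(group3, text, re.DOTALL).group(1)

print(repr(without_s))  # 'line two'
print(repr(with_s))     # 'line one\nline two'
```

So the lazy quantifier plus the lookahead only gives you the full multi-line text when the s-flag is switched on.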

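Note that some (not all) engines let you reuse a previous subpattern by calling it with (?1). This is a subpattern call rather than a backreference: it re-runs the pattern of group 1 instead of matching the same text again. I guess the app you are using does not support it. Therefore you can replace (?1) with your own pattern to capture the 2nd group.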
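Python's built-in re module is one of the engines without (?1), so it makes a handy example of writing the time pattern out twice. This is only a sketch under a few assumptions: the names TIME, CUE and parse_cues are mine, the sample is the text from the question with trailing spaces trimmed, and the input uses plain "\n" line endings.

```python
import re

# The time pattern is written out twice because Python's re module
# has no (?1) subpattern call.
TIME = r"\d\d:\d\d:\d\d,\d\d\d"
CUE = re.compile(
    rf"({TIME})[ >-]*?({TIME})\n(.*?(?=\n\n|\Z))",
    re.DOTALL,  # the s-flag: "." also matches newlines
)

SAMPLE = (
    "223\n"
    "00:20:47,920 --> 00:20:57,520\n"
    "- Hello! This is good subtitle text.\n"
    "- Yes! How are you, stackoverflow?\n"
    "\n"
    "224\n"
    "00:20:57,520 --> 00:21:11,120\n"
    "Wow, seems amazing.\n"
    "- We're good, thanks.\n"
    "Like, you know, everyone is happy around here with their laptops.\n"
    "\n"
    "225\n"
    "00:21:11,120 --> 00:21:14,440\n"
    "- Understood. Some dumb text"
)


def parse_cues(text):
    """Return a list of (startTime, endTime, text) tuples."""
    return [m.groups() for m in CUE.finditer(text)]


cues = parse_cues(SAMPLE)
for start, end, body in cues:
    print(start, end, repr(body))
```

Each tuple holds the three groups asked for in the question. The last cue is still picked up even though nothing follows it, because the lookahead also accepts the end of the string.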



...