Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

unix - Shell Script for Splitting Sentences into new format in output text file?

I have converted a Bible to a plain text file, which comes out like this:

$$  Genesis 40:1 It came to pass after these things that the butler and the baker of the king of Egypt ..

$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$  Genesis 40:3 So he put them in custody in the house of the captain of the guard, in the prison, the ..

I would like to run a shell script on the text file that goes through it and outputs a new file that looks like this:

$$ Genesis 40:1

It came to pass after these things that the butler and the baker of the king of Egypt ..

$$ Genesis 40:2

And Pharaoh was angry with his two officers, the chief butler and the chief baker.

$$ Genesis 40:3

So he put them in custody in the house of the captain of the guard, in the prison, the ..

I figure I somehow need to parse the first X characters of each line and then split the line at that point. However, I'm new to shell scripting and can't figure out the best way to process the file to accomplish this.

Any thoughts?

question from:https://stackoverflow.com/questions/66051527/shell-script-for-splitting-sentences-into-new-format-in-output-text-file


1 Reply


Since you just need to replace the space after the numbers with two newline characters, you can use this command:

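sed 's/\([0-9]\) /\1\n\n/' <textfile >newfile

- Substitute the first digit that is followed by a space with that same digit followed by two newlines (\n). Note that \n in the replacement is understood by GNU sed; other sed implementations may not interpret it.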
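To see the substitution in action, here is a minimal sketch that pipes one of the sample lines from the question through the same sed expression. The function name split_verses is made up for illustration, and the script assumes GNU sed, since \n in the replacement is not portable to every sed.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed, where \n in the replacement means a newline.

# split_verses is a hypothetical helper name, not part of the original answer.
# It replaces the first "digit + space" on each line with "digit + two newlines".
split_verses() {
    sed 's/\([0-9]\) /\1\n\n/'
}

# Feed one sample line from the question through the filter.
printf '%s\n' \
    '$$  Genesis 40:2 And Pharaoh was angry with his two officers, the chief butler and the chief baker.' |
    split_verses
```

This should print the reference on one line, then a blank line, then the verse text. The two spaces after $$ in the input are left as they are.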

This worked really well until it got to a line that read “1 John 1:1 something written here”; then it split the line in the wrong spot. How can I account for this?

To account for lines that have a number and a space before the book name, we can include a lowercase letter and everything up to the final digit in the pattern:

sed 's/\([a-z].*[0-9]\) /\1\n\n/' <textfile >newfile
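The sketch below runs this second expression on the “1 John” line from the comment. Because .* is greedy, the expression splits after the last digit that is followed by a space, so a verse whose text itself contains a number could still be split in the wrong place. As one possible safeguard, the sketch also shows a stricter variant that anchors on the "$$ Book chapter:verse" shape and collapses the spaces after $$ to one. Both function names and the stricter pattern are assumptions for illustration and not part of the original answer; GNU sed is assumed for \n and -E.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\n in the replacement, -E for extended regex).

# The second command from the answer, wrapped in a hypothetical helper.
# Splits after the LAST "digit + space" that follows a lowercase letter.
split_after_last_digit() {
    sed 's/\([a-z].*[0-9]\) /\1\n\n/'
}

# Hypothetical stricter variant: match "$$", an optional leading book number
# (1-3), a book name made of letters and spaces, then chapter:verse.
# Only the space right after the verse number becomes the split point.
split_at_reference() {
    sed -E 's/^\$\$ +(([1-3] )?[A-Za-z ]+ [0-9]+:[0-9]+) +/$$ \1\n\n/'
}

line='$$  1 John 1:1 something written here'
printf '%s\n' "$line" | split_after_last_digit
printf '%s\n' "$line" | split_at_reference
```

Both calls should put “1 John 1:1” on the reference line and “something written here” after a blank line. The stricter variant assumes every book name consists only of letters and spaces, optionally preceded by a single digit from 1 to 3, so lines that do not fit that shape pass through unchanged.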
