Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


validation - Regex for a valid 32-bit signed integer

I'm pretty sure this hasn't actually been answered yet on this site. Once and for all: what is the smallest regex that matches a numeric string within the range of a 32-bit signed integer, i.e. -2147483648 to 2147483647?

I must use regex for validation - that is the only option available to me.

I have tried

\d{1,10}

but I can't figure out how to restrict it to the valid number range.


To help with developing the regex, here are some test cases. It should match:

-2147483648
-2099999999
-999999999
-1
0
1
999999999
2099999999
2147483647

It should not match:

-2147483649
-2200000000
-11111111111
2147483648
2200000000
11111111111

I have set up an on-line live demo (on rubular) that has my attempt and the test cases above.


Note: The shortest regex that works will be accepted. Efficiency of regex will not be considered (unless there's a tie for shortest length).


1 Reply


I really hope this is just a puzzle and that no one will use a regex for this problem in the real world. The proper solution is to convert the string to a numeric type such as BigInteger. That lets us check the range with ordinary methods or operators such as compareTo, > and <.
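As a minimal sketch of that non-regex approach (the class name Int32RangeCheck and the method isInt32 are made up for illustration), the string can be parsed with BigInteger and compared against the int bounds. Note one assumption that differs from the regex below: BigInteger also accepts a leading "+".

```java
import java.math.BigInteger;

// Hypothetical helper, not part of any library.
class Int32RangeCheck {
    private static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    // Returns true when s parses as a decimal integer in [-2147483648, 2147483647].
    static boolean isInt32(String s) {
        if (s == null) {
            return false;
        }
        try {
            BigInteger value = new BigInteger(s); // arbitrary size, so no overflow
            return value.compareTo(MIN) >= 0 && value.compareTo(MAX) <= 0;
        } catch (NumberFormatException e) {
            return false; // empty string or not a decimal integer
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt32("-2147483648")); // true
        System.out.println(isInt32("2147483648"));  // false
    }
}
```

Because BigInteger has no upper limit, inputs such as 11111111111 are parsed and rejected by the comparison rather than by an overflow error.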


To make life easier, you can use this page (dead link) to generate regexes for numeric ranges. A regex for the range 0 - 2147483647 can look like

([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))

(or, laid out in a friendlier way)

(
 [0-9]{1,9}|
1[0-9]{9}|
2(0[0-9]{8}|
  1([0-3][0-9]{7}|
       4([0-6][0-9]{6}|
            7([0-3][0-9]{5}|
                 4([0-7][0-9]{4}|
                      8([0-2][0-9]{3}|
                           3([0-5][0-9]{2}|
                                6([0-3][0-9]|
                                     4[0-7]
)))))))))
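The indented layout is not only for reading. In Java, a pattern with this kind of whitespace can be compiled as written if the Pattern.COMMENTS flag is set, since that flag makes the engine ignore whitespace in the pattern. A small sketch, assuming Java and using a made-up class name VerboseInt32Pattern:

```java
import java.util.regex.Pattern;

// Hypothetical holder class for illustration only.
class VerboseInt32Pattern {
    // Range 0 - 2147483647, kept in the indented layout.
    // Pattern.COMMENTS tells the engine to skip the spaces and line breaks.
    static final Pattern NON_NEGATIVE = Pattern.compile(String.join("\n",
        "(",
        " [0-9]{1,9}|",
        " 1[0-9]{9}|",
        " 2(0[0-9]{8}|",
        "   1([0-3][0-9]{7}|",
        "     4([0-6][0-9]{6}|",
        "       7([0-3][0-9]{5}|",
        "         4([0-7][0-9]{4}|",
        "           8([0-2][0-9]{3}|",
        "             3([0-5][0-9]{2}|",
        "               6([0-3][0-9]|",
        "                 4[0-7]",
        ")))))))))"), Pattern.COMMENTS);

    // Matcher#matches requires the whole input to match the pattern.
    static boolean inRange(String s) {
        return NON_NEGATIVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inRange("2147483647")); // true
        System.out.println(inRange("2147483648")); // false
    }
}
```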

and for the range 0 - 2147483648

([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))

So we can just combine these ranges and write it as

range of 0-2147483647 OR "-" range of 0-2147483648

which will give us

([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-7])))))))))|-([0-9]{1,9}|1[0-9]{9}|2(0[0-9]{8}|1([0-3][0-9]{7}|4([0-6][0-9]{6}|7([0-3][0-9]{5}|4([0-7][0-9]{4}|8([0-2][0-9]{3}|3([0-5][0-9]{2}|6([0-3][0-9]|4[0-8])))))))))

[edit]

As Bohemian noticed in his comment, the final regex can take the form -?regex1|-2147483648, so here is a slightly shorter version (also with [0-9] changed to \d)

^-?(\d{1,9}|1\d{9}|2(0\d{8}|1([0-3]\d{7}|4([0-6]\d{6}|7([0-3]\d{5}|4([0-7]\d{4}|8([0-2]\d{3}|3([0-5]\d{2}|6([0-3]\d|4[0-7])))))))))$|^-2147483648$

If you use it with Java's String#matches(regex) method on each line, you can also drop the ^ and $ anchors, because matches succeeds only when the entire string matches the regex.
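Here is a rough sketch of that usage in Java, with the anchors dropped and each \d written as \\d inside the string literal. The class name Int32Regex and the method matchesInt32 are invented for this example. The inputs in main come from the question's test cases.

```java
// Hypothetical wrapper around the shorter regex, for illustration only.
class Int32Regex {
    // No ^ or $ needed: String#matches only succeeds on a full-string match.
    static final String INT32 =
        "-?(\\d{1,9}|1\\d{9}|2(0\\d{8}|1([0-3]\\d{7}|4([0-6]\\d{6}|7([0-3]\\d{5}"
        + "|4([0-7]\\d{4}|8([0-2]\\d{3}|3([0-5]\\d{2}|6([0-3]\\d|4[0-7])))))))))"
        + "|-2147483648";

    static boolean matchesInt32(String s) {
        return s.matches(INT32);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "-2147483648", "-2099999999", "-999999999", "-1", "0", "1",
            "999999999", "2099999999", "2147483647",           // should match
            "-2147483649", "-2200000000", "-11111111111",
            "2147483648", "2200000000", "11111111111"          // should not match
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + matchesInt32(s));
        }
    }
}
```

Like the original regex, this sketch accepts leading zeros and "-0", since \d{1,9} places no restriction on the first digit.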

I know this regex is very ugly, but that just shows why regex is not a good tool for range validation.

