Below are some example test inputs.
Test inputs are ASCII-encoded strings.
TEST CASE INPUTS
arrhar = Array(100)
arrhar[1] = "Low Carb Orzo Low Carb Rice, High Protein, Great Low Carb Bread Company, Low Carb Pasta Rice, 7 g per pack"
arrhar[2] = "Helios Certified Organic Greek Orzo Pasta, 500gr"
arrhar[3] = "Barilla Orzo Pasta 15.73 oz."
arrhar[4] = "Pasta Granoro Il Primo Orzo 6 ounces per bag"
arrhar[5] = "Authentic Italian Orzo -- 6 OUNCE per bag"
arrhar[6] = "ORZO PASA 4 U! 1 BAGGY IZ 4.39-GRM"
arrhar.trim()
# `trim()` removes all elements of the array which have memory allocated, but no value assigned.
TEST CASE OUTPUTS
out[1] = "7 g"
out[2] = "500gr"
out[3] = "15.73 oz"
out[4] = "6 ounces"
out[5] = "6 OUNCE"
out[6] = "4.39-GRM"
English Description of Regular Expression
Suppose that we represent a string-matching pattern as a nested bulleted list.
The bullets are ordered left to right: the first top-level bullet describes the left-most part of the string, and the last top-level bullet describes the right-most part.
- Numeric Quantity
  - Zero or more digits (0-9)
  - Zero or one decimal point or comma
  - Zero or more digits (0-9)
- Optional Delimiter
  - Zero or more of any character except those in the classes [A-Z], [a-z], and \d
- Unit
  - Grams
    - Any case-insensitive subsequence of "GRAMS"
      a. "g"
      b. "GRMS"
      c. "gs"
      d. "Gms"
      e. et cetera...
  - Ounces
    - Z-ounces: any case-insensitive subsequence of "OUNCEZ"
    - S-ounces: any case-insensitive subsequence of "OUNCES"
Regex Pieces
Appropriate regular expressions for the left part (integer part) of a numeric quantity might be:
\d*
\d{0,}
[0-9]{0,}
[0123456789]*
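As a quick check that the four spellings above behave the same on ASCII input, here is a minimal Python sketch. The helper name `integer_prefix` and the sample strings are only illustrative. One caveat: in Python 3, `\d` on `str` patterns also matches non-ASCII digits unless `re.ASCII` is set, which does not matter for ASCII-only test inputs.

```python
import re

# The four candidate patterns for the integer part, as listed above.
INTEGER_PART_PATTERNS = [r"\d*", r"\d{0,}", r"[0-9]{0,}", r"[0123456789]*"]

def integer_prefix(pattern, text):
    """Return the leading run of digits matched by `pattern` (possibly empty)."""
    # Every pattern here can match the empty string, so re.match never returns None.
    return re.match(pattern, text).group(0)

samples = ["500gr", "15.73 oz.", "g per pack"]

# For each sample, collect the distinct results produced by the four patterns.
results = {s: {integer_prefix(p, s) for p in INTEGER_PART_PATTERNS} for s in samples}

for sample, distinct in results.items():
    print(repr(sample), "->", distinct)
```

Each sample yields a single distinct result, and the last sample shows that all four patterns also accept an empty integer part.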
A regex for zero or one decimal points is [.,]?
A regex for a decimal number is \d*[.,]\d*
There might, or might not, be a delimiter between the number and the unit specification:
56.1gr
56.1 gr
56.1-grams
A suitable regexp for the delimiter might be [^a-zA-Z0-9]*
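To see the delimiter pattern in isolation, the sketch below splits the three example strings into number, delimiter, and unit. The number and unit sub-patterns in `SPLITTER` are simplified placeholders chosen for this illustration, and `split_quantity` is a hypothetical helper name.

```python
import re

# Delimiter pattern from the text: any run of non-alphanumeric characters.
DELIMITER = r"[^a-zA-Z0-9]*"

# Placeholder number and unit patterns (assumptions for this sketch only).
SPLITTER = re.compile(r"([0-9]+[.,]?[0-9]*)(" + DELIMITER + r")([a-zA-Z]+)")

def split_quantity(text):
    """Split text into (number, delimiter, unit), or return None if it does not fit."""
    m = SPLITTER.fullmatch(text)
    return m.groups() if m else None

for example in ["56.1gr", "56.1 gr", "56.1-grams"]:
    print(split_quantity(example))
# ('56.1', '', 'gr')
# ('56.1', ' ', 'gr')
# ('56.1', '-', 'grams')
```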
Suppose that we write a regex for the number and delimiter, but not the units (e.g. "ounces"). We might have:
\d*[.,]?\d*[^a-zA-Z0-9]*?
I hope that the above would match "4.91...." or "4.91 ".
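Whether the trailing delimiter is actually consumed depends on how the pattern is applied. The Python sketch below assumes the pattern is written as `\d*[.,]?\d*[^a-zA-Z0-9]*?`, with `\d*` on both sides of the separator. Because the final `*?` is lazy and nothing follows it, `re.match` stops right after the number, while `re.fullmatch` forces the delimiter to absorb the rest.

```python
import re

# Number followed by an optional, lazily matched delimiter (no unit yet).
NUMBER_AND_DELIMITER = r"\d*[.,]?\d*[^a-zA-Z0-9]*?"

# Anchored only at the start: the lazy delimiter matches zero characters.
prefix_dots = re.match(NUMBER_AND_DELIMITER, "4.91....").group(0)

# Anchored at both ends: the delimiter has to consume the trailing characters.
full_dots = re.fullmatch(NUMBER_AND_DELIMITER, "4.91....")
full_space = re.fullmatch(NUMBER_AND_DELIMITER, "4.91 ")

# Every piece is optional, so the pattern also matches the empty string.
empty = re.fullmatch(NUMBER_AND_DELIMITER, "")

print(repr(prefix_dots))                           # '4.91'
print(full_dots is not None, full_space is not None)  # True True
print(empty is not None)                           # True
```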
A regex for subsequences of "GRAMS" might be: [Gg]?[Rr]?[Aa]?[Mm]?[Ss]?
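A short Python check of this pattern against the example spellings from the description. Two properties are worth noticing: the letters must appear in the same order as in "GRAMS", and since every letter is optional, the empty string also matches.

```python
import re

# Each letter of "GRAMS" is optional, but the order is fixed.
GRAMS = re.compile(r"[Gg]?[Rr]?[Aa]?[Mm]?[Ss]?")

examples = ["g", "GRMS", "gs", "Gms"]
matches = {text: GRAMS.fullmatch(text) is not None for text in examples}

# Out-of-order letters are rejected ("m" before "g").
out_of_order = GRAMS.fullmatch("mg") is not None

# The empty string is accepted, because nothing in the pattern is mandatory.
accepts_empty = GRAMS.fullmatch("") is not None

print(matches)        # all True
print(out_of_order)   # False
print(accepts_empty)  # True
```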
A regex which captures something like "4.1-grm" is shown below:
\d*[.,]?\d*[^a-zA-Z0-9]*?[Gg]?[Rr]?[Aa]?[Mm]?[Ss]?
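The sketch below tries the grams pattern in Python, assuming it is written with `\d*` on both sides of the separator. It accepts "4.1-grm" as a whole string, but because every piece is optional it is fragile when used to search inside a longer product title: a search can succeed with an empty match, and a prefix match can stop before the unit.

```python
import re

GRAMS_QUANTITY = r"\d*[.,]?\d*[^a-zA-Z0-9]*?[Gg]?[Rr]?[Aa]?[Mm]?[Ss]?"

# Whole-string match: the delimiter and unit are forced to consume "-grm".
whole = re.fullmatch(GRAMS_QUANTITY, "4.1-grm")

# Prefix match: the lazy delimiter and optional unit letters all match nothing,
# so only the number is returned.
prefix = re.match(GRAMS_QUANTITY, "4.1-grm").group(0)

# Search in a longer string: an empty match at position 0 already satisfies the pattern.
searched = re.search(GRAMS_QUANTITY, "IZ 4.39-GRM").group(0)

print(whole is not None)  # True
print(repr(prefix))       # '4.1'
print(repr(searched))     # ''
```

One way to avoid these empty and partial matches is to make at least one digit and at least one unit letter mandatory.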
How can we get both grams and ounces?
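One possible approach, sketched in Python, is to put the two unit patterns in an alternation group. This is only a sketch under a few assumptions that go beyond the description above: at least one digit is required, the first letter of the unit ("g" or "o") is required so the unit cannot be empty, and a negative lookahead stops the unit from matching the start of a longer word. The names `QUANTITY`, `find_quantity`, and `INPUTS` are illustrative.

```python
import re

QUANTITY = re.compile(
    r"""
    \d+(?:[.,]\d+)?      # number: digits, optionally a separator and more digits
    [^a-zA-Z0-9]*?       # optional delimiter, as few characters as possible
    (?:
        gr?a?m?s?        # grams: "g" followed by a subsequence of "RAMS"
      |
        ou?n?c?e?[sz]?   # ounces: "o" followed by a subsequence of "UNCES" or "UNCEZ"
    )
    (?![a-zA-Z])         # the unit must not run on into a longer word
    """,
    re.IGNORECASE | re.VERBOSE,
)

def find_quantity(text):
    """Return the first quantity-with-unit found in text, or None."""
    m = QUANTITY.search(text)
    return m.group(0) if m else None

# The six test inputs from the top of the question.
INPUTS = [
    "Low Carb Orzo Low Carb Rice, High Protein, Great Low Carb Bread Company, "
    "Low Carb Pasta Rice, 7 g per pack",
    "Helios Certified Organic Greek Orzo Pasta, 500gr",
    "Barilla Orzo Pasta 15.73 oz.",
    "Pasta Granoro Il Primo Orzo 6 ounces per bag",
    "Authentic Italian Orzo -- 6 OUNCE per bag",
    "ORZO PASA 4 U! 1 BAGGY IZ 4.39-GRM",
]

for title in INPUTS:
    print(find_quantity(title))
```

With this pattern, "4 U!" and "1 BAGGY" in the last input are skipped because neither number is followed by a unit.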
question from:
https://stackoverflow.com/questions/65647077/how-can-we-write-a-regular-expression-regex-to-identify-quantities-with-units