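The format values entered in the format overlay depend on the attribute types defined in the Capture Base. In particular, a format value can be a regular expression (for text and numeric data), a MIME type (for files), or an ISO 8601 date and time format.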
A regular expression, or “regex” for short, is a pattern for searching and filtering strings. You can build a search pattern from character literals, operators, and constructs that match specific types of characters in a string. OCA uses the Rust regex flavour; its full documentation can be found here.
Character classes and ranges appear inside square brackets ([...]), and a character class matches a single character. The basic patterns are:
- `[abc]` matches the character a, b, or c.
- `[^abc]` matches any character that is not a, b, or c.
- `[a-c]` matches any character in the range a-c, that is, a, b, or c.
- `[a-zA-Z]` matches any character in the range a-z or A-Z, that is, any uppercase or lowercase letter.
- `[0-9]` matches any single digit.
- `abc|xyz` matches abc or xyz.
- `a*` matches a repeated zero or more times.
- `a+` matches a repeated one or more times.
- `a?` matches a zero times or once.
- `a{n,m}` matches a repeated at least n and at most m times.

Note that regular expressions match partially by default: `abc` matches any string that contains abc, such as `abc`, `abcde`, or `&0xyzabcdef-()`.
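A minimal sketch of the patterns above. It uses Python's `re` module as a stand-in for the Rust regex crate that OCA specifies; for these simple constructs the syntax is the same in both. `contains` and `whole` are hypothetical helper names: the first reports a match anywhere in the text (the default behaviour described above), and the second requires the whole string to match, for comparison only.

```python
import re

# Python's re module stands in for the Rust regex crate here; the simple
# constructs below are written the same way in both.

def contains(pattern: str, text: str) -> bool:
    """Partial match: True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

def whole(pattern: str, text: str) -> bool:
    """Whole-string match, used only for comparison with the default."""
    return re.fullmatch(pattern, text) is not None

# Character classes and alternation
print(contains(r"[abc]", "xbz"))      # True: "b" is in the class
print(contains(r"[^abc]", "cab"))     # False: every character is a, b, or c
print(contains(r"abc|xyz", "--xyz"))  # True: the second alternative occurs

# Quantifiers, checked against the whole string
print(whole(r"a*", ""))               # True: zero repetitions are allowed
print(whole(r"a+", ""))               # False: at least one "a" is required
print(whole(r"a{2,3}", "aaaa"))       # False: more than three repetitions

# Partial matching: contains() is True for all three strings,
# while whole() is True only for "abc".
for s in ["abc", "abcde", "&0xyzabcdef-()"]:
    print(s, contains(r"abc", s), whole(r"abc", s))
```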
The following anchors and special characters give more control over what a pattern matches:
- `^` matches the beginning of the string. `^abc` only matches strings that begin with abc.
- `$` matches the end of the string. `abc$` only matches strings that end with abc.
- `^abc$` only matches the exact string abc.
- `.` matches any single character except a newline.
- `\` matches the following special character literally. `\*` matches a literal `*`.
- `\d` matches any single digit. In the Rust flavour this includes non-ASCII Unicode digits; for ASCII text it is equivalent to `[0-9]`.
- `\w` matches any letter, digit, or underscore character.
- `\s` matches any whitespace character.

The following table gives some example patterns.

| Target Strings | Regex Pattern |
|---|---|
| codes containing any capital letter | [A-Z] |
| codes with only capital letters | ^[A-Z]*$ |
| codes with only capital letters, or only with lowercase letters | ^([A-Z]*\|[a-z]*)$ |
| 10-character codes, with capital and lowercase letters only | ^[A-Za-z]{10}$ |
| 5- to 10-character codes, with capital and lowercase letters only | ^[A-Za-z]{5,10}$ |
| messages, 250 characters max | ^.{0,250}$ |
| Canadian postal codes (A1A 1A1) | ^[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]$ |
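The sketch below checks a few sample values against some of the text patterns from the table. It uses Python's `re.search` as a stand-in for the Rust crate's `is_match`, which also reports a match found anywhere in the text, so the `^` and `$` anchors written in each pattern are what force a whole-string match. `TEXT_PATTERNS` and `is_valid` are hypothetical names for illustration.

```python
import re

# Some anchored patterns from the table; Python's re stands in for the
# Rust regex crate used by OCA.
TEXT_PATTERNS = {
    "only capital letters": r"^[A-Z]*$",
    "10 letters": r"^[A-Za-z]{10}$",
    "5-10 letters": r"^[A-Za-z]{5,10}$",
    "Canadian postal code": r"^[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]$",
}

def is_valid(kind: str, value: str) -> bool:
    # re.search finds a match anywhere; the anchors in the pattern itself
    # restrict it to the whole value.
    return re.search(TEXT_PATTERNS[kind], value) is not None

print(is_valid("only capital letters", "ABC"))      # True
print(is_valid("only capital letters", "ABc"))      # False
print(is_valid("5-10 letters", "Hello"))            # True: 5 letters
print(is_valid("10 letters", "Hello"))              # False: too short
print(is_valid("Canadian postal code", "K1A 0B1"))  # True
print(is_valid("Canadian postal code", "K1A0B1"))   # False: \s needs a space
```

One difference between the two flavours is worth knowing when testing in Python: Python's `$` also matches just before a single trailing newline, so `"ABC\n"` passes the first pattern here, whereas in the Rust crate `$` (without the `m` flag) only matches at the very end of the text.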
In OCA, regex is also used for numeric attributes. However, a regex does not understand numbers; it only matches them as characters. For example, to accept an integer between 10 and 30 (inclusive), we use the pattern `^([1-2][0-9]|30)$`, which matches “the character 1 or 2 followed by any digit, or the characters 30”. Complicated numeric conditions can therefore be tricky to express.
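The sketch below contrasts the regex for integers between 10 and 30 with an ordinary numeric comparison. It uses Python's `re.search`, which, like the Rust crate's `is_match`, reports a match found anywhere in the text; `regex_check` and `numeric_check` are hypothetical helper names.

```python
import re

# Integers in [10, 30], written as a character pattern.
RANGE_10_30 = r"^([1-2][0-9]|30)$"

def regex_check(value: str, pattern: str = RANGE_10_30) -> bool:
    return re.search(pattern, value) is not None

def numeric_check(value: str) -> bool:
    """Reference check that parses the text as an integer."""
    try:
        return 10 <= int(value) <= 30
    except ValueError:
        return False

for v in ["9", "10", "25", "30", "31", "123"]:
    print(v, regex_check(v), numeric_check(v))  # both checks agree here

# The regex only sees characters, so formatting variants disagree:
print(regex_check("015"), numeric_check("015"))  # False True
print(regex_check("+15"), numeric_check("+15"))  # False True

# Without the enclosing group, ^ and $ each bind to one branch only,
# so "123" is accepted through the unanchored end of "^[1-2][0-9]".
print(regex_check("123", r"^[1-2][0-9]|30$"))    # True
```

The last line shows why the whole alternation sits inside one group: `|` has the lowest precedence, so the anchors must be outside the group to apply to every branch.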
| Target Strings | Regex Pattern |
|---|---|
| any string starting with a digit 0-5 | ^[0-5] |
| integer numbers between 1 and 50 | ^([1-9]\|[1-4][0-9]\|50)$ |
| integer numbers between -50 and 50 | ^-?([0-9]\|[1-4][0-9]\|50)$ |
| any integer or decimal number, may begin with + or - | ^[-+]?\d*\.?\d+$ |
| decimal numbers between 0 and 1, inclusive | ^\+?(0\|(0?\.\d+)\|(1(\.0+)?))$ |
| decimal numbers between -90 and 90, inclusive | ^[-+]?(90(\.0+)?\|[1-8]?\d(\.\d+)?)$ |
| decimal numbers between -180 and 180, inclusive | ^[-+]?(180(\.0+)?\|((1[0-7]\d)\|([1-9]?\d))(\.\d+)?)$ |
| latitude and longitude (the two patterns above combined, separated by a comma and optional whitespace), see visualization below | ^[-+]?(90(\.0+)?\|[1-8]?\d(\.\d+)?),\s*[-+]?(180(\.0+)?\|((1[0-7]\d)\|([1-9]?\d))(\.\d+)?)$ |
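A minimal sketch of a latitude/longitude check, assuming Python's `re` as a stand-in for the Rust regex crate. The patterns require at least one digit before any decimal point, so an empty value or a lone sign does not match. As a cross-check, the hypothetical `numeric_check` parses both numbers and compares them with the ranges directly; on the sample values both approaches agree.

```python
import re

# Latitude in [-90, 90] and longitude in [-180, 180], as character patterns.
LAT = r"[-+]?(90(\.0+)?|[1-8]?\d(\.\d+)?)"
LON = r"[-+]?(180(\.0+)?|(1[0-7]\d|[1-9]?\d)(\.\d+)?)"
# Comma, then optional whitespace, between the two numbers.
LAT_LON = re.compile(rf"^{LAT},\s*{LON}$")

def regex_check(value: str) -> bool:
    return LAT_LON.search(value) is not None

def numeric_check(value: str) -> bool:
    """Reference check: parse both numbers and compare their ranges."""
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180

samples = ["45.5, -73.6", "90, 180", "0,0", "90.5, 0", "-91, 0", "0, 181", ", 0"]
for s in samples:
    print(repr(s), regex_check(s), numeric_check(s))
```

The numeric check is easier to read, but it accepts inputs the regex rejects, such as `1e1, 0`, which is why a format overlay that must reject them relies on a character pattern.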
Regular Expressions 101 is a useful website for testing regular expressions: enter a regex and a series of test strings, and any matches found are highlighted.
If they are not enabled by default, turn on the g (global) and m (multi-line) flags, so that each test string on its own line is checked separately.
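The effect of the two flags can be reproduced locally. In this sketch, Python's `re.MULTILINE` plays the role of the m flag (letting `^` and `$` match at every line boundary) and `re.findall`, which returns every match, plays the role of the g flag. The sample strings are made up for illustration.

```python
import re

# Several test strings, one per line, as typed into a regex tester.
test_strings = "K1A 0B1\nk1a 0b1\nH0H 0H0\n12345"
pattern = r"^[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]$"

with_m = re.findall(pattern, test_strings, flags=re.MULTILINE)
without_m = re.findall(pattern, test_strings)

print(with_m)     # ['K1A 0B1', 'H0H 0H0']
print(without_m)  # []: ^ and $ only match at the ends of the whole text

# Caveat: \s also matches "\n", so a match can span two lines.
print(re.findall(pattern, "A1B\n2C3", flags=re.MULTILINE))  # ['A1B\n2C3']
```

The last line is a reminder that testing one string per line is a convenience: a pattern containing `\s` can match across two lines, which would not happen when a single value is validated on its own.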
A MIME type (also called a media type), defined by IETF RFC 6838, indicates the format of a piece of data, most commonly a file.
All MIME types follow a basic template of two parts, separated by a single slash (/):
type/subtype
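A minimal syntax check for the `type/subtype` template, sketched in Python. It assumes the naming rule of RFC 6838 (section 4.2): each name starts with a letter or digit and has at most 127 characters drawn from letters, digits, and `! # $ & - ^ _ . +`. It does not check that a type is actually registered, and whether OCA tooling validates MIME types this way is an assumption; `split_mime_type` is a hypothetical helper.

```python
import re

# One type or subtype name, following the RFC 6838 restricted-name rule.
NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
MIME_TYPE = re.compile(rf"^({NAME})/({NAME})$")

def split_mime_type(value: str):
    """Return (type, subtype) in lowercase, or None if the syntax is invalid."""
    m = MIME_TYPE.search(value)
    if m is None:
        return None
    # Type and subtype names are case-insensitive, so normalise them.
    return m.group(1).lower(), m.group(2).lower()

print(split_mime_type("image/png"))                 # ('image', 'png')
print(split_mime_type("application/vnd.ms-excel"))  # ('application', 'vnd.ms-excel')
print(split_mime_type("image/svg+xml"))             # ('image', 'svg+xml')
print(split_mime_type("Text/CSV"))                  # ('text', 'csv')
print(split_mime_type("image"))                     # None: no slash
print(split_mime_type("image/png/extra"))           # None: two slashes
```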
A complete list of MIME types can be found here. The following are some frequently used MIME types.
| Image | Video | Audio | Application | Text |
|---|---|---|---|---|
| image/png | video/mp4 | audio/mpeg | application/pdf | text/csv |
| image/jpeg | video/raw | audio/ogg | | text/xml |
| image/tiff | | | | text/markdown |
ISO 8601 is an international standard for representing date and time data. A summary of the standard by Markus Kuhn can be found here.
ISO 8601 allows the following representations:
- `YYYY` for years, `MM` for months (two digits, 01 through 12), and `DD` for days of the month (two digits). The parts are separated by a single hyphen (-), or written without separators in the basic format.
- `Www`, a literal W followed by the two-digit week number `ww`, can be used after the year instead. An optional following `D` gives the weekday number, from 1 through 7, beginning with Monday. The parts are separated by a single hyphen (-) or nothing.
- `DDD`, the ordinal date, can be used after the year instead. It is the three-digit day of the year, from 001 through 365 (or 366 in leap years).
- `hh` for hours, `mm` for minutes, and `ss` for seconds (or `ss.sss`, with a decimal fraction of a second). The time is preceded by a literal T, and its parts are separated by a single colon (:) or nothing.
- `Z` after the time indicates UTC; `±hh:mm`, `±hhmm`, or `±hh` after the time indicates the offset for other time zones.
- `PnYnMnDTnHnMnS`, `PnW`, or `P<date>T<time>`, where the capital letters are literals and each n is a number, represent durations.
- `<start>/<end>`, `<start>/<duration>`, `<duration>/<end>`, or `<duration>` represent time intervals.
- `Rn/<interval>` or `R/<interval>`, where n is the number of repetitions, represent repeated intervals.

The following are some ISO 8601 DateTime examples.
| Type | ISO 8601 Format | Example of an Allowed DateTime |
|---|---|---|
| date (year, month, and day) | YYYY-MM-DD | 2001-02-03 |
| date (year and month) | YYYY-MM | 2001-02 |
| date (year, month, and day), basic format | YYYYMMDD | 20010203 |
| date (year, month, and day) and time | YYYY-MM-DDThh:mm:ss.sss | 2001-02-03T04:00:00 |
| date (year, month, and day) and time, in UTC | YYYY-MM-DDThh:mm:ss.sssZ | 2001-02-03T04:00:00Z |
| time, with time zone offset (in hours) | Thh:mm:ss.sss±hh | T04:00:00-05 |
| durations (in years, months, days, and hours) | PnYnMnDTnH | P1Y2M3DT4H |
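A sketch, in Python, of how values like those in the table could be checked. For date and time formats, a regex enforces the exact shape (fixed digit counts, literal T and Z) and `datetime.strptime` checks that the date and time actually exist; for durations, a regex splits `PnYnMnDTnHnMnS` or `PnW` into components. Fractional seconds, time zone offsets, and the decimal fraction ISO 8601 allows on the smallest duration component are left out, and the `FORMATS` mapping, `matches_format`, and `parse_duration` are hypothetical names, not part of OCA.

```python
import re
from datetime import datetime

# Hypothetical mapping: ISO 8601 format -> (shape regex, strptime directive).
FORMATS = {
    "YYYY-MM-DD": (r"^\d{4}-\d{2}-\d{2}$", "%Y-%m-%d"),
    "YYYY-MM": (r"^\d{4}-\d{2}$", "%Y-%m"),
    "YYYYMMDD": (r"^\d{8}$", "%Y%m%d"),
    "YYYY-MM-DDThh:mm:ss": (r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$", "%Y-%m-%dT%H:%M:%S"),
    "YYYY-MM-DDThh:mm:ssZ": (r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$", "%Y-%m-%dT%H:%M:%SZ"),
}

# Durations of the form PnYnMnDTnHnMnS (integers only) or PnW.
DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?)$"
)

def matches_format(fmt: str, value: str) -> bool:
    shape, directive = FORMATS[fmt]
    if re.search(shape, value) is None:
        return False  # wrong shape, e.g. missing zero padding
    try:
        datetime.strptime(value, directive)
    except ValueError:
        return False  # right shape, but not a real date or time
    return True

def parse_duration(value: str):
    """Return the duration components as a dict, or None if invalid."""
    m = DURATION.search(value)
    if m is None:
        return None
    parts = {k: int(v) for k, v in m.groupdict().items() if v is not None}
    # "P" needs at least one component, and "T" at least one time component.
    if not parts or value.endswith("T"):
        return None
    return parts

print(matches_format("YYYY-MM-DD", "2001-02-03"))  # True
print(matches_format("YYYY-MM-DD", "2001-2-3"))    # False: not zero-padded
print(matches_format("YYYY-MM-DD", "2001-02-30"))  # False: no 30 February
print(matches_format("YYYYMMDD", "20010203"))      # True
print(parse_duration("P1Y2M3DT4H"))  # {'years': 1, 'months': 2, 'days': 3, 'hours': 4}
print(parse_duration("PT30M"))       # {'minutes': 30}
print(parse_duration("P1DT"))        # None
```

The regex alone cannot tell that 2001-02-30 does not exist, and `strptime` alone would accept the unpadded `2001-2-3`; combining the two covers both gaps. In the duration pattern, the position of `M` relative to `T` decides whether it means months or minutes.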