Quantifiers
- (.) - matches any single character except newline.
- (*) - between zero and unlimited times, as many times as possible, giving back as needed [greedy]
- (.*) - matches zero or more characters (except newline), as many as possible
- (.*?) - matches zero or more characters (except newline), as few as possible
- (*?) - between zero and unlimited times, as few times as possible, expanding as needed [lazy]
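The definitions above can be checked with a few lines of Python. This is only a minimal sketch using the standard re module; the sample strings "a", "\n" and "aaa" are made up for illustration and are not part of the original example.

```python
import re

# '.' matches any single character except a newline (re.DOTALL is not set).
dot_matches_letter = re.fullmatch(r".", "a") is not None
dot_matches_newline = re.fullmatch(r".", "\n") is not None

# Greedy '*' takes as many characters as possible.
greedy = re.match(r"a*", "aaa").group(0)

# Lazy '*?' takes as few characters as possible, which here is none at all.
lazy = re.match(r"a*?", "aaa").group(0)

print(dot_matches_letter, dot_matches_newline, repr(greedy), repr(lazy))
```

The lazy pattern returns an empty match because nothing after it forces it to expand.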
Example:
Test String: <xyz>vikram</xyz><abc>ffasdfsaf</abc><xyz>awdfafsd</xyz>safasf
1. Regular Expression: <xyz>(.*)<\/xyz>
Output: "vikram</xyz><abc>ffasdfsaf</abc><xyz>awdfafsd"
2. Regular Expression: <xyz>(.*?)<\/xyz>
Output: "vikram"
In the first regular expression, (.*) first matches up to the end of the test string (i.e. (.*) is greedy); the regex engine then backtracks from the end until the right boundary matches, so the capture stops at the last occurrence of </xyz>.
In the second regular expression, (.*?) starts by matching as few characters as possible (i.e. (.*?) is lazy) and checks for the right boundary; if the boundary is not there, the match expands one character at a time until the right boundary matches, so the capture stops at the first occurrence of </xyz>.
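Use (.*?) instead of (.*) to extract only the text up to the nearest right boundary; when that boundary is close, the lazy form also avoids scanning to the end of the string and backtracking.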
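The two expressions from the example can be tried in Python as a rough sketch. The variable names (text, greedy, lazy, all_lazy) are my own, and the "/" is written without a backslash because Python's re module does not require it to be escaped.

```python
import re

# Test string from the example above.
text = "<xyz>vikram</xyz><abc>ffasdfsaf</abc><xyz>awdfafsd</xyz>safasf"

# Greedy: the capture runs to the LAST </xyz> in the string.
greedy = re.search(r"<xyz>(.*)</xyz>", text).group(1)

# Lazy: the capture stops at the FIRST </xyz> after the opening tag.
lazy = re.search(r"<xyz>(.*?)</xyz>", text).group(1)

# With the lazy form, findall returns each <xyz>...</xyz> body separately.
all_lazy = re.findall(r"<xyz>(.*?)</xyz>", text)

print(greedy)    # vikram</xyz><abc>ffasdfsaf</abc><xyz>awdfafsd
print(lazy)      # vikram
print(all_lazy)  # ['vikram', 'awdfafsd']
```

Note that re.search only reports the first match, which is why the lazy output in the example is just "vikram"; re.findall is one way to collect every occurrence.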