I run regex searches on web pages to locate DOIs (digital object identifiers). This normally works well, but it fails on pages from the journal The Lancet. I noticed that the HTML of one such page was 875777 bytes long. When I truncated it to the leftmost 700000 bytes, the search worked (and was very fast).
Is there a known limit on the length of the string a regex search can handle? If so, is it adjustable?
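To make the setup concrete, here is a minimal sketch of the kind of search described, written in Python purely for illustration (the question does not say which language or regex engine is in use). The DOI pattern, the function name `find_first_doi`, and the `overlap` parameter are all assumptions, not the actual code. It scans the page in overlapping chunks of 700000 characters, which generalizes the truncation experiment so that a DOI located past that point can still be found. Note that it counts characters rather than bytes.

```python
import re

# Hypothetical DOI pattern (an assumption, not the expression from the question).
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_first_doi(html, chunk_size=700_000, overlap=256):
    """Return the first DOI-like match in html, or None.

    The text is scanned in chunks of chunk_size characters. Consecutive
    chunks share `overlap` characters so that a DOI straddling a chunk
    boundary is seen whole in the following chunk (this assumes a DOI is
    shorter than `overlap` characters).
    """
    step = chunk_size - overlap
    for start in range(0, max(len(html), 1), step):
        chunk = html[start:start + chunk_size]
        match = DOI_PATTERN.search(chunk)
        if match is None:
            continue
        # A match that runs to the very end of a non-final chunk may have
        # been cut off; the next chunk will contain it in full.
        cut_off = match.end() == len(chunk) and start + chunk_size < len(html)
        if cut_off:
            continue
        return match.group(0)
    return None
```

If the engine in use really does have a length limit, chunking like this would sidestep it without discarding the tail of the page. If the failure has another cause (for example, a pattern that backtracks heavily on certain input), the chunk size would only be masking the problem.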