I would like to extract certain lines from a huge list that looks mostly like this:
35/00 DZTPROMAEBKVF ENCLOSURE SERVICES FAILURE
35/01 DZTPROMAEBKVF UNSUPPORTED ENCLOSURE FUNCTION
35/02 DZTPROMAEBKVF ENCLOSURE SERVICES UNAVAILABLE
35/03 DZTPROMAEBKVF ENCLOSURE SERVICES TRANSFER FAILURE
35/04 DZTPROMAEBKVF ENCLOSURE SERVICES TRANSFER REFUSED
35/05 DZT ROMAEBKVF ENCLOSURE SERVICES CHECKSUM ERROR
ASC/ . . . . .
ASCQ DZTPROMAEBKVF Description
3B/00 T SEQUENTIAL POSITIONING ERROR
3B/01 T TAPE POSITION ERROR AT BEGINNING-OF-MEDIUM
3B/02 T TAPE POSITION ERROR AT END-OF-MEDIUM
3B/03 TAPE OR ELECTRONIC VERTICAL FORMS UNIT NOT READY
3B/04 SLEW FAILURE
3B/05 PAPER JAM
3B/06 FAILED TO SENSE TOP-OF-FORM
3B/07 FAILED TO SENSE BOTTOM-OF-FORM
3B/08 T REPOSITION ERROR
3B/09 READ PAST END OF MEDIUM
3B/0A READ PAST BEGINNING OF MEDIUM
3B/0B POSITION PAST END OF MEDIUM
3B/0C T POSITION PAST BEGINNING OF MEDIUM
3B/0D DZT ROM BK MEDIUM DESTINATION ELEMENT FULL
3B/0E DZT ROM BK MEDIUM SOURCE ELEMENT EMPTY
I want to pull out the lines where the T and M fields are set, so that the above would parse down to:
35/01 DZTPROMAEBKVF UNSUPPORTED ENCLOSURE FUNCTION
35/02 DZTPROMAEBKVF ENCLOSURE SERVICES UNAVAILABLE
35/03 DZTPROMAEBKVF ENCLOSURE SERVICES TRANSFER FAILURE
35/04 DZTPROMAEBKVF ENCLOSURE SERVICES TRANSFER REFUSED
35/05 DZT ROMAEBKVF ENCLOSURE SERVICES CHECKSUM ERROR
3B/00 T SEQUENTIAL POSITIONING ERROR
3B/01 T TAPE POSITION ERROR AT BEGINNING-OF-MEDIUM
3B/02 T TAPE POSITION ERROR AT END-OF-MEDIUM
3B/08 T REPOSITION ERROR
3B/0C T POSITION PAST BEGINNING OF MEDIUM
3B/0D DZT ROM BK MEDIUM DESTINATION ELEMENT FULL
3B/0E DZT ROM BK MEDIUM SOURCE ELEMENT EMPTY
I’ve tried the obvious pattern, “^\d\d/\d\d\s+…T…M…\s+.*”, but it seems to stop parsing at the odd lines (lines 7 and 8 above), and there are lots more odd lines. Also, I’d like to get lines where either the T or the M is set instead of both.
[quote=374389:@Kem Tekinay]There is a disconnect here. Your pattern looks for lines that start with two digits, but some of these lines start with digit-letter.
Edit: I’m assuming that the lines can start with hex digits/hex digits, so I’ll craft the pattern that way.[/quote]
D’oh! I copied and pasted from a different section of the document than where I was looking. Glad I did, otherwise the HEX stuff would have been a mystery!
Yes - the format is always like that. ASC/ASCQ in hex, 2 spaces, list of 13 flags (and I only want the T and M entries), 2 spaces, and ending with the plain text.
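For anyone following along outside Xojo, here is a rough Python sketch of that layout. It assumes the separators are exactly two spaces and that each of the 13 flag columns holds either its own letter or a blank. The names `COLUMNS`, `Entry`, and `parse_line` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Flag column order, copied from the listing's header row.
COLUMNS = "DZTPROMAEBKVF"

# Assumption: each flag column holds either its own letter or a blank.
FLAGS = "".join("[" + c + " ]" for c in COLUMNS)
HEX_PAIR = "[0-9A-Fa-f]{2}"
LAYOUT = re.compile(
    "^(" + HEX_PAIR + "/" + HEX_PAIR + ")"  # ASC/ASCQ in hex
    + "  (" + FLAGS + ")"                   # two spaces, 13 flag columns
    + "  (.*)$"                             # two spaces, plain-text description
)


class Entry(NamedTuple):
    code: str
    flags: str
    description: str


def parse_line(line: str) -> Optional[Entry]:
    """Return the three fields, or None for headers and other odd lines."""
    m = LAYOUT.match(line)
    return Entry(*m.groups()) if m else None
```

With the fields split this way, `entry.flags[COLUMNS.index("T")] == "T"` checks the tape column, and the header rows simply come back as `None`.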
This will identify lines that have either M or T, not both:
(?x)
# starts with hex*2/hex*2
^[[:xdigit:]]{2} / [[:xdigit:]]{2}
\x20\x20
# next thirteen can contain T or M, not both
(?=.{0,12}[MT]) # lookahead to make sure M or T is in the next 13
(?|
([DZTPROAEBKVF\x20]{13}) # no M
| # or
([DZPROMAEBKVF\x20]{13}) # no T
)
# rest of the line
\x20\x20
.*
Note: This assumes the codes given are the only ones allowed.
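If it helps to test the idea outside Xojo, the same "M or T, not both" logic can be sketched in Python. This is only an approximation of the pattern above: Python's `re` module has no POSIX classes or branch-reset groups, so those are swapped for `[0-9A-Fa-f]` and a plain group, and the flag field is taken to be 13 columns wide, as described earlier in the thread. The names `EXACTLY_ONE` and `exactly_one_of_t_or_m` are hypothetical.

```python
import re

# Assumption: 13 flag columns with two-space separators on either side.
EXACTLY_ONE = re.compile(
    r"""
    ^[0-9A-Fa-f]{2} / [0-9A-Fa-f]{2}   # ASC/ASCQ in hex
    \x20\x20                            # two-space separator
    (?=.{0,12}[MT])                     # M or T somewhere in the 13 columns
    (
        [DZTPROAEBKVF\x20]{13}          # no M
      |
        [DZPROMAEBKVF\x20]{13}          # no T
    )
    \x20\x20                            # two-space separator
    .*                                  # rest of the line
    """,
    re.VERBOSE,
)


def exactly_one_of_t_or_m(lines):
    """Keep lines whose flag field has T or M set, but not both."""
    return [line for line in lines if EXACTLY_ONE.match(line)]
```

Lines with both flags set fail both alternatives, and lines with neither fail the lookahead, so only the one-flag lines survive.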
Basically, the SCSI ASC/ASCQ codes are standardized, but the T10 committee’s list covers all 13 device categories in one huge listing. I just need the tape (T) and medium changer (M) entries out of the thousands of lines.
I guess it’s all a matter of perspective. I believe your solution to be more complicated, but I would modify it to use Mid to grab the codes instead of splitting on double-spaces.
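For comparison, the position-based idea can be sketched without any regex. This is in Python rather than Xojo, with string indexing standing in for Mid (note that Mid counts from 1 and Python from 0). It assumes the fixed layout described earlier: five characters of code, two spaces, then the 13 flag columns in the header's order. `has_flags` and its `mode` argument are made-up names.

```python
import string

COLUMNS = "DZTPROMAEBKVF"   # flag column order from the header row
FLAGS_START = 7             # 5 chars of "XX/YY" plus the two-space separator
T_POS = FLAGS_START + COLUMNS.index("T")
M_POS = FLAGS_START + COLUMNS.index("M")


def has_flags(line, mode="any"):
    """Check the T and M columns by position.

    mode is "both" (T and M), "one" (exactly one of them),
    or "any" (at least one of them).
    """
    # Skip headers and other odd lines: too short, or no hex/hex code up front.
    if len(line) <= M_POS or line[2] != "/":
        return False
    if not all(c in string.hexdigits for c in line[0:2] + line[3:5]):
        return False

    t = line[T_POS] == "T"
    m = line[M_POS] == "M"
    if mode == "both":
        return t and m
    if mode == "one":
        return t != m
    return t or m
```

Something like `[l for l in lines if has_flags(l, "any")]` would then keep every tape or medium changer entry, and switching the mode covers the "both" and "not both" variants.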