When creating a password, people will often substitute digits or symbols for letters so that it still reads as a word. For example, they might use “pa55word” because the “5” looks like “s”. Other examples: “p4ssword”, “p4ssw0rd” (that’s a zero), etc.
I came up with a list of the potential substitutions that might be used:
So “a”, “h”, “4”, and “@” are interchangeable; “b”, “e”, “3”, and “8” are interchangeable; and so on. (Ignore the slashes; they are there because these will be turned into regular expressions.)
Have I missed any?
(I’ve created a method that will check a given password against the list of the 10k most common passwords, but want to include variations based on these substitutions.)
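The groups can be read as a lookup from each character to the set it may stand in for. A minimal Python sketch, assuming the groups in the `substitutions` array of the code posted below (the original list is not reproduced here) and ignoring the multi-character slash letters; `interchangeable` and `variant_count` are hypothetical helpers.

```python
from math import prod

# Substitution groups, copied from the `substitutions` array in the Xojo code.
# Characters in the same string are treated as interchangeable.
GROUPS = ["ahm4@", "be38", "c({[", "d)]}", "g69", "il|1!/\\", "o0q", "s5$", "t+7", "vw"]

# Map every character to the group it belongs to.
GROUP_OF = {ch: group for group in GROUPS for ch in group}


def interchangeable(x: str, y: str) -> bool:
    """True if the two characters are equal (ignoring case) or share a substitution group."""
    x, y = x.lower(), y.lower()
    return x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))


def variant_count(word: str) -> int:
    """How many spellings of `word` can be made by swapping characters within their groups."""
    # A character outside every group has exactly one spelling: itself.
    return prod(len(GROUP_OF.get(ch, ch)) for ch in word.lower())
```

`variant_count("password")` works out to 1 × 5 × 3 × 3 × 2 × 3 × 1 × 4 = 1080, which suggests why turning the candidate password into a single pattern is more practical than expanding every entry of the 10k list into all of its variants.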
Seriously, these things have already been well mapped out by crackers. They have tools that take dictionary words and try any number of crazy substitutions. These days, the only truly “secure” password is a completely random and long one.
Eric: Hmm, I’d have to do those manually. I’ll think about whether that’s worth it, but I thank you.
Thom: Your advice is don’t… what? Don’t check against the list of 10k passwords, or don’t try to eliminate based on variations? Since I’m just creating a tool for programmers, I’m not sure what I’m not supposed to be doing. I also don’t see the downside of restricting variations if I were to put this into practice.
I added both “\” and “/” as aliases for “i” and “l”, “m” as an alias for “a” (to get those cases when someone did “AA” to mean “M”), and then added this code:
Here is the code so far. The basic idea is that it takes the characters of the given string and turns them into a regular expression that is run against the 10k most common passwords. If it finds a match, it returns the matching entry as the result. So “\/\/ord” would return “password”, for example (the slashes become “w”, and “word” appears inside “password”).
The name of the method will probably change.
```xojo
Protected Function VariationOn10K(pw As String) As String
  // Checks to see if a variation of the given password is on the 10,000 list.
  // Makes some common substitutions.
  
  static substitutions() as string = Array( _
  "ahm4@", "be38", "c({[", "d)]}", "g69", "il|1!/\", "o0q", "s5$", "t+7", "vw" _
  )
  // Every character that belongs to some group. The backslash is a real alias
  // for "i" and "l" here, so it must not be stripped out.
  static allSubstitutions as string = join( substitutions, "" )
  
  static subPatterns() as string
  if subPatterns.Ubound = -1 then
    // Build each group's character-class body once, hex-escaping anything
    // that isn't a plain letter or digit so it is safe inside [...]
    redim subPatterns( substitutions.Ubound )
    for subIndex as integer = 0 to substitutions.Ubound
      dim group as string = substitutions( subIndex )
      dim chars() as string = group.Split( "" )
      for charIndex as integer = 0 to chars.Ubound
        dim thisChar as string = chars( charIndex )
        if ( thisChar >= "0" and thisChar <= "9" ) or ( thisChar >= "a" and thisChar <= "z" ) then
          // Do nothing
        else
          chars( charIndex ) = "\x" + EncodeHex( thisChar )
        end if
      next charIndex
      subPatterns( subIndex ) = join( chars, "" )
    next subIndex
  end if
  
  // Massage the password
  pw = pw.ConvertEncoding( Encodings.UTF8 )
  pw = ReplaceLineEndings( pw, "" ) // Shouldn't be line endings anyway, but just in case
  
  // Squeeze runs of the same character down to one ("paassword" -> "pasword")
  static squeezerRX as RegEx
  if squeezerRX is nil then
    squeezerRX = new RegEx
    squeezerRX.Options.ReplaceAllMatches = true
    squeezerRX.SearchPattern = "(?mi-Us)(.)\g1+"
    squeezerRX.ReplacementPattern = "$1"
  end if
  pw = squeezerRX.Replace( pw )
  
  // Substitute letters made from slashes
  // (longer to shorter)
  pw = pw.ReplaceAll( "\/\/", "w" )
  pw = pw.ReplaceAll( "/\/\", "m" )
  pw = pw.ReplaceAll( "/-/", "h" )
  pw = pw.ReplaceAll( "/_/", "u" )
  pw = pw.ReplaceAll( "\_\", "u" )
  pw = pw.ReplaceAll( "\_/", "u" )
  pw = pw.ReplaceAll( "/_\", "u" )
  pw = pw.ReplaceAll( "\/", "v" )
  pw = pw.ReplaceAll( "_\", "j" )
  pw = pw.ReplaceAll( "_/", "j" )
  pw = pw.ReplaceAll( "/_", "l" )
  pw = pw.ReplaceAll( "\_", "l" )
  pw = pw.ReplaceAll( "/\", "a" )
  
  dim chars() as string = pw.Split( "" )
  dim rx as new RegEx
  rx.Options.ReplaceAllMatches = true
  
  // Turn the password into a pattern
  for charIndex as integer = 0 to chars.Ubound
    dim thisChar as string = chars( charIndex )
    if allSubstitutions.InStr( thisChar ) = 0 then
      // Won't be a substitution so replace it with its value
      thisChar = "\x{" + EncodeHex( thisChar ) + "}"
    else
      // Swap the character for its whole group as a character class
      for subIndex as integer = 0 to substitutions.Ubound
        dim thisSub as string = subPatterns( subIndex )
        rx.SearchPattern = "[" + thisSub + "]"
        rx.ReplacementPattern = "[" + thisSub.ReplaceAll( "\", "\\" ) + "]" // Backslashes must be doubled in a replacement pattern
        thisChar = rx.Replace( thisChar )
        if thisChar.Len <> 1 then exit // Replaced by a class, so stop looking
      next
    end if
    chars( charIndex ) = thisChar.DefineEncoding( Encodings.UTF8 )
  next
  
  // Remove dups from the list (e.g. "a4" produces the same class twice)
  for i as integer = chars.Ubound downto 1
    if chars( i ) = chars( i - 1 ) then
      chars.Remove i
    end if
  next
  
  dim pattern as string = "^.*" + join( chars, "+" ) + "+.*$" // Any of the characters may repeat
  
  // Now see if this pattern is within the 10K.
  dim r as string
  rx.SearchPattern = pattern
  try
    dim match as RegExMatch = rx.Search( kTenThousandString )
    if match <> nil then
      r = match.SubExpressionString( 0 )
    end if
  catch err As RegExSearchPatternException
    // Invalid pattern: treat it as "not found" and return an empty string
  end try
  
  return r
End Function
```
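The approach described in this post (normalize the password, turn it into a pattern of character classes, search the list) can be sketched in Python for readers who don't use Xojo. This is an illustrative translation under assumptions, not the author's method: it uses `re.escape` instead of hex escapes, searches a list of strings rather than one `kTenThousandString`, and the names `normalize`, `to_pattern`, and `variation_on_list` are hypothetical.

```python
import re
from typing import List, Optional

# Substitution groups, copied from the `substitutions` array in the Xojo code.
GROUPS = ["ahm4@", "be38", "c({[", "d)]}", "g69", "il|1!/\\", "o0q", "s5$", "t+7", "vw"]
GROUP_OF = {ch: group for group in GROUPS for ch in group}

# Letters drawn with slashes, longest first so "\/\/" becomes "w" before "\/" becomes "v".
SLASH_LETTERS = [
    ("\\/\\/", "w"), ("/\\/\\", "m"), ("/-/", "h"), ("/_/", "u"), ("\\_\\", "u"),
    ("\\_/", "u"), ("/_\\", "u"), ("\\/", "v"), ("_\\", "j"), ("_/", "j"),
    ("/_", "l"), ("\\_", "l"), ("/\\", "a"),
]


def normalize(pw: str) -> str:
    """Drop line endings, squeeze runs of one character, then turn slash-letters into letters."""
    pw = pw.replace("\r", "").replace("\n", "")
    pw = re.sub(r"(.)\1+", r"\1", pw, flags=re.IGNORECASE)
    for seq, letter in SLASH_LETTERS:
        pw = pw.replace(seq, letter)
    return pw


def to_pattern(pw: str) -> str:
    """Each char becomes its group's class (or itself); adjacent duplicates merge; each piece may repeat."""
    pieces: List[str] = []
    for ch in pw.lower():
        group = GROUP_OF.get(ch)
        piece = "[" + re.escape(group) + "]" if group else re.escape(ch)
        if not pieces or pieces[-1] != piece:
            pieces.append(piece)
    return "".join(piece + "+" for piece in pieces)


def variation_on_list(pw: str, common: List[str]) -> Optional[str]:
    """Return the first entry of `common` that contains a variation of `pw`, else None."""
    pw = normalize(pw)
    if not pw:
        return None  # An empty pattern would match every entry
    rx = re.compile(to_pattern(pw), re.IGNORECASE)
    for entry in common:
        if rx.search(entry):  # Substring search, like the ^.* ... .*$ pattern per line
            return entry
    return None
```

With a short stand-in list such as `["123456", "password", "qwerty"]`, both `variation_on_list("pa55word", common)` and `variation_on_list("\\/\\/ord", common)` should return `"password"`. Order matters in `SLASH_LETTERS`: running the `\/` → “v” rule first would break `\/\/` into “vv” before the “w” rule could see it.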
Are you trying to calculate the quality of the password? In that case, do you know about the entropy calculation?
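The usual “entropy calculation” for passwords is a naive estimate: length × log2 of the number of possible symbols. A short Python sketch, assuming the common split into lowercase (26), uppercase (26), digits (10), and other printable ASCII (33) pools; the function name is hypothetical.

```python
import math
import string


def pool_entropy_bits(pw: str) -> float:
    """Naive estimate: len(pw) * log2(pool), where pool sums the character classes present."""
    pool = 0
    if any(c in string.ascii_lowercase for c in pw):
        pool += 26
    if any(c in string.ascii_uppercase for c in pw):
        pool += 26
    if any(c in string.digits for c in pw):
        pool += 10
    if any(c not in string.ascii_letters + string.digits for c in pw):
        pool += 33  # Assumed: 32 ASCII punctuation characters plus space
    return len(pw) * math.log2(pool) if pool else 0.0
```

It also shows why such a score is a poor fit for the goal in this thread: “p4ssw0rd” scores higher than “password” (36 possible symbols per position instead of 26), yet both are exactly the kind of password the tool is meant to reject.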
Anyway, I usually use easy-to-type passwords, e.g. I prefer letters in a row or in some pattern based on the location of the keys. Think of “qwert”. While that’s probably neither in any dictionary nor matched by your code above, it’s still a very bad choice for a password, I’d think.
What I’m trying to say is: isn’t your work a bit futile, since you’ll still let plenty of weak passwords through, e.g. you’d treat “asdfgh” as a rather good password when it’s not?
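Keyboard walks like “qwert” or “asdfgh” are easy to flag with a separate check. A rough Python sketch, assuming a US QWERTY layout and treating four or more adjacent keys from one row (in either direction) as a walk; `has_keyboard_walk` and the threshold are hypothetical choices, not part of the method above.

```python
# Rows of a US QWERTY keyboard (assumed layout); only digit and letter rows are checked.
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def has_keyboard_walk(pw: str, min_run: int = 4) -> bool:
    """True if pw contains min_run or more adjacent keys from one row, forwards or backwards."""
    s = pw.lower()
    for row in ROWS:
        for seq in (row, row[::-1]):
            # Any longer run contains a window of exactly min_run keys,
            # so checking those windows is enough.
            for start in range(len(seq) - min_run + 1):
                if seq[start:start + min_run] in s:
                    return True
    return False
```

For example, “qwert”, “ASDFGH”, and “ytrewq” are flagged, while “password” is not.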
No, this has nothing to do with the “quality” of the password. There is a list of the 10,000 most commonly used passwords, so I’m creating a tool that a programmer can use to disallow those specific passwords, including parts and variations of them. Any additional checks for “quality” would be up to the programmer, although I may add tools for that too.