Enterprise Security Risk Management

The Challenge with Corner Cases

October 8, 2023, Doug Leece

“All models are wrong, some are useful”, George Box

Attributed to a British statistician, this phrase is still being quoted almost 50 years later and echoes older aphorisms like “perfect is the enemy of good”, the “80-20 rule” and “Better a diamond with a flaw than a pebble without” (Confucius?).

Following up on that, perhaps just due to a fear of publishing invalidated opinions still lingering from university, there are multiple examples of accepting things as fact despite corner cases that either disprove the hypothesis or are so obscure no one previously considered them. We can speculate that no one ever pondered “what would happen if two black holes met somewhere in the universe, decided to combine into one, and we recorded it”, but on September 14th, 2015, this exact scenario did occur and confirmed Einstein’s theory on gravitational waves beyond dispute. Ironically even Einstein himself wasn’t that confident in his theory and would periodically change his position on the matter, yet anyone using GPS has benefited from his understanding of the time/space relationship, flaws and all. So what does any of this have to do with security risk management in three words or less?

Strategic Business Alignment

One of my regular questions on the podcasts is how our guests help their clients identify the point of diminishing returns. I ask primarily because of my personal tendency to identify the handful of scenarios where a solution may not be effective, and I could use some ideas on accepting good versus the costly pursuit of perfection. Revisiting “strategic business alignment”, as risk professionals we need to accept that organizations often embark down a path without a complete understanding of how they will handle everything that comes up. The concept of the minimum viable product (MVP), a Silicon Valley darling for years, can partially inform the risk assessment model. The MVP model could be cynically described as: get people using your software, fix things that really are an issue after enough customers report them rather than agonize over every possible use case, and hopefully don’t run out of money before becoming profitable. The Agile Alliance does point out that settling for just enough that people will buy something doesn’t make a product viable in the long run, and we see that regularly with cracks in cloud infrastructure, financial meltdowns and so on.

Back to “all models are wrong”, if we take the MVP concept seriously, part of our effort will be to analyze why things are not going as planned, ideally looking for the root cause rather than applying duct tape and pushing on with the next release or growth initiative just to keep things on schedule. Schedules are important, but the courage to miss a date for safety, quality, or some other darn good reason will be rewarded in the long run.

“I love deadlines. I love the whooshing noise they make as they go by.” Douglas Adams

I would be the first to point out that a flaw in a business productivity application will have a far less significant impact on society than a flaw in water treatment, electrical generation or a piece of medical equipment, but can we address flawed models via resilience? As risk professionals we often possess the uncanny ability to identify one or two scenarios that an existing or proposed control will fail to address. At this point we have the choice of saying “this is unacceptable because …” or we can ask those who may know more about some aspects of the problem or the organization’s capability to respond. Apparently even Einstein had his doubts and would speak with others in his field.

Posing a question like “if scenario one came to pass, what is the most credible and most extreme impact?” in a roomful of subject matter experts will most likely result in numerous lengthy responses, some contradictory, but themes tend to emerge. Most certainly watch for those extremes. I had dinner recently with a respected ICS security expert and completely agree with the position that some outcomes, no matter how unlikely, are too significant to knowingly leave to chance. If we as risk professionals identify such a scenario, I believe we should resist “damn the torpedoes” with everything we have if professional ethics mean anything. That said, in most cases, the worst possible outcome may be highly undesirable but recoverable. There is a generational difference in impact between a cardboard box for a C-Suite member and a nuclear wasteland or polluted water.

Many corporate boards list “cyber security risk” in their top 5, and reviewing a firewall or application security log for five minutes will confirm the threat is very real. That said, I know of no business that has decided to shut everything down because “things are just too challenging these days”; ironically, many are openly evaluating whether machine learning, process automation and cloud computing can give them a marketplace advantage.

In a business world where many run toward the fire instead of from it, can we help those we serve balance the many enterprise risks, not just cyber, to give them the greatest likelihood of a successful outcome? Tim recently recommended a book on becoming a trusted advisor, which includes a great deal of discussion on dealing with mistakes. Theoretically solving for all corner cases and missing the opportunity window ultimately doesn’t serve anyone.

One Human Error from Business Disruption at a National Scale?

July 3, 2023, Doug Leece

Those listening to the podcast episodes month over month may notice a theme emerging: identifying and working toward protecting a path to operational resilience is typically what matters most to an organization. For the second year in a row the Caffeinated Risk Summer show coincided with a widespread outage of a major Canadian business. On July 8th, 2022, Rogers Communications reported a national network outage that saw millions without cell or internet service and thousands of retailers without the ability to accept Interac payments. On June 25th, 2023, Suncor Energy Inc. issued a press release confirming a cyber security incident; it was understandably light on details beyond customer record safety, but ensuing speculation pegged the impact at millions.

While many in the Calgary I.T. community know each other, details on the exact cause of the Suncor incident remain, as they should, tightly held, so this post is focused on the publicly observable outcomes. The Rogers and Suncor incidents are similar in timing (early summer), in impact (payment card system availability) and potentially in initial cause (human error). While Rogers admitted the network outage was due to a mistake in the planned upgrade procedures, we have no insight into the actual cause of the Suncor incident, nor shall we speculate. Instead, we can look at published data trends and government intelligence to complete the threat model; as Jack Jones and Jack Freund maintained in their seminal risk management text, “we often have more data than we think”.

The 2023 Cyberthreat Defense Report was the basis for the Summer Show podcast and it is worth noting that the top two obstacles to defending against cyberthreats were human factors. The Canadian Centre for Cyber Security lists numerous attack surface areas vulnerable to cyber threats, including cyber crime. Cyber crime goes by various names such as phishing, ransomware, social engineering, business email compromise and so forth, but the common element is a human inside the organization using the organization’s technology to engage with an adversarial force.

While some organizational leaders have been quick to assess human error as a staffing or skills issue, opting for ever more training and in some cases even threats of dismissal, hopefully we are turning a corner on this legacy and rethinking our approach. ESRM takes a mission-first approach to security prioritization, focusing on business engagement, and the Idaho National Laboratory’s CCE model challenges us to look at each of those mission-impacting scenarios, identify how cyber elements could play a part in disruption and reengineer around them. I am clearly a CCE fan, mentioning it on multiple episodes, buying copies of the book for my detection engineering teammates and sharing the program link with all unsuspecting folks who ask me about organizational resilience or operational technology security, but never mistake enthusiasm for the truth without testing. Whether it is a software design flaw, a process design flaw, or simply a stress-induced cognitive error, I believe we need to accept human error at some point in the system and design systems accordingly. The challenge of course is that we cannot predict exactly where or how such errors will appear, therefore we need a different approach than “prevent everything” and “don’t screw up or you’re fired”.

The CCE book uses the term “hope and hygiene” for a failed security model often played out as compliance exercises, vulnerability scanning and simplistic user awareness training. Paraphrasing here, while such actions are important they neglect the time-tested reality that at some point in the future a cyber-related failure will happen, and the organization should be able to recover. The “all roads lead to Rome” idiom applied to resilience also shows up in the DevOps camps, very well summarized by luminary Mark Russinovich in a 2020 Microsoft blog post, and in an offhand quote I overheard at an industry security summit this past winter, whose source shall remain anonymous due to subject sensitivity and my memory.

“Take a look at your network diagrams and all your maps of stuff. Close your eyes, put your thumb on something and say ‘XXXX now owns that’, and think through how you are going to get operations restored”

The digitalization genie is out of the bottle and we are increasingly dependent on interconnected supply chains, automation, cyber-physical and virtualization systems for almost every aspect of our daily lives. This interconnectedness creates a list of vulnerabilities that is approaching exponential, most of which will never come to pass; therefore identifying those key intersections of cyber element failure and cascading impacts becomes the brave new world security professionals must lead our organizations into. I am offering some awkward conversation starters, not as an affront to past leadership decisions but as a chance to improve each of our security programs in meaningful ways going forward, before we too fall victim.

Much of our defense posture relies on Active Directory controls and privileged account protection measures. How would we rebuild if we lost control of the corporate domain?
What would we do if an adversary re-encrypted all our backup systems and destroyed our active accounts databases?
We have ensured more than 95% of our workstations and servers are running a top-tier endpoint detection and response product. What would we do if an adversary were able to unhook that process?
What if there is a mistake in the next release of our custom system that we don’t pick up in UAT? How much could we stand to lose?
How long can we operate if our main WAN provider is unavailable for more than 8 hours?
How can we respond if an adversary takes control of our automated software installation platform to distribute their malware?

Admittedly these will not be easy conversations and every organization will need to do its own analysis. That said, let’s end this post on an optimistic note: nothing is impossible once we are committed, even if it is acceptance of loss. Consider the following:

There are many skilled and capable people working in our ICT departments,
Cyber education is now mainstream, not a dark art,
Hardware and software quality is higher than it’s ever been while cost is going the other direction,
Organizations are investing in cyber security,
Rogers did repair their nationwide outage in a couple of days,
Interac did invest in network resilience,
Petro-Canada point of sale services were restored in less than six days.

Design Thinking & Security Controls

February 15, 2021 / February 16, 2021, Doug Leece

During the green room chat before our first podcast episode, Rachelle Loyear and I began discussing “Design Thinking”. My personal experience was limited to a one-day workshop with IBM a couple of years back, but as you can hear in the podcast this is an area Rachelle has spent a lot of time exploring. Over the years I have worked with many high quality security products and for the most part the user experience (UX) felt like an afterthought. This is not a slight against the companies that work very hard to bring us these products, and as techies we tend to want large volumes of information and lots of buttons on every screen, but a recent personal experience has given me real cause for reflection on UX design, even for security tools.

The basic understanding of “Design Thinking” is best summed up in a quote from IDEO, the company that brought many of the current practices in this methodology forward over the last two decades.

“Design thinking is a human-centered approach to innovation that draws from the designer’s toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success.” Tim Brown, Executive Chair of IDEO

Do read Tim Brown’s books for the full understanding, but essentially a good design will be desirable from a human perspective, technologically feasible and economically viable for the company. While most people think of products as design candidates, software applications have certainly adopted this focus on UX, and a similar design thinking approach can be applied to services, as discussed in “This is Service Design Thinking” by Marc Stickdorn.

[Figure: Venn diagram showing the intersection of feasibility, desirability and viability]

The “3I” model of “Inspiration, Ideation and Implementation” was developed by IDEO 20 years ago, and the definitions below are a composite of the various interpretations offered by different service providers.

Inspiration: Identifying the problem or opportunity that warrants a solution, primarily through considering the actual user of the product or service and the challenges they are facing with current offerings.

Ideation: This goes deeper than just brainstorming. Promising ideas are further assessed with a multi-disciplinary team to develop fleshed-out conceptual solutions, and the best candidates can then be turned into testable prototypes.

Implementation: Prototypes are developed and tested with users, and user feedback drives updated prototypes until finally moving from prototype to a marketplace offering.

The main premise behind design thinking is that an interdisciplinary group working on a problem at the outset will develop new solutions that are more innovative and more likely to succeed than traditional R&D models. The challenge for companies more comfortable with the engineering-followed-by-design approach is that there is no one best way to move through the process and it may appear chaotic at times, but the approach is much more mainstream now than when IDEO first started two decades ago. As security practitioners, we can help by working within our organizations to ensure we are part of that interdisciplinary team when we hear terms like “design thinking” and “agile” associated with new projects. Without security specialists involved in these design and delivery activities, the trend of last-minute add-on compromises is likely to continue.

To move this conversation from the academic to real life one could start with a review of Tim Brown’s short blog post on empathy, which is one of the inputs into the inspiration phase listed above: putting yourself in the position of the person interacting with a product or service. The empathy concept became very real recently when my partner, a physical therapy student, was required to spend 24 hours without the use of her dominant arm while performing day to day activities. Where feasible I supported her by also forgoing the use of my right arm, which made the challenges of modern computer security controls immediately obvious.

Long complex passwords are potentially impossible to type if a person has dexterity limits; for example, try reaching Shift+4 with two fingers to input the $ sign as a special character. The all too common “Ch@ngeMeN0w!” could be a very unpleasant onboarding password for a new employee or student. More critical accounts and remote access now typically require a secondary code from a smartphone app. Even with two hands I have struggled from time to time with responding quickly enough to Microsoft Authenticator validation requests.
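The dexterity cost of a password can be roughly estimated by counting how many characters need a second key held down. The sketch below is an illustration only: it assumes a US QWERTY layout, and the function name `shift_key_presses` and the symbol set are hypothetical choices for this example, not part of any standard or product.

```python
# Characters that require holding Shift on a US QWERTY keyboard (assumed layout).
SHIFTED_SYMBOLS = set('~!@#$%^&*()_+{}|:"<>?')


def shift_key_presses(password: str) -> int:
    """Count characters that need a simultaneous Shift press (hypothetical helper)."""
    return sum(1 for ch in password if ch.isupper() or ch in SHIFTED_SYMBOLS)


if __name__ == "__main__":
    sample = "Ch@ngeMeN0w!"
    # For this sample, 5 of the 12 characters need Shift held down.
    print(f"{shift_key_presses(sample)} of {len(sample)} characters need Shift")
```

A count like this says nothing about security strength; it only hints at how much two-handed (or two-finger) coordination a given password policy quietly demands.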

Creating more inclusive, accessible workplaces and public services isn’t just a nice thing to do, it is the law in many countries. No one at Caffeinated Risk is a lawyer, but researching did uncover an interesting clause in the Americans with Disabilities Act of 1990, which does specify the need for information technology systems to be accessible through multiple means.

“An accessible information technology system is one that can be operated in a variety of ways and does not rely on a single sense or ability of the user. This is important because a system that provides output only in visual format may not be accessible to people who are blind or have low vision, and a system that provides information only in audio format may not be accessible to people who are deaf or hard of hearing. Some individuals with disabilities may also need accessibility-related software or peripheral devices in order to use systems that comply with Section 508.”

Section 508, mentioned in the text above, is part of the Rehabilitation Act of 1973 and deals specifically with electronic information and technology. While governments may have been thinking data stored in electronic systems needed to be protected back in 1973, in 2021 there isn’t an enterprise anywhere that isn’t faced with that same requirement. To that end, we have created numerous information security policies, which will include at least one policy on access control, with passwords and MFA likely to be the defined requirements. Most password policies will also include very specific requirements pertaining to complexity and password length, account lockout thresholds and so forth.

While it may be true that certain password complexity requirements and MFA solutions do not consider accessibility, the need for securing access to electronic systems does not disappear. Although alternative mechanisms such as biometrics are now commercially viable, since they are included on both mobile devices and modern laptops, availability does not always equal usability.

In Praise of Design Thinking:

Without participating in the 24-hour exercise I am not sure I would have fully considered the impact many of our security controls might have on accessibility. Password logins are only one challenge; secure areas may not have badge readers at a practical height for someone in a wheelchair, and the same could be said for keypads and PIN-based door locks. A number of ideas are already coming to mind on how we could solve some of these challenges, but it will take experts in software interfaces, operating systems, physical design and policy creation to resolve these issues.

A quick search for accessibility features will show that commercial operating systems like Windows do have a number of options to allow people with disabilities to interact with the operating system itself. I suspect the challenge will reside more with the applications running on top. For example, how many building access applications could make a similar claim? Full disclosure, I have not looked, so it would be great to identify any such offerings in the market and review how they met the design challenges.

As a call to action for our readers please feel free to comment on these three points to ponder. Depending on interest this may lead to more of this research and perhaps even a podcast guest with deep domain knowledge in this area.

How many organizations have provisions within their current information security policies to permit user authentication via methods other than passwords and token/time based multifactor authentication?

How many organizations have deployed a technology stack within their company that would facilitate secure access to enterprise resources without the use of a password?

How many organizations are now including accessibility features as requirements in their technology investments?

Supply Chain Risk Management – revisited?

January 30, 2021 / February 16, 2021, Doug Leece

Since every level of the value chain now seems to require a complex web of suppliers who consume products and services of others with which they then provide products and services to their customers, supply chain risk management is yet another demand on the enterprise’s attention. Although cybersecurity risk within the supply chain was added to the NIST CSF in 2018, the recent events surrounding SolarWinds have taken us from debate and thought exercise to reconsidering the very tangible consequences of compromises in our core information and technology systems.

This is the first in a series of blog posts that will be undertaken by the Caffeinated Risk authors which will include both reflection on third party risk management programs implemented for enterprises and review of the most current guidance from organizations such as NIST, ISACA and ISA. Additionally we will be exploring potential mitigation approaches when the limits of standards and best practices leave residual risk unacceptably high.

Supply Chain Risk Management is now a specific NIST CSF category (ID.SC), with five subcategories ranging from the more basic awareness and stakeholder agreement to recovery planning and testing with suppliers. In addition to the core framework, the CSF includes two additional components, tiers and profiles, which could support an organization in defining the appropriate level of risk management required for a specific capability. It is all too common to see a blanket policy statement like “all systems will meet maturity level X.X” without consideration of their impact to the organization in the event the capability is compromised. While each organization is different, the blanket approach results in uneven levels of residual risk across the enterprise’s information and technology systems, often without a way to identify actual exposure since compliance to the defined level is what’s measured.
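One way to picture the alternative to a blanket statement is to set a target tier per system based on its impact and compare that with the assessed tier. The sketch below is purely illustrative: the system names, the impact labels and the impact-to-tier mapping are assumptions made up for this example, not NIST guidance.

```python
from dataclasses import dataclass

# Hypothetical mapping from business impact to a target CSF tier (1-4).
TARGET_TIER_BY_IMPACT = {"low": 1, "medium": 2, "high": 3}


@dataclass
class System:
    name: str
    impact: str        # business impact if compromised: "low", "medium" or "high"
    current_tier: int  # assessed CSF tier, 1 (Partial) to 4 (Adaptive)


def tier_gaps(systems):
    """Return {system name: tiers still needed to reach the impact-based target}."""
    gaps = {}
    for s in systems:
        target = TARGET_TIER_BY_IMPACT[s.impact]
        gaps[s.name] = max(0, target - s.current_tier)
    return gaps


if __name__ == "__main__":
    # Fictional inventory: both systems were assessed at the same tier.
    inventory = [
        System("customer portal", "high", 2),
        System("lunch menu site", "low", 2),
    ]
    print(tier_gaps(inventory))
```

Both fictional systems sit at the same assessed tier, yet only the high-impact one shows a gap, which is the uneven residual risk a single blanket level tends to hide.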

Effectively communicating risk to stakeholders is never simple, so measurement against standards is often the agreed path. A lighthearted quote from a manager many years ago, “the best part about standards is there are so many to choose from”, didn’t make much sense when I was new in my career but decades later the irony has fully set in. On more than one occasion I have observed knowledgeable people debate control or security levels, later determining they were referring to different frameworks. NIST insiders may have also observed confusion at some point, as this guidance on implementing CSF tiers differentiates them from security maturity levels.

“Tiers do not necessarily represent maturity levels. Organizations should determine the desired Tier, ensuring that the selected level meets organizational goals, reduces cybersecurity risk to levels acceptable to the organization, and is feasible to implement, fiscally and otherwise.” NIST online learning

[Figure: NIST CSF Supply Chain Risk Management Tiers]

The tiers in the diagram above are described in detail by NIST CSF section 2.2 and can be used in both the assessment of current state and definition of a target state. The interesting nuance is the up front identification that each organization’s objectives will be unique and not all systems within the organization may need to have risk managed to the same level.

There is no need to rehash NIST CSF section by section; the resources are freely available to all. Instead, we are considering the supply chain security issue and how those risk profiles may appear in organizations within a credible but completely fictional scenario involving an I.T. managed service provider using a well-established commercial software suite to monitor the networks and applications of its numerous customers.

[Figure: Model of organizations within a supply chain]

When an organization’s risk management practices fit into the NIST Tier 1: Partial category, risk management is ad hoc or reactive. Within the context of the cyber supply chain, the majority of the people in this organization would be generally unaware of how their activities affect their clients and would not likely be monitoring their suppliers for any cyber security issues. This lack of awareness could easily compound as businesses source from other businesses with similar ad hoc risk management programs.

If the I.T. managed service provider (MSP) in this scenario has only a partial awareness of their impact in the supply chain, they may overlook the cyber security practices of their input sources and focus more on meeting day to day operational requests from their clients. The customers of this MSP that are also in the NIST CSF Tier 1 category are likely more focused on service level agreements, cost management and maintaining other parts of their business, essentially assuming everything is handled. Since many managed service provider contracts are outcome based, a customer may not feel they need to be overly concerned with the software stack used by their MSP. A deeper dive into the COBIT 2019 and NIST 800-53 practices referenced in the ID.SC subtasks actually contradicts this assumption and shines a light on where a Tier 1 risk profile needs to begin improving.

“Identify and manage risk relating to vendors’ ability to continually provide secure, efficient and effective service delivery. This also includes the subcontractors or upstream vendors that are relevant in the service delivery of the direct vendor” COBIT 2019 APO10.04 Manage vendor risk
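The “upstream vendors” wording in the COBIT practice can be pictured as a walk over supplier relationships. This is a minimal sketch, assuming a simple dictionary of who buys from whom; every organization name is fictional and `upstream_vendors` is a hypothetical helper, not something COBIT or NIST defines.

```python
from collections import deque

# Fictional supplier relationships: each key buys from the listed vendors.
SUPPLIERS = {
    "Customer X": ["IT MSP"],
    "IT MSP": ["Monitoring Software Vendor", "Hosting Provider"],
    "Monitoring Software Vendor": ["Build Tool Vendor"],
}


def upstream_vendors(org, suppliers):
    """Return every direct and indirect vendor of org, in breadth-first order."""
    seen = []
    queue = deque(suppliers.get(org, []))
    while queue:
        vendor = queue.popleft()
        # Skip vendors already listed so circular relationships cannot loop forever.
        if vendor in seen or vendor == org:
            continue
        seen.append(vendor)
        queue.extend(suppliers.get(vendor, []))
    return seen


if __name__ == "__main__":
    for vendor in upstream_vendors("Customer X", SUPPLIERS):
        print(vendor)
```

In this fictional data Customer X has a contract with only one vendor, yet the walk surfaces four organizations whose practices could affect the service it receives.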

We can advance this fictional scenario by upgrading the risk management program tiers of both the I.T. MSP and Customer X to “Risk Informed”. This means the cybersecurity risk management practices are established at the management level but not effectively observed across the organization. “Risk Informed” can often take the form of written policies or even contractual clauses related to the ID.SC subtasks, but interpretation and implementation are likely still missing on the front lines.

Due to the large amount of attention created by the SolarWinds compromise the I.T. workers at the fictional MSP may have taken it upon themselves to look into possible issues or they may have received a notification from the software vendor. Regardless of how the staff was advised, the response efforts were likely not well coordinated internally due to a lack of well understood roles and procedures.

Although FireEye did an excellent job of getting positive indicators out to the public quickly, industry experts were quick to caution that not finding positive compromise indicators didn’t remove the need for deep investigation for anomalous behavior. These types of investigation take time and considerable skill, placing a lot of extra demand on I.T. and security staff. It would not be uncommon for the service provider not to initiate communication with their clients due to ongoing investigation activities. The communication breakdown would not be because something was being hidden but because of a lack of incident response processes that include stakeholder communication commitments. Similarly, Tier 2 customers may not see the need to reach out to their service provider to ask whether the software compromise is affecting, or potentially affecting, the service they are currently receiving. From their perspective, as long as the systems are up, everything must be okay.

To reach Tier 3: Repeatable, the organization must be aware of the cyber supply chain risks associated with the products and services it provides and uses.

Returning to our scenario, Customer X has contracted the MSP to ensure their customer engagement solution is available 24x7 and responsive. This website is very important to both their brand reputation and revenue stream because service disruption and degradation drive customers to the competitor. Customer X is also aware that cybersecurity issues can appear, seemingly overnight, with technologies that were previously considered safe. They may have learned this lesson the hard way in 2014 when many websites were shut down during the first few days of Heartbleed because customer data actively in memory on those websites was suddenly accessible to anyone capable of running a Python script.

Since Customer X is risk aware, their risk management process includes people within their organization who are accountable for knowing that the customer engagement solution is being actively monitored by a third party, how that monitoring meets their needs, and the level of cyber security capability/maturity of that external party. We can assume that the I.T. MSP was hired because they had a reasonably priced service offering and repeatable risk management practices, which would be part of a Tier 3 customer’s vendor selection criteria. This assurance may have been achieved through extensive vetting via questionnaire, process document reviews or even an onsite assessment.

If we are willing to accept that resilience is a much more pragmatic approach to cybersecurity risk management than insisting on prevention for all possible scenarios, then NIST CSF Tier 3 is likely where we all need to be aiming as a minimum goal for systems that meet the required criticality within the organization.