Discovery and Validation of Business Models: How B2B Startups can use Business Experiments
Patrick Brecht, Daniel Hendriks, Anja Stroebele, Carsten H. Hahn, & Ingmar Wolff

"The true method of knowledge is experiment."
William Blake, poet, painter and inventor, "The Argument" (1788)

Abstract
Startups searching for a business model face uncertainty. This research aims to demonstrate how B2B startups can use business experiments to discover and validate their business model's desirability quickly and cost-effectively. The research study follows a design science approach by focusing on two main steps: build and evaluate. We first created a B2B-Startup Experimentation Framework based on well-known earlier frameworks. After that, we applied the framework to the case of the German startup heliopas.ai. The framework consists of four steps: (1) implementation of a measurement system, (2) hypothesis development and prioritization, (3) discovery, and (4) validation. Within its application, we conducted business experiments, including online and offline advertisements, as well as interviews. This research contributes in several ways to the understanding of how B2B startups can use business experiments to discover and validate their business models. First, the designed B2B-Startup Experimentation Framework can serve as a guideline for company founders. Second, the results were used to improve the existing business model of the German B2B startup heliopas.ai. Finally, applying the framework allowed us to formulate design principles for creating business experiments. The design principles used in the study can be further tested in future studies.

Introduction
In recent years, conventional wisdom has urged founders to create a business plan that describes the size of the opportunity, the targeted problem, and the planned solution (Blank, 2013). This approach assumes that the target market is known and the business model is validated (Garvin, 2000). However, these conditions are often not met by startups, which leads to many startups failing when executing an assumption-based business model (Lynn et al., 1996).
To support founders in searching for a business model, frameworks were created with the idea of conducting experiments in business settings. Technological advances in the last decade have dramatically lowered market entry barriers and the cost of running business experiments (Kerr et al., 2014). This has made business experimentation more viable and simpler to execute.
Moreover, characteristics specific to the B2B market influence how business experiments are designed. An implication for a B2B business is that more money is generated by fewer customers (Croll & Yoskovitz, 2013). This makes it more difficult to achieve statistical significance, because sample sizes are small. Additionally, the decider and the user may not be the same person, as is usual with consumers, making it harder for a startup to sell a product to a company (Croll & Yoskovitz, 2013). Thus, more research is needed to determine how B2B startups can specifically use business experiments, allowing founders to learn about the business model. Recently, the COVID-19 pandemic has limited personal contact with customers and thus influenced the methodology of this research. These trends demand additional research in the field.
In line with Berglund, Dimov and Wennberg (2018) [...] entrepreneurs, the goal of this research is twofold. First, the study provides empirical insights on the application of business experiments to the business model development process of B2B companies. Second, we investigate how to run and design these experiments. Berglund et al. (2018) recommended creating context-specific design principles in the form of pragmatic recommendations. Thus, this research focuses on extracting practical design principles that support entrepreneurs in improving their business experiment activities. This research takes a problem-solution approach aimed at extracting practical contributions for B2B startups, instead of focusing on enriching the existing theoretical body of literature. The study answers the following question: How can startups in a B2B market use business experiments to discover and validate the desirability of their business models quickly and cost-effectively?
To answer this question, we followed a two-step process proposed by March and Smith (1995), and tailored the research process by focusing on the case of a real-life startup named heliopas.ai. The case of heliopas.ai suits this research well, as the startup is searching for a business model that incorporates selling an application called WaterFox to farmers (a B2B context). The mission of heliopas.ai is to provide farmers with accurate data about soil moisture combined with simple recommendations for more efficient field irrigation. The startup uses machine learning and multiple data sources, such as satellite imagery and local weather databases, to gather the data. No equipment or high-priced sensors are necessary to use WaterFox, which saves customers unnecessary costs and adoption efforts. We created a framework that serves as a guideline for conducting tailored business experiments for heliopas.ai that consider the limitations of startups regarding time and money. These business experiments help to discover and validate the desirability of the startup's business model in a B2B market. As the startup is based in Karlsruhe, Germany, the initial market consisted of local farmers in Baden-Wuerttemberg and Rhineland-Palatinate.
This research proposes a B2B-Startup Experimentation Framework with a four-step solution that reduces uncertainty and improves a startup's business model. The framework builds on existing processes and principles, and combines them in one comprehensive framework that serves as a guideline for conducting B2B business experiments. By applying the framework to the context of heliopas.ai, the researchers were able to evaluate the proposed framework's applicability. Additionally, during the research, the business and operations of heliopas.ai were adjusted in response to the findings, resulting in a better understanding of the suitability of channels, value proposition, customer jobs-to-be-done, customer segment, and product performance.
This paper is structured as follows. The next section provides a brief review of theories and frameworks used to develop the B2B-Startup Experimentation Framework. In section 3, we lay out the methodology. In the subsequent sections, we develop the framework and apply it to the startup heliopas.ai. Next, we summarize the findings and elaborate on them in the discussion section. Finally, we discuss the limitations of the study and the practical implications for startups and researchers, and conclude the paper.

Discovery and Validation
Discovery marks the initial step in the search for a business model. The goal is to explore if the general direction of thought regarding a business model is correct and to gain more insights (Bland & Osterwalder, 2020). Discovery suits the early steps of experimentation. Since a startup operates under great uncertainty (Ries, 2011), decision making is done under ambiguity, with little to no knowledge about alternatives and consequences (Cooremans, 2012). Generally, validation activities ensure that customers' needs and the defined requirements are met (Albers et al., 2017). The validation process of a business model determines and ensures a correct direction of thought, and also confirms findings from the discovery step (Bland & Osterwalder, 2020). Thus, validation becomes the second step in the search for a business model (Bland & Osterwalder, 2020).

Business Model and Risk Factors
According to Brown and Katz (2009), an early business model entails three risk factors: desirability, feasibility, and viability. Desirability describes the risk of a business model regarding the market, demand, communication, and distribution. Feasibility describes the risk that a business cannot access key resources, perform key activities, or find key partners (Brown & Katz, 2009). Viability describes the risk that a business cannot generate sufficient revenue or requires too much cost to make a profit (Bland & Osterwalder, 2020). This research focuses on reducing the desirability risk in business models. It therefore focuses on the following business model components: customer segments, value proposition, channel, and customer relationship. Additionally, we explore a revenue model in terms of a customer's willingness to pay for heliopas.ai's application offer.

Customer Development
Customer Development by Blank and Dorf (2012) is an iterative, customer-focused approach in the search for a business model. It incorporates business experiments and consists of two steps: customer discovery and customer validation. Customer discovery aims at finding initial customers by deriving testable hypotheses, collecting possible experiment designs (Blank & Dorf, 2012), and finally conducting the experiments. These initial experiments determine if the envisioned value proposition matches a targeted customer segment. In the next step, the proposed solution is presented to customers to learn if it serves customer needs, and to assess customer willingness to pay for it. Customer validation requires applying the business model that results from the previous step (Trimi & Berbegal-Mirabent, 2012). The goal is to test whether the business model is repeatable and scalable. This is done by running more quantitative, high-fidelity experiments and acquiring actual sales, which will show how money spent in sales and marketing can generate revenue.

Lean Startup
The Lean Startup aims to reduce waste while creating a business model. It has three key principles: replacing planning with experimentation, Blank's 'getting out of the building' approach, and agile development (Blank, 2013). The experimentation process is described by the Build-Measure-Learn feedback loop, consisting of three steps: build, measure, and learn. In the build step, it is essential to create a Minimum Viable Product (MVP) quickly after identifying the most important hypotheses (Ries, 2011). The goal of building an MVP is to identify the proposed solution's potential (Kerr et al., 2014) and the target customers' willingness to pay for it. The measure step aims at collecting data that can verify or falsify the hypotheses made about product quality, price, and costs (Ries, 2011). In the learn step, the goal is to learn about the investigated hypotheses from collected data. The learning process shows whether an underlying hypothesis can be verified or not, and indicates if the MVP is a viable solution to the customer problem (Ries, 2011).

Four-Step Iterative Cycle
The Four-Step Iterative Cycle describes a structured procedure of business experimentation that undergoes the design, build, run, and analyze steps iteratively until it achieves desired outcomes. The first step, design, uses existing insights from observations and previous experiments to formulate testable hypotheses and design suitable experiments (Thomke, 2003). In the build step, researchers build physical or virtual prototypes or models to conduct experiments (Thomke, 2003). The higher a prototype's fidelity and functionality, the stronger the generated evidence will be (Thomke, 2003). Subsequently, the experiment is run either in a more controlled laboratory setting or in a real-life setting, which produces higher external validity (Thomke, 2003). Finally, the results are analyzed by comparing them to an expected outcome. If the hypothesis addressed by the experiment is answered sufficiently, the experimentation cycle is stopped (Thomke, 2003). Otherwise, researchers reenter the design step with a modified experimental design, adjusted according to new insights gained in the process.

Business Experiments
Business experiments attempt to take a scientific approach to generating insights into a company's business model (Thomke, 2019). They reduce risk and uncertainty by yielding evidence regarding an underlying hypothesis (Bland & Osterwalder, 2020). Experiments can demonstrate a causal relationship by measuring the effect an action has on a situation (Hanington & Martin, 2012). Business experiments in startups are often run cheaply and quickly (Aulet, 2013). For this paper, business experiments are distinguished by their purpose, discovery or validation, in line with the definitions provided above. Discovery experiments test if the general idea behind a concept is acceptable, intending to establish a proof-of-concept. Validation experiments are experiments with higher fidelity. They yield stronger evidence and require more resources, such as time, personnel, or money.

Growth Hacking
Growth Hacking aims at fast and sustainable growth through activities in the area of market research, product development, and customer retention (Ellis, 2017). The Customer Acquisition Funnel is a core element of the Growth Hacking framework. It consists of five stages: acquisition, activation, retention, revenue, and referral (McClure, 2007). In the acquisition stage, the goal is to figure out through which channels users, customers, and visitors arrive in a way that results in value for a startup (McClure, 2007). The activation stage shows how many acquired users have a positive first impression of the product (McClure, 2007). Retention measures whether users keep using the product. The revenue stage measures customers' willingness to pay, whereas the referral stage measures if users enjoy the product enough to recommend it to a friend (McClure, 2007).
Table 1 summarizes the presented frameworks and shows an initial comparison with the framework designed for this research.

Methodology
To create a business experimentation framework for B2B startups and gain insights on how B2B startups can use business experiments to discover and validate their business model quickly and efficiently, this research applied a two-step research process based on "design science" insights, as suggested by March and Smith (1995). The two-step research process in design science consists of a build and an evaluate step, which can be summarized as follows.
The build step for this paper undertakes a literature review to shape a framework based on existing knowledge and practical experiences of respected practitioners (Thomke, 2003; Ries, 2011; Blank, 2012; Ellis, 2017), as well as previous research conducted in this field (Thomke, 2003). Practical knowledge is very popular among entrepreneurs. Although it is not grounded in theory itself, it is considered a valuable source of knowledge in this field.

Design and Application ofthe B2B-Startup Experimentation Framework
Based on the frameworks described in the theoretical part of this paper, we designed the B2B-Startup Experimentation Framework (B-SEF) and outline it in the following way. It consists of both a macro experimentation and micro experimentation framework (see Figure 1).
The macro experimentation framework consists of four steps. First, it involves designing a simple measurement system to collect data on acquiring and retaining new customers. The idea of implementing such a measurement system originates from the Growth Hacking methodology. Applying it for this research was feasible because the use case startup already has a vision for its business model and technology integrated into a Next, we apply the framework to the particular case of heliopas.ai, a real life startup that wants to improve its business model. This constitutes the evaluate step of the two-step process, which aims to show whether the created framework fulfills its purpose. Furthermore, the application allows researchers to deepen their knowledge about how to run business experiments empirically. Qualitative and quantitative data was collected during several business experiments. We used this empirical data to develop insights into heliopas.ai's business model. Also, we describe applying the framework and conducting business experiments, which resulted in formulating design principles that serve as recommendations for conducting business experiments. The design principles can be regarded as a basis for future research that focuses on further investigating the value of business experiments.  The B-SEF was applied to the startup heliopas.ai to gain insights into its business model and to empirically evaluate the framework's applicability. We note as important that restrictions of cost and time were present in this study, based on a budget of less than 100, and less than four weeks to design and run each experiment.
(1) Designing the Measurement System Growth Hacking relies on experiments, and thus must collect data. A common way to determine how to design a measurement system that suits the purpose of data collection is to use the Customer Acquisition Funnel, described above. Consecutively, we designed the customer journey for potential customers of heliopas.ai based on multiple metrics, which were defined and tracked. Table 1 provides an overview of these metrics, their definition, and their application in the Customer Acquisition Funnel stages. smartphone-application tested by selected customers. The data is used to calculate conversion rates and customer acquisition costs (CAC), as well as estimate customer lifetime value (CLV). Second, the Business Model Canvas (Osterwalder et al., 2010) is used to collect and prioritize hypotheses about the business.
Business experiments are conducted in the two steps discovery and validation (Bland & Osterwalder, 2020), thereby incorporating, specifically, discovery and validation experiments. By doing so, this research follows the recommendation of Blank and Dorf (2012) who suggest to treat the search for a business model as a two-step process of discovery and validation. In the discovery step, business researchers aim at gaining insights quickly and cost-effectively, as timing can be critical for a startup's success. As emphasized by Ries (2011), the goal is to learn quickly about the business model's desirability. In the validation step, researchers design experiments to gather more reliable evidence. By adding a control group and running experiments simultaneously, the effects of external variables will be reduced.
The micro experimentation framework is adapted from the Four-Step Iterative Cycle by Thomke (2003). All business experiments are conducted and presented in a structured manner by following a micro Discovery and Validation of Business Models: How B2B Startups can use Business Experiments Patrick Brecht, Daniel Hendriks, Anja Stroebele, Carsten H. Hahn, & Ingmar Wolff The data from the retention stage was measured by building an Excel sheet that processed data from the startup's database. This data was used to calculate daily retention metrics. Beneficial to this approach was that the researchers could manually filter users, since user names and further contact information were available.
The revenue stage consisted of metrics from paying users, who were paying to use the WaterFox application.
The referral stage was not tracked due to focus on the other stages. The collected data is presented in Figure 2. The three user categories are summarized as active users.
We used the absolute number metrics for a certain period to calculate the conversion rate between customer journey phases via app store product site impressions to app downloads and app downloads to registrations. We used the conversion rate to estimate Customer Acquisition Costs (CAC), since the customer journey could not be tracked after customers leave the landing page and are referred to the app store.

Discovery and Validation of Business Models: How B2B Startups can use Business Experiments Patrick Brecht, Daniel Hendriks, Anja Stroebele, Carsten H. Hahn, & Ingmar Wolff
The acquisition stage consisted of metrics on total traffic generated by sites related to WaterFox. We extracted traffic data on the Facebook landing-page from Facebook's analytics. We tracked traffic on the WaterFox web landing-page with Google's analytics.
The activation stage consisted of metrics from app store product site impressions in the Apple and Google Play app stores. Additionally, we tracked the downloads of the WaterFox application from both app stores and new user registrations in the WaterFox application. Likewise, App store product site impressions and downloads were tracked with the App Store Connect and Google Play Console.
For the retention stage, we defined three metrics. The users were split into the three categories, occasional user, standard user, and heavy user, based on frequency of signing into the application in the last seven days. Since small errors in data had a high impact (for example, activities of developers in the application inflating the data), it was necessary to get data first-hand that was adaptable and transparent. landing page to evaluate channel suitability. We calculated the Customer Activation Rate (CAR) from recorded data. CAR is defined as the number of active persons on the landing page, divided by the number of visitors. A threshold of 20 , a common value for experiments in a startup environment, was defined for CAR. Additionally, the calculated CAC is going to help to evaluate the channel's viability. Table 2 provides an overview of the discovery experiments conducted.
Facebook advertisement experiment. The Facebook advertisement experiment analyzed whether customers were pulled to the WaterFox application via advertisements on Facebook. In the Facebook advertisement manager, the target customer was set to the current persona. A customer journey was designed, leading customers from an advertisement on landing page, to the app store, and finally to the application. To measure all online activities on the landing page, we

Discovery and Validation of Business Models: How B2B Startups can use Business Experiments Patrick Brecht, Daniel Hendriks, Anja Stroebele, Carsten H. Hahn, & Ingmar Wolff (2) Hypothesis Collection and Prioritization
To collect and prioritize initial hypotheses, we used the common Business Model Canvas for heliopas.ai. Our focus on coming up with a desirable business model, drew upon the building blocks value proposition, customer segment, channels, customer relationships, and revenue model with greatest interest. We prioritized our hypothesis resulting in a focus on channels according to the founders' vision of selling their application online. This would allow them to distribute the app efficiently at a low CAC and easily reach early adopters. Hence, in the following section, our attention will turn to experiments exploring the channels.

(3) Discovery Experiments
For each discovery experiment, we tracked the number of impressions, clicks on the advertisement leading to the landing page, and download-button clicks on the LinkedIn advertisement experiment. The LinkedIn advertisement experiment investigated whether customers were pulled to the WaterFox application via advertisements on LinkedIn. Hence, a LinkedIn advertisement was designed to test the hypothesis. The potential customers were sent to a landing page that revealed detailed information about the value proposition of the WaterFox-application. The target audience was defined as persons interested in agricultural topics. The target audience was set to males between the age of 25-34 years, meant to represent the startup's current target customer persona. For this experiment, the customer journey from the previous experiment was again reused, with the only difference that now the customer started at the designed LinkedIn advertisement. Data was collected using LinkedIn's campaign manager connected to the landing page by implementing a JavaScript tag to count conversions on the landing page. The LinkedIn experiment ran from May 5 until May 10, 2020, with a budget of 30 in Germany, Austria, and Switzerland, resulting in 6 clicks on the advertisement, and 2 download-button clicks. The results meet the expectation threshold with a CAR of 33.33 .
Press article experiment. The press article experiment tested whether customers could be pulled via an article in an agricultural newspaper into the WaterFox application. The hypothesis was formed by interviews that the startup conducted with customers during the business experimentation process. To test the hypothesis, we sent a press release to several newspapers. To contact them, we used the network of the local startup accelerator for support. The press release contained important information about the WaterFox-application. Underneath the article, a link referred directly to the landing-page of the WaterFox application. Referrals from this link were tracked using Google Analytics. Manager. The advertisement was run for seven days from April 22 to April 28, 2020, with a budget of 31.20 resulting in 4.557 impressions, 5 clicks on the advertisement, and 0 download-button clicks on the landing-page leading to a CAR of 0 , which was below the set threshold of 20 . Therefore, the hypothesis was falsified, and we decided to try a different channel in the next discovery experiment.
Google advertisement experiments. The Google advertisement experiment investigated whether customers are pulled to the WaterFox application via Google Ads. It ran as a smart campaign, which means that bidding, targeting, and ad creation were automated by Google Ads (Google, 2020). The landingpage from the Facebook advertisement experiment was reused, providing customers with an almost identical customer journey. The experiment ran for three days from April 27 to April 29, 2020, with a budget of 27.31, resulting in 48.211 impressions, 89 advertisement clicks, and 4 download-button clicks. This resulted in a CAR of 4.49 , which is also lower than the expected 20 . Hence the hypothesis was falsified. The Google Ads manager provides further information about the keyword performance, types of devices by targeted customers, and advertisement networks. The highest CAR in the Google Display Network was 11.11 . Due to this performance indication, we decided to conduct a second Google advertisement in the Google Display Network.
The Google Display Network experiment used a display advertisement that was only shown on certain websites and not in Google Search. The experiment ran for three days from May 6 to May 7, 2020, with a budget of 29.98, resulting in 69.100 impressions, 539 advertisement clicks, and 8 download-button clicks. This led to a CAR of 0.15 , which was below the set threshold of 20 . Therefore, the hypothesis was falsified. Due to this result, we decided to explore other Table 3. Results of Discovery Experiments. statement included whether the customer actually worked in the job, how often it was completed in the last four weeks, and how much time and money were required. Additionally, interviewees were queried about several elements of their farm. This was done to place the provided information into the right context and to help avoid biased or misleading results. The interview was conducted via phone. We read out the statements about jobs to the participants and recorded their responses. Of the 34 contacts available from the previous brochure experiment, we contacted 31. Five contacts agreed to an interview. The remaining showed an unwillingness to be interviewed, mostly due to time pressure and a high workload, because seasonal workers were limited due to the COVID-19 pandemic. The interviews were encoded to categorize the answers and systematically extract the results. We present the results from the validation experiments briefly in the following section.

Results
We gained insights into the company's channels, value proposition, customer segment, and product performance. Table 3 summarizes the experimental results conducted in the discovery phase of the B-SEF. The collected data shows the Facebook and Google Ads did not reach the predetermined conversion threshold of 20 . In contrast, the LinkedIn advertisement and press article experiment exceeded the threshold. The cost of acquiring one registered user was 153.85 for LinkedIn and 4.13 for the press, based on the measurement system and data collected by running the experiments. We estimated the cost of running the press article experiment based on the average price of using a writing service from a freelancer on the website upwork.com. These results were valuable for the startup to evaluate the desirability of its business model as they showed how the startup can acquire new customers and how much it costs.
The distributed brochures did not result in any acquisitions. The follow-up interviews conducted to investigate the unresponsiveness revealed that customers have limited available time and chose not to allocate it to reading a brochure. Additionally, the interviews yielded insights into software and hardware usage, as well as willingness to pay for the product. These insights helped to evaluate how desirable certain elements of the business model were, such as the value proposition. Figure 3 summarizes key findings of the B-

Discovery and Validation of Business Models: How B2B Startups can use Business Experiments Patrick Brecht, Daniel Hendriks, Anja Stroebele, Carsten H. Hahn, & Ingmar Wolff
(https://www.topagrar.com) published the article on May 27, 2020. The press article resulted in 484 unique article reads, 91 landing-page visitors, and 25 download-button clicks. This leads to a CAR of 27.47 , which therefore meets our threshold expectations.

(4) Validation Experiments
After identifying suitable channels for the startup to acquire new users, our focus shifted from the discovery phase to the validation phase. In the following, we present two validation experiments, called brochure advertisement and validation interview.
Brochure experiment. The brochure advertisement experiment tested whether customers were willing to pay a price of 9 for the envisioned version of the WaterFox application and whether customers can be acquired via post. Again, a customer journey was set up. To test customer willingness to pay, two versions of the brochure were designed that differed in price. The control group received a brochure costing 3, while the test group received a 9 brochure. If the post-delivery channel was to be a suitable way of acquiring customers, it would be validated or refuted by running the validation interview experiment afterward.
To run this experiment, we needed the addresses of farmers to send the brochures. To solve this, we screened several websites and platforms for contacts. The sample size was 34, equally divided between test and control group. We sent the brochures to recipients at a cost of 47.73 for printing and sending. The brochure advertisement resulted in no responses from the contacted persons. Follow-up validation interviews with five contacted farmers (referred to as interview experiment in the following), revealed a possible reason for non-responses and disclosed valuable information for the business model.
Interview experiment. Besides trying to discover the cause for the non-responses to the brochures from potential customers, the goal of the interview experiment was to validate current understanding of problems and customer work involving farm irrigation management. More precisely, the experiment aimed to investigate the importance of certain tasks to target customers. Importance was defined as the frequency and effort of completing a task or enduring a burden. Additionally, our goal was to investigate the current usage of digital products for target customers. Job inquiries were formulated as statements. Each presented results which are discussed in the following.
The discovery experiments were run at various times, which meant that extraneous variables did not remain constant. This was acceptable for the discovery experiments, as they were aimed at quickly establishing whether a channel was suitable for reaching target customers. With a CAR of 27.47, the press article experiment met our threshold expectations. A possible reason for this performance is that farmers trust the information delivered by the newspaper top agrar and are therefore more likely to visit the landing page and download the WaterFox application. If increased trust leads to more landing page visits and conversions, this article can also be used in future experiments as a reference.
Since all interviewees were tested under all conditions, the validation interview was a within-subjects experiment (Price et al., 2017). This design provided a high level of control over extraneous variables, since the participants were the same in all control and test conditions. This was an advantage of the validation interview experiment's design.

Discussion of Results and Proposed Design Principles
The novelty of the designed framework is that it was tailored to a real B2B startup. Although this may limit its generalizability, it also increases its suitability for this particular startup. By advising that the process begin with implementing a measurement system, this research stands in contrast to the business experimentation frameworks of Blank and Dorf (2012), Ries (2011), and Thomke (2003). Our measurement system was adapted from the Growth Hacking framework. With heliopas.ai, building a measurement system first was justifiable because the startup was already at a more advanced stage at the time of this research. Similar to the framework of Brecht et al. (2019), which focused on business experiments for platform business models, the B-SEF focuses on B2B market business models. Compared to other frameworks, the B-SEF suggests concrete experiments. This positions our research uniquely among existing frameworks.
The results of the brochure advertisement emphasize the importance of an existing channel to a customer. The post-delivery channel was not validated, and hence the brochure experiment does not provide answers about customer willingness to pay. The validation interview showed that interviewees use different hardware and software, and it revealed the difficulty of integrating the WaterFox application into an existing customer workflow. One interviewee stated that they were annoyed by documenting information in several IT systems. This was an obstacle for heliopas.ai in selling its product to other businesses and finding innovators.
The result of the interview experiment reveals the benefit of a qualitative approach. This experiment yielded significant insights into customers' jobs, the suitability of the postal channel (used for the brochures), and customer willingness to pay. This study chose a threshold of 20 CAR for discovery experiments, a commonly used threshold to determine the success of an experiment in startup environments. However, we recommend not relying on a single metric, but also considering monetary metrics, such as CAC or CLV, to assess the success or failure of an experiment.
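The recommendation to combine the CAR threshold with monetary metrics can be sketched as a simple multi-criteria check. The minimal Python example below is an illustration under stated assumptions: the class and function names are hypothetical, CAC is computed as spend divided by customers acquired, and the rule that CAC should stay below an estimated CLV is a common rule of thumb rather than a criterion taken from this study. Only the CAR of 27.47 and the threshold of 20 come from the text; the spend, customer count, and CLV figures are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentResult:
    """Outcome of one business experiment (hypothetical structure)."""
    name: str
    car: float                # CAR, in the same unit as the threshold
    spend: float              # total money spent on the experiment
    customers_acquired: int   # customers won through the experiment
    estimated_clv: float      # estimated customer lifetime value

    @property
    def cac(self) -> float:
        """Customer acquisition cost: spend divided by customers acquired."""
        if self.customers_acquired == 0:
            return float("inf")  # nothing acquired, so cost per customer is unbounded
        return self.spend / self.customers_acquired


def assess(result: ExperimentResult, car_threshold: float = 20.0) -> dict[str, bool]:
    """Check an experiment against the CAR threshold and a monetary criterion."""
    return {
        "car_meets_threshold": result.car >= car_threshold,
        "cac_below_clv": result.cac < result.estimated_clv,
    }


# CAR is the reported press article value; all other numbers are placeholders.
press_article = ExperimentResult(
    name="press article",
    car=27.47,
    spend=100.0,
    customers_acquired=10,
    estimated_clv=50.0,
)

checks = assess(press_article)
print(press_article.name, checks, "passed" if all(checks.values()) else "needs review")
```

Keeping the checks separate, rather than collapsing them into one score, lets a team see which criterion an experiment failed, for example a channel that meets the CAR threshold but acquires customers at a cost above their expected lifetime value.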

Conclusion
This research proposed the B2B-Startup Experimentation Framework (B-SEF), a four-step solution for how startups can reduce uncertainty and improve their business model. The framework was tailored to the B2B startup case of heliopas.ai. The main contribution of this research lies in having applied a theory-based framework in practice to extract insights into its applicability. The B-SEF guides a B2B startup in how to conduct business experiments. By following it, the startup heliopas.ai gained important insights into its customer segment, value proposition, and business model channel, and reduced uncertainty. However, business experiments might not always be feasible, since the prerequisites for executing them properly might not be available (see the channel for the brochure advertisement).
A limitation of the current B-SEF framework is the focus on desirability aspects of the business model during its application. In contrast, other frameworks like the Customer Development Process (Blank & Dorf, 2012) have a process designed to explore and validate the entire business model. The focus on desirability might limit researchers' capabilities of evaluating the framework holistically, and therefore requires further research. It remains an open question whether the B-SEF is only applicable to B2B startups, which likewise leaves room for future research. We recommend applying the framework in a B2C startup to investigate its applicability in that market.
Although online advertisement is a quick and cost-effective way to gain insights into the desirability of the business model, and the large amounts of data provided by online advertisement tools helped to draw causal conclusions, limitations arose when investigating why certain events or results occurred. The anonymity of persons was challenging in this case, as we were not able to contact people for further questioning. For instance, the Google Display advertisement did not perform as expected, and the plausible explanation was merely based on assumptions.
Even with restricted generalizability, these business experiments and the design principles derived from them can provide other startup founders with ideas for designing their own experiments (see Table 4). The design principles from this research were formulated using the following structure: "to achieve X in situation Y, something like Z will help" (Berglund et al., 2018). These design principles are currently at the hypothesis stage, based on applying the framework, and require further empirical research to confirm and validate (or refute) them statistically.
In contrast to other frameworks, this research also provided references on the performance of certain experiments, that is, practitioners can compare quantitative results of this research with their results and evaluate the findings. This comparison can be useful in practice when it is not always clear if an experiment has produced satisfactory results.
The separation of business experiments into discovery and validation experiments was proposed to equally satisfy scientific rigour as well as the entrepreneurial desire for speed and efficiency. This mindset was inspired by the Customer Development Process of Blank and Dorf (2012) and by recently published work on business experimentation by Bland and Osterwalder (2020).
As stated previously, businesses tend to be largely regulated and rational in their buying decisions. It is difficult to evaluate what impact customer willingness to adapt to a new product had, given that the customers in this case were other businesses instead of consumers. Based on experience, target customers (farmers) are only open to new products if value is delivered immediately. This makes it hard for startups to penetrate markets with an unfinished product. The limited number of customers in a B2B market such as the agriculture industry, in which the startup heliopas.ai operates, can be a challenge when conducting business experiments. Having available business contacts can be beneficial when running business experiments, which was seen in the validation experiments of the B-SEF application in the startup heliopas.ai.
Another limitation of this research is that only a narrow understanding of the causal relationship between variables could be gained. The current pandemic also introduced a further extraneous effect: the lack of seasonal workers might have influenced the amount of time farmers spent online and on social media. Therefore, running the same experiment at a different time of the year or in a different year might yield different results. This underlines the limitations of business experiments in general, which are usually run on a low budget and over a short period of time. Moreover, it is unclear whether an increased budget or run-time would have led to similar results. Since the conducted discovery experiments focused on online experimentation, the framework is expected to be limited to businesses that can acquire customers and distribute their products online.
As a concluding remark, we find that the main challenge is to design business experiments in a way that reveals underlying causality. This can be very challenging in a startup where the business and its operations are not yet defined. Furthermore, operating quickly and cost-effectively implies making trade-offs between the reliability and validity of results. Practitioners should consider running a sequence of business experiments to improve the company's learning effect and to better explain negative outcomes, and they should use a mixed data collection approach. William Blake (1788) stated that experimentation is "the true method" of gaining insights. This also seems to hold true for business model validation in B2B startups. A systematic experimentation framework along with well-designed business experiments can reduce the need for resources such as time and money and help deal with uncertainty and risks.