
Big Data, Big Science, Data Science — This triad of concepts exemplifies the new age of utilisation of data in large Volume by companies to produce information and insights for guiding their operations, such as in marketing, to perform more effectively and profitably. Yet Big Data also means that data exhibit great Variety (e.g., types and structures), and are generated and transformed in high Velocity. The data may be retrieved from internal or external sources. To be sure, non-business organisations also utilise Big Data and Data Science methods and strategies for a range of purposes (e.g., medical research, fraud detection), though our interest is focused here on marketing, inter-linked with sales and customer service, as well as retailing.

It is not easy to draw the line between the concepts above because they are strongly connected and cover similar ideas. Big Data may seem to emphasise the properties of the data, but it is tied in with the specialised technologies and techniques needed to store, process and analyse them. Likewise, Data Science (and Big Science) may imply greater emphasis on research strategies, scientific thinking, and analytic methods, but they are directed towards handling large and complex pools of data, namely Big Data. Nonetheless, we may distinguish Data Science by reference to occupation or position: professionals recognized as “data scientists” are identified as distinct from all other business analysts or data analysts in this field. Data scientists are considered the superior analysts, the experts and, mostly, the strategists who also connect the analytic domain with the business domain.

The Trend Lab VINT (Vision – Inspiration – Navigation – Trends), part of the Sogeti network of experts (Netherlands), published an instructive e-book on Big Data. In the e-book, titled “No More Secrets With Big Data Analytics” (2013), the team of researchers proposes a logical linkage between these concepts while relating them to Big Business. Big Science was conceived as early as the 1960s (the term is attributed to the atomic scientist Alvin Weinberg) to describe the predicted rise of large-scale scientific projects. It was not necessarily associated with the amount of data (typical contexts have been physics and the life sciences). Big Data as a concept emerged nearly ten years ago and turned the spotlight on data. Data Science is introduced by VINT as the toolbox of strategies and methods that allows Big Data to bring us from Big Science to Big Business. Data Science is “the art of transforming existing data to new insights by means of which an organisation can or will take action” (p. 33). Originally, Big Science emphasised a requirement of scientific projects that still holds true for Big Data projects today: collaboration between researchers with different areas of expertise in order to accomplish the research task successfully.

  • The researchers of VINT note that some scientists disapprove of connotations of the word “big” and prefer to use instead the term “extreme” which is in accordance with statistical theory.

The VINT e-book cites a profile for the position of data scientist suggested by Chirag Mehta (a former technology, design and innovation strategist at SAP). In the headline Mehta stated that the role of a data scientist is not to replace any existing BI people but to complement them (p. 34; BI=Business Intelligence). He defined requirements of a data scientist in four areas: (a) deep understanding of data, their sources and patterns; (b) theoretical and practical knowledge of advanced statistical algorithms and machine learning; (c) strategically connecting business challenges with appropriate data-driven solutions; and (d) devising an enterprise-wide data strategy that will accommodate patterns and events in the environment and foresee future data needs of the organisation. Therefore, the primary higher-level contributions expected from a data scientist include the capacity to bridge the domains of business and data/analytics (i.e., to translate business needs into analytic models and solutions and back into [marketing] action plans), and an overview of data sources and types of data, structured and unstructured, and of how to combine them properly and productively.

The pressure on companies to implement data-driven marketing programmes is growing all the time. As one company becomes publicly commended for successfully using, for instance, feedback on its website and in social media to create better-tailored product offerings, it gains an advantage that puts its competitors under pressure to follow suit. It may also inspire and incentivize companies in other industries to take similar measures. Such published examples have been increasing in number in recent years. Furthermore, companies are encouraged to apply individual-level data on customers' interactions with them (e.g., personal information submitted online, stated preferences, and tracking of page visits and of item choices made on viewed pages) in order to devise customized product offerings or recommendations for each customer. As early as the late 1990s, the grocery retailer Tesco strengthened its business in the UK and gained a leading position by utilising the purchase and personal data of customers, gathered through its Clubcard loyalty scheme, to generate offerings of greater relevance to the specific customer segments it identified. Amazon developed its e-commerce business by recommending to individual customers books related to those they view or purchase, based on similar books purchased by other customers and on the customers’ own history of behaviour.
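
The recommendation idea mentioned above (suggesting items that other customers bought together with the one being viewed) can be illustrated with a very small sketch. This is not Amazon's actual method, which is not described here; it is a plain co-purchase count, and the function names, basket contents and item labels are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Use a set so an item repeated within one basket is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, already_owned=(), top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    excluded = set(already_owned) | {item}
    ranked = counts.get(item, Counter()).most_common()
    return [other for other, _ in ranked if other not in excluded][:top_n]


# Hypothetical purchase histories, one list of book topics per customer.
sample_baskets = [
    ["stats", "ml", "viz"],
    ["stats", "ml"],
    ["stats", "ml", "sql"],
    ["viz", "sql"],
]
co_counts = build_co_purchase_counts(sample_baskets)
print(recommend(co_counts, "stats", top_n=1))
```

Passing the customer's own purchase history as `already_owned` keeps the sketch from recommending what the customer already has, which is the simplest way of combining other customers' purchases with the individual's own behaviour.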

A key challenge facing many companies is to implement an integrative approach that enforces a single view of the customer across organisational functions and channels. Thus, marketing programmes and operations must be coordinated and share data with sales and customer service activities. Moreover, data of interactions with customers, and consumers overall (as prospects), need to be examined and incorporated across multiple channels — offline, online, and mobile. This is a mission of utmost importance for companies these days; ignoring or lagging behind on this mission could mean losing ground in a market and relevance to customers. This is because customers’ experience extends over different stages of a journey in their relationship with a company and across multiple alternative channels or touchpoints they may use to fulfill their objectives. They expect that data that become available to companies be employed to improve in some way their customer experience anywhere and anytime they interact with the company. For companies, it definitely requires that they not only gather but also analyse the data in meaningful and productive ways. Whether the interactions occur in-store, over the phone, on a company’s website, in social media networks, or through mobile apps, customers consequently expect those interactions in and between channels to be smooth and frictionless. As for companies, they need to be able to share and join data from the different channels to obtain a comprehensive view of customers and co-ordinate between channels.

  • The leading American pharmacy retailer Walgreens established a platform for monitoring, analysing and managing its inventory jointly across all of its outlets, over 8,000 physical stores and four online stores, so as to allow shoppers to find, purchase and collect the products they need in as seamless a manner as possible. It integrates point-of-sale data for customers with data from additional sources (e.g., social media, third-party healthcare organisations) in order to improve patient care.
  • Procter & Gamble, which does not have direct access to sales data as retailers do, created an independent channel of communication with consumers; with the help of Teradata, it uses personal data provided by consumers online and other data (e.g., social media) to put forward more personalised product offerings for them.

An additional important aspect is the need to join different types of data, both structured (e.g., from relational customer databases) and unstructured (e.g., open-ended text in blog posts and in social media posts and discussions). The data that companies may utilise are becoming ever more heterogeneous in type, structure and form, posing greater technical and analytical challenges to companies, but also offering better opportunities. Companies may also consider using digital images, voice tracks (i.e., not only for verbal content but also for tone and pitch), and all sorts of traffic data (e.g., electronic, digital-online and mobile, and even human-physical traffic in-store). For example, suppose that a company identifies photo images posted by its customers online and recognizes that the images show its products; it may then complement that information with personal data of those customers and with the various interactions or activities they perform online (e.g., on the company’s websites, in social media) to learn more about their interests, perceptions, and preferences as reflected through images.

  • The US airline JetBlue uses the Net Promoter Score (NPS) metric to trace suspected problems of customer satisfaction, and then utilises survey data and content from social media networks, blogs and other consumer-passenger communications to identify the possible source and nature of a problem and devise an appropriate fix (an award-winning initiative by Peppers & Rogers).
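
For readers unfamiliar with the metric, here is a minimal sketch of how an NPS figure is usually computed from answers to the "how likely are you to recommend us?" question on a 0-10 scale: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). It says nothing about JetBlue's own implementation; the function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return the NPS (between -100 and 100) for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)    # scores 9-10
    detractors = sum(1 for r in ratings if r <= 6)   # scores 0-6
    # Passives (scores 7-8) count only towards the total.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey answers from ten passengers.
sample_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample_ratings))
```

A sudden drop in this score for a given route or period is the kind of signal that would then prompt a closer look at survey comments and social media content.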

But there is reason for some concern. In a report titled “Big Data: The Next Frontier for Innovation, Competition, and Productivity” (2011), McKinsey & Co. Consulting Group cautioned of an expected shortage of highly advanced analytic professionals and data-proficient managers. They estimated that by 2018 organisations in the US alone could face a shortage of 140,000 to 190,000 people with “deep analytical skills”. Moreover, the report also predicts a shortage of 1.5 million managers and analysts “with the know-how to use the analysis” of Big Data and apply it effectively in decision-making. The first part seems to refer to the professional-technical level whereas the second part points to utilisation of Big Data at the business level. Thus, McKinsey & Co. appear to be even more concerned by the inadequate ability of companies at a managerial level to benefit from the business advantages that Big Data can produce, such as in meeting marketing-related objectives. Data scientists may be counted in both categories of this forecast, but because they need to be simultaneously expert analysts and business-savvy they could belong more closely with managers.

However, the situation may not improve as quickly as hoped. The problem may be that young people are not attracted, encouraged, educated or trained enough to attain high proficiency in the exact sciences of mathematics and statistics, at least not at the growing pace that industry may require. This problem seems to be especially acute in Western countries. Popular areas of study such as technology, computer science and business administration cannot compensate for a lack of sound knowledge and skills in mathematics and statistics as far as the utilisation of Big Data in marketing in particular, and in management in general, is concerned. Yet business students, MBAs included, are more inclined to shy away from their courses and tasks in statistics and data analysis than to embrace them; and the number of graduates in the exact sciences is not increasing fast enough (in some cases it is even decreasing). Here are some figures that may help to illuminate the problem:

  • In the latest PISA exams carried out by the OECD in 2012 for school students aged 15-16, seven out of the ten top-ranking countries (or economies) in math are from the Far East, including Shanghai and Hong Kong of China, Singapore, the Republic of Korea, and Japan. Three European countries close the top list: Switzerland, adjacent Liechtenstein, and the Netherlands. Their scores are above the mean OECD score (494), ranging between 523 and 613.
  • Western countries are nevertheless among the next ten countries that still obtain a score in math above the OECD mean score, including Germany, Finland, Canada, Australia, Belgium and Ireland. But the United Kingdom is in 26th place (score 494) and the United States is even lower, in 36th place (481). Israel is positioned a bit further down the list (at 41st, score 466). [34 OECD members and 31 partner countries participated].

  • In Israel, the rate of high school students taking their matriculation exam in math at an enhanced level (4 or 5 units) has declined in recent years. It ranged between 52% and 57% in the years 1998-2006, but between 2009 and 2012 it dropped dramatically to 46% of those eligible for a matriculation certificate, according to a press release of the Israeli Central Bureau of Statistics (CBS). The CBS notes that this decrease occurs in parallel with an increase in the total number of students who obtain the certificate; this suggests, however, that the effort was not made to train and prepare the additional students to a high level in mathematics.

  • In statistics published by UNESCO on the proportion of academic graduates (ISCED levels 5 or 6, equivalent to bachelor's degree through PhD) in Science fields of study, we find that this proportion decreased from 2001 to 2012 in countries like Australia (14.2% to 9%), Switzerland (11.5% to 9%), the Republic of Korea (9.8% to 8.5%), the UK (17.4% to 13.7%), and Israel (11.7% to 8.5% in 2011).
  • This rate is stable in the US (8.6%) and Japan (though low at 2.9%), while in Finland it has been relatively stable (10%-11%) but has shifted down lately. Welcome rises are observed in Poland (5% to 8%), Germany (13% to 14.5%), and the Netherlands (5.7% to 6.5%); Italy is also improving (up from 7.5% to 8%). [Levels of ISCED scheme of 1997; a new scheme enters this year].

The impression received is that the supply of math- and science-oriented graduates may not get closer to meeting the demand of companies and non-business organisations in the coming years; it could even get worse in some countries. Companies can expect to encounter continued difficulties in recruiting well-qualified analysts with the potential to become highly qualified data scientists, and managers with good data and analytics proficiency. Managers and data scientists may have to work harder to train analysts to a satisfactory level. They may need to consider recruiting analysts from different areas of specialisation (e.g., computer programming, math and statistics, marketing), each with a partial skill set in one or two areas, continue to train them in complementary areas, and above all oversee the work and performance of mixed-qualification teams of analysts.

Big Data and Data Science offer a range of new possibilities and business opportunities to companies for better meeting consumer needs and providing better customer experiences, functionally and emotionally. They are set to change the way marketing, customer service, and retailing are managed and executed. However, reaching the higher level of marketing effectiveness and profitability will continue to command large investments, not only in technology but also in human capital. This will be a challenge for qualified managers and data scientists to work together in the future to harvest the promised potential of Big Data.


Ron Ventura, Ph.D. (Marketing)



Dr. Diederik A. Stapel was a respected psychologist at Tilburg University in the Netherlands, specialising in social psychology. Until recently he held the position of dean of social and behavioural sciences at the university. Among his areas of research, he studied interactions between self-image and the perceptions people hold of stimuli in the world surrounding them (e.g., other people, objects, advertisements). Or so he wanted us to believe. He was recently expelled from his university after an enquiry committee revealed this month that for more than ten years he had manipulated and fabricated data in his studies, putting into doubt at least 30 published papers. Although it remains unclear whether he falsified data for his doctoral thesis at the University of Amsterdam (the data have been destroyed), he voluntarily returned his title to the university following the interim report and his own admission of committing fraud and deception in his research work.

What a shame for Stapel, what an embarrassment for a whole field of theory and research in psychology. But foremost Stapel betrayed the colleagues who researched and published papers with him as well as the doctoral students he supposedly “guided” towards their degrees. Imagine how it feels to find out that research work you have done and published with a colleague you trusted could be tainted by his tampering with data and become unworthy of citation. It should also be uncomfortable for researchers who cited those papers in their own work, planned experiments based on his findings, or relied on these findings to develop theory and support their own arguments. The dismay for fresh doctorates could be even more disturbing. When a researcher deceives in the way Stapel has done, he destroys not only his own career but also adversely affects the lives and work of many others around him.

Some of the work in which Stapel was involved concerns themes in consumer behaviour, marketing and advertising. Thus the deception carried out by Stapel is also suspected of having harmed those fields. His work, alone and with colleagues, directly investigated issues associated with consumer judgement and thinking; Assimilation-Contrast effects and information processing; advertising and self-image; beauty in advertising; product comparison and evaluation, and more.

Investigation into Stapel’s fraud so far suggests that already in the early stages of his career he manipulated data he collected so as to “adapt” the data to hypotheses he wanted to support. Thereafter he became more brazen and effectively fabricated the data for his experiments altogether. Stapel may have been dishonest but he was certainly not stupid or incompetent. It requires numeric skills and a good understanding of data to simulate datasets that seem as though they were innocently and genuinely collected. Yet he may not have been so skilled, or so confident in his ability to conceal the fabrication, as he reportedly almost always refused to submit raw data to the inspection of colleagues and graduate students. There is no guarantee that even if other researchers had analysed the data Stapel did provide, they would have succeeded in uncovering anomalies or regularities indicative of fraud. Still, it is baffling how this charade could have lasted for so many years: Were the colleagues with whom Stapel co-operated not part of the process of collecting, processing and analysing the data? Were they not in a position to raise questions about the findings he brought in? Where criticism and doubts were raised against him, Stapel allegedly made various excuses for his refusal, often arrogantly, and thus was able to fend them off.

Researchers are not immune from faults in human judgement such as using evidence selectively to support their prior beliefs or hypotheses. The blame is likely to lie with the increasingly competitive culture of the academic system. In a response to the International Herald Tribune (New York Times), psychologist Jonathan Schooler (University of California) suggested that the problem lies in a culture in academia that allows researchers to “spin their work” in a way that paints a nicer picture of their findings than is really the case. In an honest perspective, psychology professor Joseph P. Simmons (Wharton School) observes: “We know the general tendency of humans to draw the conclusions they want to draw… With findings we want to see, we ask, Can I believe this? With those we don’t, we ask: Must I believe this?” (IHT, 3 Nov. 2011).

The IHT tells of critics of current practices in psychological research, particularly the practice of keeping data in near secrecy, who claim that researchers should be required more strongly to share data with other researchers in the field, allowing them to analyse the data and come to their own conclusions. I doubt that this will fix methodological malpractices in this field; it may instead increase animosity among researchers. Researchers can and should raise questions and criticism about the design and analytical methods their colleagues apply and the conclusions they draw; that is part of academic life. But if researchers do not trust the authenticity of the data collected, then confidence in relying on published papers and books might be seriously damaged. Vigilance is always advised but it should be confined to circles within the academic institute and its departments and among colleagues working together. In the competitive climate that prevails in academia, perpetuated by the race to publish papers (“publish or perish”), I do not think it is fair to blame researchers who are reluctant to submit datasets willingly to colleagues they are not truly familiar with. No one really wants to help others who might publish a paper that contradicts their own work. It is important, however, that researchers working together in a group or a research programme, not necessarily in the same research institute, exchange data and let more than one person analyse the data in parallel.

A particularly disturbing finding of the enquiry committee is that Stapel may have jeopardised the theses of his doctoral students. It is assessed that between 12 and 14 of the 20 theses he supervised could be flawed because they relied on fabricated data he provided to his students for analysis. Stapel gained the dubious title of “lord of the data” because of his insistence on not disclosing how and where exactly the data were obtained and on not providing raw data (which raises the question of what kind of data he did provide to students for analysis). Even if any of his research students suspected that something was wrong with the data they received, one would need to have clear evidence and the confidence to make allegations of fraud against one’s supervisor; it can put a doctoral student in a very difficult dilemma. Making things worse, Stapel used to threaten and insult students who dared to raise doubts about the data. One such student, testifying as a witness before the committee, was accused by Stapel of putting into doubt the latter’s well-established reputation as a top researcher, its report says. The student also complained of receiving hints from Stapel that the student’s aspirations as a young researcher could be compromised (DutchNews.nl).

Stapel apparently also intimidated colleagues who questioned his methods and his superior investigative talents. But eventually he came across three young researchers who either were smarter than he was or were fed up with his behaviour, probably both. They investigated one of his datasets and exposed his tactics to the head of Tilburg University (DutchNews.nl). As Stapel continued to get away with his deception, he became bolder, and possibly too complacent in the end. Was he alert to the risks involved in his conduct or did he overlook them? Did he get tired of hiding his tracks? And did he expect or even want to get caught at some point in time? Stapel will have much time now to analyse his behaviour by introspection. And perhaps we will be able to find answers to such questions in a book that Stapel will write himself.

Ron Ventura, Ph.D. (Marketing)


“New scandal is latest to taint psychology field”, Benedict Carey, The International Herald Tribune, 3 Nov. 2011, pp. 1, 4.

“Five questions about: The Diederik Stapel affair”, DutchNews.nl, 2 Nov. 2011.


“Diederik Stapel: The lying Dutchman”, The Washington Post Online (Opinion), The Achenblog by Joel Achenbach, 1 Nov. 2011.


An entry on Diederik Stapel in Wikipedia (with a link to the original report of the enquiry committee in Dutch).
