College of Computer Studies
(632) 524-4611 Ext. 302
The College of Computer Studies
Advanced Research Institute for Computing (AdRIC)
Center for Empathic Human-Computer Interactions (CEHCI)
and Center for Language Technologies (CELT)
De La Salle University
cordially invite you to a lecture entitled
“Too many results! Focusing on Strong, Causal Relations in the Data”
Joseph E. Beck, Ph.D.
Friday, June 1, 2012
at the Andrew Gonzalez Hall – Room 1103
09:30 AM to 12:00 NN
(For inquiries and reservations, please contact Mr. Ervin Jay Samosa by email: firstname.lastname@example.org)
Advances in storage and networking have led to an explosion of data available for analysis. This trend has had many positive impacts, and has greatly extended the scope and quality of analyses performed on data collected by intelligent tutoring systems. As we collect more types of data, researchers can test many more hypotheses than before. In addition, larger sample sizes give greater statistical power, making it possible to detect small effects in the data. Although these advances have brought great benefits, they also carry a definite cost: an increase in the number of analyses that are reportable due to statistical “significance” but are of marginal utility and may even be false.
The reason for concern is arithmetic. First, as the number of variables collected grows, the number of testable relationships grows roughly as the square of the number of variables, since each new variable can be tested against all of the existing variables in the database. Second, the ability to detect statistically significant effects increases according to sqrt(rows). These two effects are additive, and result in a vast increase in the number of significant relationships one can discover from the collected data.
The problem arises when one considers the number of useful relationships in the data. A great many variables will correlate with each other purely by chance, or merely because they share a common cause. Discovering all of these chance associations is not exciting from a research standpoint, but by community standards such results would be publishable, and it is not always obvious from statistical hypothesis testing which results are of interest and which are not. Simply put, we do not want to be in a community where researchers report every effect they discover that has a small p-value.
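The arithmetic described above can be illustrated with a minimal Python sketch. It is not taken from the talk; the function names, the 0.05 significance threshold, the 1.96 critical value, and the large-sample approximation for the smallest detectable correlation are all assumptions chosen for illustration.

```python
from math import comb, sqrt


def pairwise_tests(n_variables: int) -> int:
    """Number of distinct variable pairs that could be tested against each other."""
    return comb(n_variables, 2)


def expected_chance_hits(n_variables: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' pairs if no real relationship exists.

    Assumption: every pair is tested once at level alpha and every null
    hypothesis is true, so each test has probability alpha of a false alarm.
    """
    return alpha * pairwise_tests(n_variables)


def smallest_detectable_r(n_rows: int, z_crit: float = 1.96) -> float:
    """Rough size of the smallest correlation that reaches significance.

    Assumption: large-sample approximation in which a correlation r is
    significant when |r| * sqrt(n_rows) exceeds z_crit.
    """
    return z_crit / sqrt(n_rows)


if __name__ == "__main__":
    for n_vars in (10, 100, 1000):
        print(n_vars, pairwise_tests(n_vars), expected_chance_hits(n_vars))
    for n_rows in (100, 10_000, 1_000_000):
        print(n_rows, smallest_detectable_r(n_rows))
```

Under these assumptions, 100 variables already yield 4,950 testable pairs, and the correlation needed to reach significance shrinks in proportion to 1/sqrt(rows), so ever smaller effects become reportable as the data set grows.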
This talk will discuss two better methods for finding relationships in the data that are of broader use to the community: those that are of high magnitude, and those that are causal rather than merely relational.
Profile of Speaker:
Dr. Joseph E. Beck received his PhD in Computer Science from the University of Massachusetts Amherst in May 2001. He has worked with the Computer Science Department of Worcester Polytechnic Institute since September 2007, first as a Research Scientist and then, since July 2009, as Assistant Professor. His areas of specialization include intelligent tutoring systems, educational data mining, and artificial intelligence.
Posted on: 05/25/2012