Information Processing

Pessimism of the Intellect, Optimism of the Will

Thursday, September 03, 2015

Don’t Worry, Smart Machines Will Take Us With Them: Why human intelligence and AI will co-evolve.


I hope you enjoy my essay in the new issue of the science magazine Nautilus (theme: the year 2050), which discusses the co-evolution of humans and machines as we advance in both AI and genetic technologies. My Nautilus article from 2014: Super-Intelligent Humans Are Coming.
Nautilus: ... AI can be thought of as a search problem over an effectively infinite, high-dimensional landscape of possible programs. Nature solved this search problem by brute force, effectively performing a huge computation involving trillions of evolving agents of varying information processing capability in a complex environment (the Earth). It took billions of years to go from the first tiny DNA replicators to Homo Sapiens. What evolution accomplished required tremendous resources. While silicon-based technologies are increasingly capable of simulating a mammalian or even human brain, we have little idea of how to find the tiny subset of all possible programs running on this hardware that would exhibit intelligent behavior.

But there is hope. By 2050, there will be another rapidly evolving and advancing intelligence besides that of machines: our own. The cost to sequence a human genome has fallen below $1,000, and powerful methods have been developed to unravel the genetic architecture of complex traits such as human cognitive ability. Technologies already exist which allow genomic selection of embryos during in vitro fertilization—an embryo’s DNA can be sequenced from a single extracted cell. Recent advances such as CRISPR allow highly targeted editing of genomes, and will eventually find their uses in human reproduction.

... These two threads—smarter people and smarter machines—will inevitably intersect. Just as machines will be much smarter in 2050, we can expect that the humans who design, build, and program them will also be smarter. Naively, one would expect the rate of advance of machine intelligence to outstrip that of biological intelligence. Tinkering with a machine seems easier than modifying a living species, one generation at a time. But advances in genomics—both in our ability to relate complex traits to the underlying genetic codes, and the ability to make direct edits to genomes—will allow rapid advances in biologically-based cognition. Also, once machines reach human levels of intelligence, our ability to tinker starts to be limited by ethical considerations. Rebooting an operating system is one thing, but what about a sentient being with memories and a sense of free will?

... AI research also pushes even very bright humans to their limits. The frontier machine intelligence architecture of the moment uses deep neural nets: multilayered networks of simulated neurons inspired by their biological counterparts. Silicon brains of this kind, running on huge clusters of GPUs (graphical processor units made cheap by research and development and economies of scale in the video game industry), have recently surpassed human performance on a number of narrowly defined tasks, such as image or character recognition. We are learning how to tune deep neural nets using large samples of training data, but the resulting structures are mysterious to us. The theoretical basis for this work is still primitive, and it remains largely an empirical black art. The neural networks researcher and physicist Michael Nielsen puts it this way:
... in neural networks there are large numbers of parameters and hyper-parameters, and extremely complex interactions between them. In such extraordinarily complex systems it’s exceedingly difficult to establish reliable general statements. Understanding neural networks in their full generality is a problem that, like quantum foundations, tests the limits of the human mind.
... It may seem incredible, or even disturbing, to predict that ordinary humans will lose touch with the most consequential developments on planet Earth, developments that determine the ultimate fate of our civilization and species. Yet consider the early 20th-century development of quantum mechanics. The first physicists studying quantum mechanics in Berlin—men like Albert Einstein and Max Planck—worried that human minds might not be capable of understanding the physics of the atomic realm. Today, no more than a fraction of a percent of the population has a good understanding of quantum physics, although it underlies many of our most important technologies: Some have estimated that 10-30 percent of modern gross domestic product is based on quantum mechanics. In the same way, ordinary humans of the future will come to accept machine intelligence as everyday technological magic, like the flat screen TV or smartphone, but with no deeper understanding of how it is possible.

New gods will arise, as mysterious and familiar as the old.

Leadership


I was asked recently to write something about my leadership style / management philosophy. As a startup CEO I led a team of ~35, and now my office has something like 350 FTEs. Eventually, hands-on leadership becomes impossible and one needs general principles that can be broadly conveyed.
I have a “no drama” leadership style. We try to be as rational and unbiased as possible in making decisions, always working in the long term interests of the institution and to advance human knowledge. I ask that everyone on my team try to understand all sides of a difficult issue to the point that they can, if asked, effectively argue other perspectives. This exercise helps overcome cognitive biases. My unit tries to be entirely “transparent” -- we want other players at the university to understand the rationale and evidence behind our specific decisions. We want our resource allocations to be predictable, justifiable, and as free from petty politics as possible. Other units view members of my team as effective professionals who can be relied on to do the right thing.
One of the toughest aspects of my current job is the wide variety of things I have to look at -- technologies and research projects across the spectrum from biomedical to engineering to fundamental physics to social science and the humanities. Total NSF + DOE funding at MSU ranks in the top 10 (very close to top 5) among US universities.

The most important principle I advance to my senior staff is epistemic caution together with pragmatism.

See also this interview (startups) and Dale Carnegie: How to Win Friends and Influence People :-)

Monday, August 31, 2015

No genomic dark matter

Let me put it very simply: there is NO genomic "dark matter" or "missing heritability" -- it's merely a matter of sample size (statistical power) to identify the specific variants that account for the total expected heritability. The paper below (see also HaploSNPs and missing heritability) suggests that essentially all of the expected heritability can be accounted for once rare (MAF < 0.01) and common SNPs are taken into account. I suspect the small remaining gap in heritability is accounted for by nonlinear effects.

We don't yet know which specific variants are responsible for, e.g., population variation in height, but we expect that they can be found given sufficient statistical power. See Genetic architecture and predictive modeling of quantitative traits.
Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index

Nature Genetics (2015) doi:10.1038/ng.3390

We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ~97% and ~68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ~17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60–70% for height and 30–40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.
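For readers who want a feel for how genome-wide similarity gets turned into a heritability estimate, here is a minimal sketch on simulated data. It is not the paper's GREML-LDMS procedure; it uses a simpler Haseman-Elston-style moment estimator on a genetic relationship matrix, and the sample size, SNP count, and simulated heritability below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, h2_true = 2000, 5000, 0.5   # individuals, SNPs, simulated heritability

# Simulate genotypes (0/1/2 minor-allele counts) and standardize columns.
freqs = rng.uniform(0.05, 0.5, m)
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
Z = (G - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))

# Additive phenotype: y = Zb + e, with Var(Zb) ~ h2 and Var(e) = 1 - h2.
b = rng.normal(0, np.sqrt(h2_true / m), m)
y = Z @ b + rng.normal(0, np.sqrt(1 - h2_true), n)
y = (y - y.mean()) / y.std()

# Genetic relationship matrix A = Z Z^T / m.
A = Z @ Z.T / m

# Haseman-Elston regression: for i != j, E[y_i * y_j] ~ A_ij * h2,
# so regress phenotype cross-products on off-diagonal GRM entries.
iu = np.triu_indices(n, k=1)
x = A[iu]
yy = np.outer(y, y)[iu]
h2_est = (x @ yy) / (x @ x)
print(f"true h2 = {h2_true}, HE estimate = {h2_est:.3f}")
```

The GREML methods in the paper do the analogous thing with restricted maximum likelihood and with variants stratified by MAF and LD, which is what lets them separate the contributions of rare and common variants.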
From the paper (click for larger image):



... Under a model of neutral evolution, most variants segregating in the population are rare, whereas most genetic variation underlying traits is due to common variants [18]. The neutral evolutionary model predicts that the cumulative contribution of variants with MAF ≤θ to the total genetic variance is linearly proportional to θ, where θ is a MAF threshold. However, our observed results for height strongly deviated from this model (Fig. 4a), suggesting that height-associated variants have been under natural selection. Such deviation would be even stronger with whole-genome sequencing data because variation at rare sequence variants is less well captured by 1000 Genomes Project imputation than that at common variants (Fig. 3 and Supplementary Fig. 4). ... Equivalently, the neutral evolutionary model also predicts that variance explained is uniformly distributed as a function of MAF [18], such that the variance explained by variants with MAF ≤0.1 equals that of variants with MAF >0.4. However, we observed that, although the variance explained per variant (defined as the variance explained by the variants in a MAF bin divided by m, with m the number of variants in the bin) for rare variants was much smaller than that for common variants for both height and BMI (Supplementary Fig. 8), the variants with MAF ≤0.1 in total explained a significantly larger proportion of variance than those with MAF >0.4 (21.0% versus 8.8%, P_difference = 9.2 × 10^−7) for height (Fig. 4b and Supplementary Table 3), consistent with height-associated variants being under selection.

... Theoretical studies on variation in complex traits based on models of natural selection suggest that rare variants only explain a substantial amount of variance under strong assumptions about the relationship between effect size and selection strength [19, 20, 21]. We performed genome-wide association analyses for height and BMI in the combined data set (Online Methods) and found that the minor alleles of variants with lower MAF tended to have stronger and negative effects on height and stronger but positive effects on BMI (Fig. 4c). The correlation between minor allele effect and MAF was highly significant for both height (P < 1.0 × 10^−6) and BMI (P = 8.0 × 10^−5) and was even stronger for both traits in the data from the latest GIANT Consortium meta-analyses [5, 22] (Fig. 4d); these correlations were not driven by population stratification (Supplementary Fig. 10). All these results suggest that height- and BMI-associated variants have been under selection. These results are consistent with the hypothesis that new mutations that decrease height or increase obesity tend to be deleterious to fitness and are hence kept at low frequencies in the population by purifying selection.
See also Deleterious variants affecting traits that have been under selection are rare and of small effect -- the results above support my conjecture from several years ago.

Sunday, August 30, 2015

Jiujitsu renaissance

John Danaher discusses his coaching philosophy. Danaher trained UFC champions Georges St. Pierre and Chris Weidman, among others.




Danaher student Garry Tonon on wrestling and jiujitsu. He's shown rolling with AJ Agazarm, a former All-Big10 wrestler and no-gi BJJ world champ.




These fights are from a no-time-limit submission tournament a few years ago, featuring the top brown belts in the world. Some of the matches lasted over an hour, others ended after only 5 or 10 minutes.  I like this style of competition much more than fighting for points.



Thursday, August 27, 2015

Trump on carried interest and hedge funds: "They didn't build this country."

Say what you want about Trump, he's one of the only candidates who isn't beholden to oligarch campaign contributors. Below he goes after the crazy tax break that hedge fund managers enjoy.
Bloomberg: ... “I know a lot of bad people in this country that are making a hell of a lot of money and not paying taxes,” Trump said in an interview with Time, in apparent reference to hedge fund and private equity fund managers. “The tax law is totally screwed up.”

"They're paying nothing and it's ridiculous," he added on CBS a few days later. “The hedge fund guys didn't build this country. These are guys that shift paper around and they get lucky." He went on: “They’re energetic, they’re very smart. But a lot of them, it’s like they’re paper pushers. They make a fortune, they pay no tax... The hedge funds guys are getting away with murder.”

Trump was apparently referring to carried interest. Most hedge funds and private equity funds are structured as partnerships where the fund managers serve as general partners and the investors as limited partners. Carried interest represents the fund managers’ share of the income generated by the fund, which is typically 20 percent of the fund’s profits at the end of the year. For most funds, this share of the profits, called an “incentive fee,” makes up most of the fund managers’ income, and, depending on the size and performance of the fund, it can stretch into the hundreds of millions of dollars. It’s largely what pays for 40,000 square foot mansions in Greenwich, Conn., and major league baseball teams and $100 million works of art. Under current tax rules, much of that incentive fee income is taxed at the long-term capital gains rate of 20 percent. If it was taxed as ordinary income, the top rate would be 39.6 percent. For hedge fund managers, the carried interest tax provision is something of a third rail, the one thing that unites them in furious opposition.
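A back-of-envelope version of those numbers (fund size and annual return are made-up round figures, just to show the scale of the tax difference):

```python
# Illustrative only: the fund size and return below are assumptions.
fund_size = 5e9          # $5 billion under management
annual_return = 0.10     # 10% gross return for the year

profits = fund_size * annual_return          # $500 million in profits
carried_interest = 0.20 * profits            # 20% incentive fee = $100 million

tax_capital_gains = 0.20 * carried_interest   # taxed as long-term capital gains
tax_ordinary = 0.396 * carried_interest       # taxed as ordinary income (top rate)

print(f"incentive fee: ${carried_interest/1e6:.0f}M")
print(f"tax at 20% cap gains: ${tax_capital_gains/1e6:.0f}M; "
      f"at 39.6% ordinary income: ${tax_ordinary/1e6:.0f}M; "
      f"difference: ${(tax_ordinary - tax_capital_gains)/1e6:.0f}M")
```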

Monday, August 24, 2015

Man and Superman

These are some of my favorite panels from Frank Miller's graphic novel The Dark Knight Returns (1986). See also I Love Jack Kirby. Click for larger versions.



Education and Achievement Gaps

This recent talk by Harvard economist and education researcher Roland Fryer reviews studies of student incentives, charter schools, best educational practices, and their effects on achievement gaps.  Audio  Slides (the features in the image below are not clickable).


A very recent preprint on a study of parental incentives:
Parental Incentives and Early Childhood Achievement: A Field Experiment in Chicago Heights

Roland G. Fryer, Jr.
Harvard University and NBER

Steven D. Levitt
University of Chicago and NBER

John A. List
University of Chicago and NBER

August 2015

Abstract
This article describes a randomized field experiment in which parents were provided financial incentives to engage in behaviors designed to increase early childhood cognitive and executive function skills through a parent academy. Parents were rewarded for attendance at early childhood sessions, completing homework assignments with their children, and for their child’s demonstration of mastery on interim assessments. This intervention had large and statistically significant positive impacts on both cognitive and non-cognitive test scores of Hispanics and Whites, but no impact on Blacks. These differential outcomes across races are not attributable to differences in observable characteristics (e.g. family size, mother’s age, mother’s education) or to the intensity of engagement with the program. Children with above median (pre-treatment) non cognitive scores accrue the most benefits from treatment.

Saturday, August 22, 2015

Now go train jiujitsu: choked out terrorist edition


Spencer Stone (left) is a blue belt at Gracie Lisboa. He choked out the terrorist gunman on the Amsterdam-Paris train yesterday.
NYTimes: ... Alek Skarlatos, a specialist in the National Guard from Oregon vacationing in Europe with a friend in the Air Force, Airman First Class Spencer Stone and another American, Anthony Sadler, looked up and saw the gunman. Mr. Skarlatos, who was returning from a deployment in Afghanistan, looked over at the powerfully built Mr. Stone, a martial arts enthusiast. “Let’s go, go!” he shouted.

... In the train carriage, Mr. Stone was the first to act, jumping up at the command of Mr. Skarlatos. He sprinted through the carriage toward the gunman, running “a good 10 meters to get to the guy,” Mr. Skarlatos said. Mr. Stone was unarmed; his target was visibly bristling with weapons.

With Mr. Skarlatos close behind, Mr. Stone grabbed the gunman’s neck, stunning him. But the gunman fought back furiously, slashing with his blade, slicing Mr. Stone in the neck and hand and nearly severing his thumb. Mr. Stone did not let go.

The gunman “pulled out a cutter, started cutting Spencer,” Mr. Norman, the British consultant, told television interviewers. “He cut Spencer behind the neck. He nearly cut his thumb off.”

Mr. Skarlatos grabbed the gunman’s Luger pistol and threw it to the side. Incongruously, the gunman yelled at the men to return it, even as Mr. Stone was choking him. A train conductor rushed up and grabbed the gunman’s left arm, Mr. Norman recalled.

... Mr. Stone, wounded and bleeding, kept the suspect in a chokehold. “Spencer Stone is a very strong guy,” Mr. Norman said. The suspect passed out.

Wednesday, August 19, 2015

Lackeys of the plutocracy?


This essay is an entertaining read, if somewhat wrongheaded. See here for a previous post that discusses Steve Pinker's response to Deresiewicz's earlier article Don’t Send Your Kid to the Ivy League.
The Neoliberal Arts (Harpers): ... Now that the customer-service mentality has conquered academia, colleges are falling all over themselves to give their students what they think they think they want. Which means that administrators are trying to retrofit an institution that was designed to teach analytic skills — and, not incidentally, to provide young people with an opportunity to reflect on the big questions — for an age that wants a very different set of abilities. That is how the president of a top liberal-arts college can end up telling me that he’s not interested in teaching students to make arguments but is interested in leadership. That is why, around the country, even as they cut departments, starve traditional fields, freeze professorial salaries, and turn their classrooms over to adjuncts, colleges and universities are establishing centers and offices and institutes, and hiring coordinators and deanlets, and launching initiatives, and creating courses and programs, for the inculcation of leadership, the promotion of service, and the fostering of creativity. Like their students, they are busy constructing a parallel college. What will happen to the old one now is anybody’s guess.

So what’s so bad about leadership, service, and creativity? What’s bad about them is that, as they’re understood on campus and beyond, they are all encased in neoliberal assumptions. Neoliberalism, which dovetails perfectly with meritocracy, has generated a caste system: “winners and losers,” “makers and takers,” “the best and the brightest,” the whole gospel of Ayn Rand and her Übermenschen. That’s what “leadership” is finally about. There are leaders, and then there is everyone else: the led, presumably — the followers, the little people. Leaders get things done; leaders take command. When colleges promise to make their students leaders, they’re telling them they’re going to be in charge. ...

We have always been, in the United States, what Lionel Trilling called a business civilization. But we have also always had a range of counterbalancing institutions, countercultural institutions, to advance a different set of values: the churches, the arts, the democratic tradition itself. When the pendulum has swung too far in one direction (and it’s always the same direction), new institutions or movements have emerged, or old ones have renewed their mission. Education in general, and higher education in particular, has always been one of those institutions. But now the market has become so powerful that it’s swallowing the very things that are supposed to keep it in check. Artists are becoming “creatives.” Journalism has become “the media.” Government is bought and paid for. The prosperity gospel has arisen as one of the most prominent movements in American Christianity. And colleges and universities are acting like businesses, and in the service of businesses.

What is to be done? Those very same WASP aristocrats — enough of them, at least, including several presidents of Harvard and Yale — when facing the failure of their own class in the form of the Great Depression, succeeded in superseding themselves and creating a new system, the meritocracy we live with now. But I’m not sure we possess the moral resources to do the same. The WASPs had been taught that leadership meant putting the collective good ahead of your own. But meritocracy means looking out for number one, and neoliberalism doesn’t believe in the collective. As Margaret Thatcher famously said about society, “There’s no such thing. There are individual men and women, and there are families.” As for elite university presidents, they are little more these days than lackeys of the plutocracy, with all the moral stature of the butler in a country house.

Neoliberalism disarms us in another sense as well. For all its rhetoric of freedom and individual initiative, the culture of the market is exceptionally good at inculcating a sense of helplessness. So much of the language around college today, and so much of the negative response to my suggestion that students ought to worry less about pursuing wealth and more about constructing a sense of purpose for themselves, presumes that young people are the passive objects of economic forces. That they have no agency, no options. That they have to do what the market tells them. A Princeton student literally made this argument to me: If the market is incentivizing me to go to Wall Street, he said, then who am I to argue?

I have also had the pleasure, over the past year, of hearing from a lot of people who are pushing back against the dictates of neoliberal education: starting high schools, starting colleges, creating alternatives to high school and college, making documentaries, launching nonprofits, parenting in different ways, conducting their lives in different ways. I welcome these efforts, but none of them address the fundamental problem, which is that we no longer believe in public solutions. We only believe in market solutions, or at least private-sector solutions: one-at-a-time solutions, individual solutions.

The worst thing about “leadership,” the notion that society should be run by highly trained elites, is that it has usurped the place of “citizenship,” the notion that society should be run by everyone together. Not coincidentally, citizenship — the creation of an informed populace for the sake of maintaining a free society, a self-governing society — was long the guiding principle of education in the United States. ...

Crossfit Games 2015

Some great highlights.

Friday, August 14, 2015

Pinker on bioethics

Progress in biomedical research is slow enough. It does not need to be slowed down even further.
Boston Globe: A POWERFUL NEW technique for editing genomes, CRISPR-Cas9, is the latest in a series of advances in biotechnology that have raised concerns about the ethics of biomedical research and inspired calls for moratoria and new regulations. Indeed, biotechnology has moral implications that are nothing short of stupendous. But they are not the ones that worry the worriers.

... A truly ethical bioethics should not bog down research in red tape, moratoria, or threats of prosecution based on nebulous but sweeping principles such as “dignity,” “sacredness,” or “social justice.” Nor should it thwart research that has likely benefits now or in the near future by sowing panic about speculative harms in the distant future. These include perverse analogies with nuclear weapons and Nazi atrocities, science-fiction dystopias like “Brave New World” and “Gattaca,” and freak-show scenarios like armies of cloned Hitlers, people selling their eyeballs on eBay, or warehouses of zombies to supply people with spare organs. Of course, individuals must be protected from identifiable harm, but we already have ample safeguards for the safety and informed consent of patients and research subjects.

Some say that it’s simple prudence to pause and consider the long-term implications of research before it rushes headlong into changing the human condition. But this is an illusion.

First, slowing down research has a massive human cost. Even a one-year delay in implementing an effective treatment could spell death, suffering, or disability for millions of people.

Second, technological prediction beyond a horizon of a few years is so futile that any policy based on it is almost certain to do more harm than good. Contrary to confident predictions during my childhood, the turn of the 21st century did not bring domed cities, jetpack commuting, robot maids, mechanical hearts, or regularly scheduled flights to the moon. This ignorance, of course, cuts both ways: few visionaries foresaw the disruptive effects of the World Wide Web, digital music, ubiquitous smartphones, social media, or fracking. ...

Tuesday, August 11, 2015

Explain it to me like I'm five years old

An MIT Technology Review reporter interviewed me yesterday about my Nautilus Magazine article Super-Intelligent Humans Are Coming. I had to do the interview by gchat because my voice is recovering from a terrible cold and too much yakking with brain scientists at the Allen Institute in Seattle.

I realized I need to find an explanation of the article's thesis that is as simple as possible -- so that MIT graduates can understand it ;-)

Let me know what you think of the following.
1. Cognitive ability is highly heritable. At least half the variance is genetic in origin.

2. It is influenced by many common variants -- probably thousands (see GCTA estimates of heritability due to common SNPs). We know there are many because the fewer there are, the larger the (average) effect size of each individual variant would have to be -- but then the SNPs would be easy to detect even with small sample sizes.

Recent studies with large sample sizes detected ~70 SNP hits, but would have detected many more if effect sizes were consistent with, e.g., only hundreds of causal variants in total.

3. Since these are common variants, the probability of carrying the negative version -- the allele with (-) effect on g score -- is not small (e.g., 10% or more).

4. So each individual is carrying around many hundreds (if not thousands) of (-) variants.

5. As long as effects are roughly additive, we know that changing ALL or MOST of these (-) variants into (+) variants would push an individual many standard deviations (SDs) above the population mean. Such an individual would be far beyond any historical figure in cognitive ability. 
Given more details we can estimate the average number of (-) variants carried by individuals, and how many SDs are up for grabs from flipping (-) to (+). As is the case with most domesticated plants and animals, we expect that the existing variation in the population allows for many SDs of improvement (see figure below).
For references and more detailed explanation, see On the Genetic Architecture of Cognitive Ability and Other Heritable Traits.
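To make point 5 concrete, here is a toy calculation under the simplest possible additive model. The number of causal variants, the (-) allele frequency, and the equal-effects assumption are illustrative choices, not estimates from data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_variants = 10_000   # assumed number of causal common variants
p_minus = 0.10        # assumed frequency of the (-) allele at each locus

# Under an additive model with equal per-variant effects, population variation
# in the additive genetic score comes from the binomial count of (-) alleles.
mean_minus = n_variants * p_minus                        # (-) variants per person
sd_minus = np.sqrt(n_variants * p_minus * (1 - p_minus))  # SD of that count

print(f"average (-) variants carried: {mean_minus:.0f}")
print(f"SD of the count across the population: {sd_minus:.1f}")
print(f"gain from flipping all of them: {mean_minus / sd_minus:.0f} genetic SDs")

# Monte Carlo check of the binomial formula.
counts = rng.binomial(n_variants, p_minus, size=100_000)
print(f"simulated SD of the count: {counts.std():.1f}")
```

With these assumed numbers the answer comes out to roughly 30 SDs of the additive genetic score; in phenotypic units the gain is smaller by a factor of the square root of the heritability, but it is still enormous compared to existing human variation.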

Monday, August 10, 2015

Tomorrowland

I watched this on the flight back from Asia. It's a kid movie but it operates at more than one level. The girl robot Athena is really fun.




Saturday, August 08, 2015

Deep Learning in Nature

When I travel I often carry a stack of issues of Nature and Science to read (and then discard) on the plane.

The article below is a nice review of the current state of the art in deep neural networks. See earlier posts Neural Networks and Deep Learning 1 and 2, and Back to the Deep.
Deep learning
Yann LeCun, Yoshua Bengio, Geoffrey Hinton
Nature 521, 436–444 (28 May 2015) doi:10.1038/nature14539 
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
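For readers who have never seen backpropagation written out, here is a minimal sketch of the procedure the abstract describes: a two-layer network trained by gradient descent on a toy problem. The architecture, learning rate, and task are arbitrary illustrative choices (numpy only, no deep learning framework):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR, which no single linear layer can represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two-layer net: 2 inputs -> 8 tanh hidden units -> 1 sigmoid output.
W1, b1 = rng.normal(0, 1.0, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1.0, (8, 1)), np.zeros(1)
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: each layer's representation is computed from the previous one.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (backpropagation): push the cross-entropy error gradient
    # back through the layers using the chain rule.
    delta2 = (out - y) / len(X)                 # gradient at the output pre-activation
    dW2, db2 = h.T @ delta2, delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * (1.0 - h**2)     # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ delta1, delta1.sum(axis=0)

    # Gradient-descent update of all parameters.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

preds = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print(np.round(preds.ravel(), 3))   # should approach [0, 1, 1, 0]
```

Everything in modern deep learning is a scaled-up, GPU-accelerated elaboration of this loop; the mystery is in why the resulting high-dimensional optimization works as well as it does.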
The article seems to give a somewhat, er, compressed version of the history of the field. See these comments by Schmidhuber:
Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique focusing on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per Dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors. [ THERE ARE 9 POINTS IN THIS CRITIQUE ]

... LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations. 
One very striking aspect of the history of deep neural nets, which is acknowledged both by Schmidhuber and LeCun et al., is that the subject was marginal to "mainstream" AI and CS research for a long time, and that new technologies (i.e., GPUs) were crucial to its current flourishing in terms of practical results. The theoretical results, such as they are, appeared decades ago! It is clear that there are many unanswered questions concerning guarantees of optimal solutions, the relative merits of alternative architectures, use of memory networks, etc.

Some additional points:

1. Prevalence of saddle points over local minima in high-dimensional geometries: early researchers apparently worried that optimization of DNNs would get stuck at local minima in parameter space. But saddle points are much more common in high-dimensional spaces, and local minima have turned out not to be a big problem (see the sketch after these two points).

2. Optimized neural networks are similar in important ways to biological (e.g., monkey) brains! When monkeys and a ConvNet are shown the same pictures, the activation of high-level units in the ConvNet explains half of the variance of random sets of 160 neurons in the monkey's inferotemporal cortex.
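Regarding point 1: a crude way to build intuition is to treat the Hessian at a random critical point as a random symmetric matrix and ask how often all of its eigenvalues are positive (i.e., how often the critical point is a local minimum rather than a saddle). The Gaussian-ensemble model below is an assumption for illustration, not a statement about real DNN loss surfaces:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_local_minima(dim, trials=2000):
    """Fraction of random symmetric (GOE-like) matrices with all eigenvalues > 0."""
    count = 0
    for _ in range(trials):
        A = rng.normal(size=(dim, dim))
        H = (A + A.T) / np.sqrt(2 * dim)      # symmetric "Hessian" proxy
        if np.linalg.eigvalsh(H).min() > 0:
            count += 1
    return count / trials

for d in (1, 2, 3, 5, 8):
    print(f"dim {d}: fraction of 'local minima' = {frac_local_minima(d):.4f}")

# The fraction falls off roughly like exp(-const * dim^2); in the millions of
# dimensions of a real DNN, essentially every critical point is a saddle.
```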

Some comments on the relevance of all this to the quest for human-level AI from an earlier post:
... evolution has encoded the results of a huge environment-dependent optimization in the structure of our brains (and genes), a process that AI would have to somehow replicate. A very crude estimate of the amount of computational power used by nature in this process leads to a pessimistic prognosis for AI even if one is willing to extrapolate Moore's Law well into the future. [ Moore's Law (Dennard scaling) may be toast for the next decade or so! ] Most naive analyses of AI and computational power only ask what is required to simulate a human brain, but do not ask what is required to evolve one. I would guess that our best hope is to cheat by using what nature has already given us -- emulating the human brain as much as possible.

If indeed there are good (deep) generalized learning architectures to be discovered, that will take time. Even with such a learning architecture at hand, training it will require interaction with a rich exterior world -- either the real world (via sensors and appendages capable of manipulation) or a computationally expensive virtual world. Either way, I feel confident in my bet that a strong version of the Turing test (allowing, e.g., me to communicate with the counterpart over weeks or months; to try to teach it things like physics and watch its progress; eventually for it to teach me) won't be passed until at least 2050 and probably well beyond.
Relevant remarks from Schmidhuber:
[Link] ...Ancient algorithms running on modern hardware can already achieve superhuman results in limited domains, and this trend will accelerate. But current commercial AI algorithms are still missing something fundamental. They are no self-referential general purpose learning algorithms. They improve some system’s performance in a given limited domain, but they are unable to inspect and improve their own learning algorithm. They do not learn the way they learn, and the way they learn the way they learn, and so on (limited only by the fundamental limits of computability). As I wrote in the earlier reply: "I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic." However, additional algorithmic breakthroughs may be necessary to make this a practical reality.
[Link] The world of RNNs is such a big world because RNNs (the deepest of all NNs) are general computers, and because efficient computing hardware in general is becoming more and more RNN-like, as dictated by physics: lots of processors connected through many short and few long wires. It does not take a genius to predict that in the near future, both supervised learning RNNs and reinforcement learning RNNs will be greatly scaled up. Current large, supervised LSTM RNNs have on the order of a billion connections; soon that will be a trillion, at the same price. (Human brains have maybe a thousand trillion, much slower, connections - to match this economically may require another decade of hardware development or so). In the supervised learning department, many tasks in natural language processing, speech recognition, automatic video analysis and combinations of all three will perhaps soon become trivial through large RNNs (the vision part augmented by CNN front-ends). The commercially less advanced but more general reinforcement learning department will see significant progress in RNN-driven adaptive robots in partially observable environments. Perhaps much of this won’t really mean breakthroughs in the scientific sense, because many of the basic methods already exist. However, much of this will SEEM like a big thing for those who focus on applications. (It also seemed like a big thing when in 2011 our team achieved the first superhuman visual classification performance in a controlled contest, although none of the basic algorithms was younger than two decades: http://people.idsia.ch/~juergen/superhumanpatternrecognition.html)

So what will be the real big thing? I like to believe that it will be self-referential general purpose learning algorithms that improve not only some system’s performance in a given domain, but also the way they learn, and the way they learn the way they learn, etc., limited only by the fundamental limits of computability. I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic, but now I can see how it is starting to become a practical reality. Previous work on this is collected here: http://people.idsia.ch/~juergen/metalearner.html
See also Solomonoff universal induction. I don't believe that completely general purpose learning algorithms have to become practical before we achieve human-level AI. Humans are quite limited, after all! When was the last time you introspected to learn about the way you learn the way you learn ...? Perhaps it is happening "under the hood" to some extent, but not in maximum generality; we have hardwired limits.
Do we really need Solomonoff? Did Nature make use of his Universal Prior in producing us? It seems like cheaper tricks can produce "intelligence" ;-)

Tuesday, August 04, 2015

Seattle: quantum thermalization and genomic prediction

I'll be at the Institute for Nuclear Theory at the University of Washington tomorrow to discuss quantum thermalization in heavy ion collisions. Some brief slides.


On Thursday I'll be at the Allen Institute for Brain Science to give a talk (video and slides):
Title:  Genetic Architecture and Predictive Modeling of Quantitative Traits

Abstract: I discuss the application of Compressed Sensing (L1-penalized optimization or LASSO) to genomic prediction. I show that matrices composed of human genomes are good compressed sensors, and that LASSO applied to genomic prediction exhibits a phase transition as the sample size is varied. When the sample size crosses the phase boundary, complete identification of the subspace of causal variants is possible. For typical traits of interest (e.g., with heritability ~ 0.5), the phase boundary occurs at N ~ 30s, where s (sparsity) is the number of causal variants. I give some estimates of sparsity associated with complex traits such as height and cognitive ability, which suggest s ~ 10k. In practical terms, these results imply that powerful genomic prediction will be possible for many complex traits once ~ 1 million genotypes are available for analysis.
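Here is a minimal sketch of the kind of numerical experiment behind these statements, using simulated genotypes and scikit-learn's LassoCV rather than the specific algorithm and data in the talk. The sample sizes, sparsity, and effect-size model are illustrative, and the location of the recovery transition in this toy setting need not match the N ~ 30s figure quoted for real genomic data:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
m, s, h2 = 2000, 20, 0.5        # SNPs, causal variants (sparsity), heritability

def fraction_recovered(n):
    """Simulate n genotypes, fit LASSO, return fraction of causal SNPs selected."""
    freqs = rng.uniform(0.1, 0.5, m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)

    causal = rng.choice(m, s, replace=False)
    beta = np.zeros(m)
    beta[causal] = rng.choice([-1.0, 1.0], s) * np.sqrt(h2 / s)
    y = Z @ beta + rng.normal(0, np.sqrt(1 - h2), n)

    fit = LassoCV(cv=5).fit(Z, y)
    selected = set(np.flatnonzero(fit.coef_))
    return len(selected & set(causal)) / s

# Recovery of the causal variants should improve sharply as n grows past a
# threshold set by the sparsity s and the heritability.
for n in (100, 300, 600, 1200):
    print(f"n = {n:5d}: fraction of causal variants recovered = {fraction_recovered(n):.2f}")
```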

Sunday, August 02, 2015

Brooklyn with palm trees

Third wave coffee in Niles Canyon.





Saturday, August 01, 2015

Crossing the Pacific

So long, Hong Kong...



Foo Camp!



Someone is mining ether!

Tuesday, July 28, 2015

HaploSNPs and missing heritability

By constructing haplotypes using adjacent SNPs the authors arrive at a superior set of genetic variables with which to compute genetic similarity. These haplotypes tag rare variants and seem to recover a significant chunk of heritability not accounted for by common SNPs.

See also ref 32: Yang, J. et al. Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index. Nature Genetics, submitted
Haplotypes of common SNPs can explain missing heritability of complex diseases (http://dx.doi.org/10.1101/022418)

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h2), recent work has shown that more heritability is explained by all genotyped SNPs (hg2). However, much of the heritability is still missing (hg2 < h2). For example, for schizophrenia, h2 is estimated at 0.7-0.8 but hg2 is estimated at ~0.3. Efforts at increasing coverage through accurately imputed variants have yielded only small increases in the heritability explained, and poorly imputed variants can lead to assay artifacts for case-control traits. We propose to estimate the heritability explained by a set of haplotype variants (haploSNPs) constructed directly from the study sample (hhap2). Our method constructs a set of haplotypes from phased genotypes by extending shared haplotypes subject to the 4-gamete test. In a large schizophrenia data set (PGC2-SCZ), haploSNPs with MAF > 0.1% explained substantially more phenotypic variance (hhap2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (hg2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between hhap2 and hg2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.


The excerpt below is my response to an excellent comment by Gwern:
Your summary is correct, AFAIU. Below is a bit more detail about the 4-gamete test, which differentiates between a recombination event (which breaks the haploblock for descendants of that individual; recombination = scrambling due to sexual reproduction) and a simple mutation at that locus. The goal is to impute identical blocks of DNA that are tagged by SNPs on standard chips.
Algorithm to generate haploSNPs 
... Given two alleles at the haploSNPs and two at the mismatch SNP, a maximum of four possible allelic combinations can be observed. If all four combinations are observed, this indicates that a recombination event is required to explain the mismatch, and the haploSNP will be terminated. If, however, only three combinations are observed, the mismatch may be explained by a mutation on the shared haplotype background. These mismatches are ignored and the haploSNP is extended further. We note that this approach can produce a very large number of haploSNPs and very long haploSNPs that could tag signals of cryptic relatedness. ...

>> This estimated heritability is much closer to the full-strength twin study estimates, showing that a lot of the 'missing' heritability is lurking in the rarer SNPs << 
This was already suspected by some researchers (including me), but the haploSNP results provide support for the hypothesis. It means that, e.g., with whole genomes we could potentially recover nearly all the predictive power implied by classical h2 estimates ...
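For concreteness, here is a minimal sketch of the 4-gamete check described above, operating on phased haplotypes coded 0/1. The function and variable names are mine, not from the paper's software:

```python
import numpy as np

def four_gamete_compatible(hap_block, mismatch_snp):
    """
    hap_block: 1-D 0/1 array over phased haplotypes; 1 = carries the shared
               haplotype being extended, 0 = does not.
    mismatch_snp: 1-D 0/1 array of alleles at the candidate mismatch SNP.
    Returns True if at most three of the four possible allelic combinations
    occur, i.e. the mismatch can be explained by a single mutation on the
    shared background and the haploSNP may be extended; False means a
    recombination event is required and the haploSNP is terminated.
    """
    combos = {(h, s) for h, s in zip(hap_block, mismatch_snp)}
    return len(combos) < 4

# Toy example with 8 phased haplotypes.
block = np.array([1, 1, 1, 1, 0, 0, 0, 0])
snp_mutation_only = np.array([1, 1, 0, 1, 0, 0, 0, 0])   # 3 combos -> extend
snp_recombined    = np.array([1, 0, 1, 0, 1, 0, 1, 0])   # 4 combos -> terminate
print(four_gamete_compatible(block, snp_mutation_only))  # True
print(four_gamete_compatible(block, snp_recombined))     # False
```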

Sunday, July 26, 2015

Greetings from HK


Meetings with BGI, HKUST, and financiers. Will stop in SV and Seattle (Allen Institute) on the way back.

Thursday, July 23, 2015

Drone Art



I saw this video at one of the Scifoo sessions on drones. Beautiful stuff!

I find this much more pleasing than fireworks. The amount of waste and debris generated by a big fireworks display is horrendous.

Monday, July 20, 2015

What is medicine’s 5 sigma?

Editorial in the Lancet, reflecting on the Symposium on the Reproducibility and Reliability of Biomedical Research held April 2015 by the Wellcome Trust.
What is medicine’s 5 sigma?

... much of the [BIOMEDICAL] scientific literature, perhaps half, may simply be untrue. Afflicted by studies with small sample sizes, tiny effects, invalid exploratory analyses, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, [BIOMEDICAL] science has taken a turn towards darkness. As one participant put it, “poor methods get results”. The Academy of Medical Sciences, Medical Research Council, and Biotechnology and Biological Sciences Research Council have now put their reputational weight behind an investigation into these questionable research practices. The apparent endemicity of bad research behaviour is alarming. In their quest for telling a compelling story, scientists too often sculpt data to fit their preferred theory of the world. ...

One of the most convincing proposals came from outside the biomedical community. Tony Weidberg is a Professor of Particle Physics at Oxford. ... the particle physics community ... invests great effort into intensive checking and rechecking of data prior to publication. By filtering results through independent working groups, physicists are encouraged to criticise. Good criticism is rewarded. The goal is a reliable result, and the incentives for scientists are aligned around this goal. Weidberg worried we set the bar for results in biomedicine far too low. In particle physics, significance is set at 5 sigma—a p value of 3 × 10^−7 or 1 in 3.5 million (if the result is not true, this is the probability that the data would have been as extreme as they are). The conclusion of the symposium was that something must be done ...
I once invited a famous evolutionary theorist (MacArthur Fellow) at Oregon to give a talk in my institute, to an audience of physicists, theoretical chemists, mathematicians and computer scientists. The Q&A was, from my perspective, friendly and lively. A physicist of Hungarian extraction politely asked the visitor whether his models could ever be falsified, given the available field (ecological) data. I was shocked that he seemed shocked to be asked such a question. Later I sent an email thanking the speaker for his visit and suggesting he come again some day. He replied that he had never been subjected to such aggressive and painful attack and that he would never come back. Which community of scientists is more likely to produce replicable results?

See also Medical Science? and Is Science Self-Correcting?

To answer the question posed in the title of the post / editorial, an example of a statistical threshold which is sufficient for high confidence of replication is the p < 5 x 10^{-8} significance requirement in GWAS. This is basically the traditional p < 0.05 threshold corrected for multiple testing of 10^6 SNPs. Early "candidate gene" studies which did not impose this correction have very low replication rates. See comment below for what this implies about the validity of priors based on biological intuition.
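Both thresholds are easy to check directly (assuming, as in the Lancet piece, a one-sided Gaussian tail for the 5-sigma conversion):

```python
from scipy.stats import norm

# Particle physics: 5 sigma, one-sided tail.
p_5sigma = norm.sf(5.0)
print(f"5 sigma -> p = {p_5sigma:.2e} (about 1 in {1/p_5sigma:,.0f})")
# ~2.9e-07, i.e. roughly 1 in 3.5 million.

# GWAS: Bonferroni-correct the conventional 0.05 for ~10^6 independent SNPs.
p_gwas = 0.05 / 1e6
print(f"GWAS genome-wide significance threshold: p < {p_gwas:.0e}")   # 5e-08
```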

I discuss this a bit with John Ioannidis in the video below.


Sunday, July 19, 2015

Technically Sweet

Regular readers will know that I've been interested in the so-called Teller-Ulam mechanism used in thermonuclear bombs. Recently I read Kenneth Ford's memoir Building the H Bomb: A Personal History. Ford was a student of John Wheeler, who brought him to Los Alamos to work on the H-bomb project. This led me to look again at Richard Rhodes's Dark Sun: The Making of the Hydrogen Bomb. There is quite a lot of interesting material in these two books on the specific contributions of Ulam and Teller, and whether the Soviets came up with the idea themselves, or had help from spycraft. See also Sakharov's Third Idea and F > L > P > S.

The power of a megaton device is described below by a witness to the Soviet test.
The Soviet Union tested a two-stage, lithium-deuteride-fueled thermonuclear device on November 22, 1955, dropping it from a Tu-16 bomber to minimize fallout. It yielded 1.6 megatons, a yield deliberately reduced for the Semipalatinsk test from its design yield of 3 MT. According to Yuri Romanov, Andrei Sakharov and Yakov Zeldovich worked out the Teller-Ulam configuration in conversations together in early spring 1954, independently of the US development. “I recall how Andrei Dmitrievich gathered the young associates in his tiny office,” Romanov writes, “… and began talking about the amazing ability of materials with a high atomic number to be an excellent reflector of high-intensity, short-pulse radiation.” ...

Victor Adamsky remembers the shock wave from the new thermonuclear racing across the steppe toward the observers. “It was a front of moving air that you could see that differed in quality from the air before and after. It came, it was really terrible; the grass was covered with frost and the moving front thawed it, you felt it melting as it approached you.” Igor Kurchatov walked in to ground zero with Yuli Khariton after the test and was horrified to see the earth cratered even though the bomb had detonated above ten thousand feet. “That was such a terrible, monstrous sight,” he told Anatoli Alexandrov when he returned to Moscow. “That weapon must not be allowed ever to be used.”
The Teller-Ulam design uses radiation pressure (reflected photons) from a spherical fission bomb to compress the thermonuclear fuel. The design is (to quote Oppenheimer) "technically sweet" -- a glance at the diagram below should convince anyone who understands geometrical optics!




In discussions of human genetic engineering (clearly a potentially dangerous future technology), the analogy with nuclear weapons sometimes arises: what role do moral issues play in the development of new technologies with the potential to affect the future of humanity? In my opinion, genetic engineering of humans carries nothing like the existential risk of arsenals of Teller-Ulam devices. Genomic consequences will play out over long (generational) timescales, leaving room for us to assess outcomes and adapt accordingly. (In comparison, genetic modification of viruses, which could lead to pandemics, seems much more dangerous.)
It is my judgment in these things that when you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. -- Oppenheimer on the Teller-Ulam design for the H-bomb.
What is technically sweet about genomics? (1) the approximate additivity (linearity) of the genetic architecture of key traits such as human intelligence (2) the huge amounts of extant variance in the human population, enabling large improvements (3) matrices of human genomes are good compressed sensors, and one can estimate how much data is required to "solve" the genetic architecture of complex traits. See, e.g., Genius (Nautilus Magazine) and Genetic architecture and predictive modeling of quantitative traits.

More excerpts from Dark Sun below.

Enthusiasts of trans-generational epigenetics would do well to remember the danger of cognitive bias and the lesson of Lysenko. Marxian notions of heredity are dangerous because, although scientifically incorrect, they appeal to our egalitarian desires.
A commission arrived in Sarov one day to make sure everyone agreed with Soviet agronomist Trofim Lysenko's Marxian notions of heredity, which Stalin had endorsed. Sakharov expressed his belief in Mendelian genetics instead. The commission let the heresy pass, he writes, because of his “position and reputation at the Installation,” but the outspoken experimentalist Lev Altshuler, who similarly repudiated Lysenko, did not fare so well ...
The transmission of crucial memes from Szilard to Sakharov, across the Iron Curtain.
Andrei Sakharov stopped by Victor Adamsky's office at Sarov one day in 1961 to show him a story. It was Leo Szilard's short fiction “My Trial as a War Criminal,” one chapter of his book The Voice of the Dolphins, published that year in the US. “I'm not strong in English,” Adamsky says, “but I tried to read it through. A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can't get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.” So the visionary Hungarian physicist Leo Szilard, who first conceived of a nuclear chain reaction crossing a London street on a gray Depression morning in 1933, delivered a note in a bottle to a secret Soviet laboratory that contributed to Andrei Sakharov's courageous work of protest that helped bring the US-Soviet nuclear arms race to an end.

Thursday, July 16, 2015

Frontiers in cattle genomics


A correspondent updates us on advances in genomic cattle breeding. See also Genomic Prediction: No Bull and It's all in the gene: cows.
More than a million cattle in the USDA dairy GWAS system (updated with new breeding value predictions weekly), as cost per marker drops exponentially: https://www.cdcb.us/Genotype/cur_freq.html
The NM$ (Net Merit in units of dollars) utility function for selection is more and more sophisticated (able to avoid bad trade-offs from genetic correlations): http://www.ars.usda.gov/research/publications/publications.htm?SEQ_NO_115=310013
Cheap genotyping has allowed mass testing of cows, and made it practical to use dominance in models and to match up semen and cow for dominance synergies and heterosis (the dominance component is small compared to the additive one, as usual: for milk yield 5-7% dominance variance, 21-35% additive): http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0103934
[Note: additive heritability for the traits cattle breeders work on is significantly lower than for cognitive ability.]
Matching mates to reduce inbreeding (without specific markers for dominance effects) by looking at predicted ROH: http://www.ars.usda.gov/research/publications/publications.htm?SEQ_NO_115=294115
Identifying recessive lethals and severe diseases: http://aipl.arsusda.gov/reference/recessive_haplotypes_ARR-G3.html http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054872
For humans, see Genetic architecture and predictive modeling of quantitative traits.

Monday, July 13, 2015

Productive Bubbles

These slides are from one of the best sessions I attended at Scifoo. Bill Janeway's perspective was both theoretical and historical, but in addition we had Sam Altman of Y Combinator to discuss Airbnb and other examples of two-sided market platforms (Uber, etc.) that may be enjoying speculative bubbles at the moment.

See also Andrew Odlyzko (Caltech '71 ;-) on British railway manias for specific cases of speculative funding of useful infrastructure: here, here, and here.



Friday, July 10, 2015

Rustin Cohle: True Detective S1 (HBO)



"I think human consciousness is a tragic misstep in evolution. We became too self-aware. Nature created an aspect of nature separate from itself. We are creatures that should not exist by natural law. We are things that labor under the illusion of having a self; an accretion of sensory experience and feeling, programmed with total assurance that we are each somebody, when in fact everybody is nobody."
"To realize that all your life—you know, all your love, all your hate, all your memory, all your pain—it was all the same thing. It was all the same dream. A dream that you had inside a locked room. A dream about being a person. And like a lot of dreams there's a monster at the end of it."
More quotes. More video.

Matthew McConaughey on the role:






McConaughey as Wooderson in Dazed and Confused:
