Today’s cars with more than 100 million lines of code

Driverless cars will only compound the issue of increasingly complex programming in cars.

New high-end cars are among the most sophisticated machines on the planet, containing 100 million or more lines of code. Compare that with about 60 million lines of code in all of Facebook or 50 million in the Large Hadron Collider.

“Cars these days are reaching biological levels of complexity,” said Chris Gerdes, a professor of mechanical engineering at Stanford University.

The sophistication of new cars brings numerous benefits — forward-collision warning systems and automatic emergency braking that keep drivers safer are just two examples. But with new technology comes new risks — and new opportunities for malevolence.

The article then goes on to discuss two issues: hacking this complex software and regulating it (with the recent VW case serving as a good example). I’d rather the article go in three different directions than just highlight what could go wrong:

  1. How exactly do car makers and programmers make sure this all works together? How many people are involved in this? Who coordinates it all? Just putting this all together is quite a task.
  2. Say more about how this complexity compares to other machines. Based on what is said here, a car may be the most complex mechanical object the typical person interacts with.
  3. The move to driverless cars may only up the ante. Or could some of this be reduced by starting fresh with no driver and a fully autonomous system? New code tends to be built on top of older code as pieces change, but starting anew might make things easier.

Frankly, much of our lives these days depends on complex and/or lengthy computer code. If all that knowledge suddenly disappeared for some reason (perhaps an interesting starting point for a sci-fi story), we would have some problems.

Facebook to hold pre-ASA conference

Last year’s ASA meetings included some special sessions on big data, and Facebook is hosting a pre-conference this year at the company’s headquarters.

VentureBeat has learned that Facebook is to hold an academics-only conference in advance of the American Sociological Association 2014 Annual Meeting this August in San Francisco.

Facebook will run shuttles from the ASA conference hotel to Facebook’s headquarters in Menlo Park, Calif. According to the company’s event description, the pre-conference focuses on “techniques related to data collection with the advent of social media and increased interconnectivity across the world.”…

According to the event schedule, Facebook will give a demo of its tools and software stack at the conference…

There seems to be great demand for sociologists who can code. Corey now spends a lot of time hiring fellow sociologists, according to his article. The same is true at other big companies: in an interview with the London School of Economics, Google Vice President Prabhakar Raghavan claimed that he just couldn’t hire enough social scientists.

This is a growing area of employment for sociologists, who would benefit from access to proprietary yet amazing data but would also have to negotiate the different structures of the private technology world versus academia.

Using a sociological approach in “e-discovery technologies”

Legal cases can generate a tremendous number of documents that each side needs to examine. With new search technology, legal teams can now go through a lot more data for a lot less money. In one example, “Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000.” But within this discussion, the writer suggests that these searches can be done in two ways:

E-discovery technologies generally fall into two broad categories that can be described as “linguistic” and “sociological.”

The most basic linguistic approach uses specific search words to find and sort relevant documents. More advanced programs filter documents through a large web of word and phrase definitions. A user who types “dog” will also find documents that mention “man’s best friend” and even the notion of a “walk.”
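
A minimal sketch in Python of how that kind of expansion might work (the concept map here is invented for illustration; commercial tools presumably rely on much larger ontologies):

    # A toy version of the "linguistic" approach: keyword search widened
    # by a hand-built concept map. All terms here are made up.
    CONCEPT_MAP = {
        "dog": {"dog", "man's best friend", "leash", "walk"},
    }

    def expand(query):
        """Return the query plus any related phrases we know about."""
        return CONCEPT_MAP.get(query.lower(), {query.lower()})

    def search(documents, query):
        """Return documents that contain the query or a related phrase."""
        terms = expand(query)
        return [d for d in documents if any(t in d.lower() for t in terms)]

    docs = ["Took him for a walk around the block.",
            "Quarterly earnings were flat."]
    print(search(docs, "dog"))  # matches the first document via "walk"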

The sociological approach adds an inferential layer of analysis, mimicking the deductive powers of a human Sherlock Holmes. Engineers and linguists at Cataphora, an information-sifting company based in Silicon Valley, have their software mine documents for the activities and interactions of people — who did what when, and who talks to whom. The software seeks to visualize chains of events. It identifies discussions that might have taken place across e-mail, instant messages and telephone calls…

The Cataphora software can also recognize the sentiment in an e-mail message — whether a person is positive or negative, or what the company calls “loud talking” — unusual emphasis that might give hints that a document is about a stressful situation. The software can also detect subtle changes in the style of an e-mail communication.

A shift in an author’s e-mail style, from breezy to unusually formal, can raise a red flag about illegal activity.
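
That last idea, flagging a shift from breezy to formal, is simple enough to sketch. Here is a toy Python version with invented style features and an invented threshold; whatever Cataphora actually does is surely far richer:

    import re

    # Crude proxy for formality: longer sentences and fewer contractions
    # read as more formal. Features and weights are invented.
    def formality_score(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
        contractions = len(re.findall(r"\b\w+'(?:t|s|re|ll|ve|d)\b", text))
        return avg_len - 10 * contractions / max(len(text.split()), 1)

    def flag_shifts(messages, threshold=3.0):
        """Return indices where the score jumps sharply from the prior message."""
        scores = [formality_score(m) for m in messages]
        return [i for i in range(1, len(scores))
                if abs(scores[i] - scores[i - 1]) > threshold]

    emails = ["hey, can't make lunch, let's catch up later!",
              "Per our conversation, please find attached the requested files."]
    print(flag_shifts(emails))  # [1]: a breezy-to-formal jump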

So this second technique gets branded as “sociological” because it looks for patterns of behavior and interaction. If you wondered how the programmers set up their code to do this kind of analysis, it sounds like some academics have been working on the problem for almost a decade:

[A computer scientist] bought a copy of the database [of Enron emails] for $10,000 and made it freely available to academic and corporate researchers. Since then, it has become the foundation of a wealth of new science — and its value has endured, since privacy constraints usually keep large collections of e-mail out of reach. “It’s made a massive difference in the research community,” Dr. McCallum said.

The Enron Corpus has led to a better understanding of how language is used and how social networks function, and it has improved efforts to uncover social groups based on e-mail communication.
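
Both halves of this, the who-talks-to-whom mining described earlier and the uncovering of social groups, can be roughed out in a few lines. A sketch using the networkx library, with invented senders and message counts standing in for a parsed corpus like Enron’s:

    # Build a who-talks-to-whom graph from message metadata, then look
    # for social groups with community detection. Requires networkx
    # (pip install networkx); all names and counts are invented.
    from collections import Counter

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Stand-in for parsed e-mail headers: (sender, recipient) pairs.
    messages = (
        [("alice", "bob")] * 40 + [("bob", "carol")] * 35
        + [("alice", "carol")] * 20 + [("dave", "erin")] * 50
        + [("erin", "frank")] * 30 + [("dave", "frank")] * 25
        + [("carol", "dave")] * 2  # a weak tie bridging two clusters
    )

    # Who talks to whom, and how often, becomes a weighted graph.
    G = nx.Graph()
    for (sender, recipient), n in Counter(messages).items():
        G.add_edge(sender, recipient, weight=n)

    # Community detection should split the graph at the weak tie.
    for group in greedy_modularity_communities(G, weight="weight"):
        print(sorted(group))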

Were any sociologists involved in this project to provide input on what the programs should look for in human interactions?

This sort of analysis software could be very handy for sociological research when one has hundreds of documents or sources to look through. Of course, the algorithms might have to be changed for specific projects or settings, but I wonder if this sort of software might be widely available in a few years. Would this analysis be better than going one by one through documents in coding software like Atlas.Ti or NVivo?