Theory of Computation survey results


This page contains information regarding the online survey about theory of computation courses, which was held from March 16 to April 1, 2012. There were 77 complete submissions; shortcuts to results:
- Survey set-up
- Summary of results, discussion, and conclusions
- Raw question answers with percentages, sorted by question (exported from LimeSurvey)
- List of ToC topics ordered on being essential according to the survey respondents


Introduction

Theory of Computation (ToC) is generally considered to be among the core courses in a computer science curriculum. There are, however, differences between universities how ToC, or its topics, is incorporated in the degree programme, and also within one university the course offerings can vary over they years due to various circumstances. Variations include offering a single ToC course only, to have also advanced courses with more in-depth coverage of one or more ToC topics, or spreading it out over several courses, such as including automata in a compilers course or complexity with knowledge representation languages, and whether it is suitable for undergraduate or postgraduate curricula.
In addition, there are many anecdotes, such as that it is being threatened to be removed from the curriculum because the material is deemed too hard---reflected in comparatively low pass rates and correspondingly being flagged by the university's structures---or due to low enrolment numbers, among others. This raises several questions regarding the validity of such claims and anecdotes, including what, then, the prevalent curriculum structure is regarding ToC courses and topics, in which it ideally is or should be taught, and whether issues with course offerings are widespread or not. To this end, the current status of theory of computation courses in Computer Science curricula were examined. Here we focus specifically on opinions of ToC, which have been collected through an open survey.


Survey set-up

General set-up
The method chosen for this survey is that of an open online questionnaire for a two-week time period. Both individual email invitations are to be sent to people inside and outside UKZN, including an announcement on the DL mailing list and CAIR members list, and social media will be used as well (Facebook and Google+ accounts), but not the author's blog (which current students read). Respondents can leave their email to be contacted for further, in-depth questions and feedback.

The survey software used is LimeSurvey, because it is free survey software that has the necessary feature of conditional questions and extensive branching. Analysis of the results will be done mainly with the built-in LimeSurvey features and MS Excel.

Survey Questions
The survey itself consists of eight main questions, with conditional questions depending on the answer provided by the respondent. They are summarised here (see results for details).
  1. Should Theory of Computation [roughly: formal languages, automata, complexity] be a course in a CS programme?
    If yes: core or elective, undergrad or postgrad, when in the programme (undergrad (1, 2, 3), hons(4), MSc, PhD)?
  2. Is it taught at your university?
    1. If no: was it, but cancelled? if yes: Why?
    2. If yes: core or elective, when in the programme (undergrad (1, 2, 3), hons(4), MSc, PhD), is it secure/solidly in the programme or threatened to be cancelled, does it cause 'problems' such as being flagged for low pass rates or negative student evaluations? Participation (indication): estimate of average amount of enrolled students in each course offering over the past 3 years (<10; 10-30; 31-60; 61-100; more than 100), is the amount stable or in decline (stable, slight decline, strong decline (>50% fewer students in past 5 years)). First-time pass rate (<20, 20-40, 41-60, 61-80, 80-100%).
  3. Did you ever teach, or are currently teaching, Theory of Computation or Formal languages and automata only or Computability, complexity only?
  4. Did you do Theory of Computation in your degree? If yes: which year of the programme? If no: do you miss it/regret not having done so?
  5. Do you do research in Theory of Computation topics? Do you use Theory of Computation topics in research or at work?
  6. Our survey of syllabi indicates some differences across universities. Please Indicate for the following topics whether you consider the topic 'essential', suitable for an 'extended version' of a ToC course, or 'peripheral/may be skipped'.
    Deterministic Finite Automata (DFA); Non-Deterministic Finite Automata (NFA, epsilon-NFA); Equivalences & conversion DFA, NFA, epsilon-NFA; State minimization; Moore Machines; Mealy Machines; Regular expressions; Equivalences & conversion RE and automata; Regular grammars (RG); Converting RG NFAs; Pumping lemma for Regular languages; Properties of Regular languages Context-Free Languages; Context-free Grammars (definition, ambiguity, simplification, derivations); Chomsky normal form, hierarchy; Push-down Automata (PDA, deterministic and nondeterministic); Pumping lemma for CFLs; Decision properties of CFLs; Closure properties of CFLs; Equivalences & conversion PDA, CFG; Church-Turing thesis; The Turing Machine (basics); TM acceptors; TM transducers; 'Programming tricks' for TM (storage, tracks); Multi-tape TM; Non-deterministic TM; Problems a computer cannot solve; Proving undecidability; Reductions; Diagonalization language; Recursively Enumerable, non-RE, RE but not recursive languages; Universal Turing Machine; Rice's theorem; Cook's theorem; Post correspondence problem; Examples of some undecidable problems/languages; Show problem to be decidable/undecidable (RE, non-RE, RE but not Rec.); Halting problem; P, NP, NP-complete, NP-hard, co-NP; Polynomial time reductions; Show problem to be P, NP, NP-complete, NP-hard, co-NP; Some well-known NP problems (e.g., TSP, SAT, Node cover); PSPACE, EXPTIME, ...; 'Hot/fun topics' and their complexity classes (e.g., games).
  7. Comments the respondent wishes to make, name of the organisation where the respondent works/studies, email (optional)
Limitations of the survey set-up
There are several limitations of the set up of the survey that is expected to affect the responses. First, the distribution of the invitations is skewed toward colleagues who work in similar fields, both regarding individual emails and the distribution lists, most of whom use, or know they benefit from, ToC topics in one way or another in their work. Second, it is an open survey, hence, it is likely to exhibit the tendency that those who are more passionate about ToC will be the main proportion of respondents. Third, the questions set is limited. Notably, we ask principally about the one ToC course option, not about whether, and if so how, ToC topics are or should be split up and taught piecemeal across courses (e.g., automata in a compilers course). Also, other questions could have been included--such as whether they use simulation software and which textbook--but the aim was to keep the survey short so as to get a general impression and not also delve into the methods and techniques of instruction.


Results of the online survey

Characterisation of respondents

The number of completed surveys was 77, of which 58 filled in their affiliation. The respondents are mainly employed at universities and research institutes, and at a few companies. Of the respondents, 12 indicated an academic affiliation in South Africa (SUN, UKZN, UNISA, UP, Wits) and thus the majority of respondents were from around the world, including universities of Illinois, Rutgers [US], Waterloo [CA], Bolzano [IT], Dresden [DE], Geneva [CH], Southampton [UK], Ben Gurion [IL], Bahia Blanca [AR], CENATAV [CU], Pontificia [CL/BR], Sao Paolo, Rio de Janeiro, Sergipe, Espirito Santo, and 8 others from [BR], Macau [CN] and Indonesia [ID]. At least three respondents are employed in industry (Boeing, Google, and SRI International).
35 respondents indicated to have taught ToC, 37 formal languages and automata only, 25 computability and complexity, and 16 closely related courses such as algorithms, logic, and compilers. 41% or the respondents (32) conducts research in ToC topics and 72% (56) use it in research or other work (including 2 from the aforementioned companies).
The majority of completed responses were received after the (delayed) posting of the invitation on the DL mailing list (21 before, 56 after).
The spread of respondents is too broad to merit statistical analyses of the type to check whether responses vary significantly by country, by continent, or by language.

Answers relating to ToC

The raw question answers with percentages, sorted by question (exported from LimeSurvey) are online available, with some minor anonymisation (no affiliation, no email, and when an organisation was mentioned in any of the comments fields, this was replaced with "[Uni1]" etc). Anonymised answers to each survey question in per-question format with answers by respondent as a spreadsheet are available upon request; the following paragraphs summarise the results.

Basic ToC course statistics

76 of the 77 respondents are of the opinion ToC should be in programme, 74 (or 96% of the answers) have it currently in the programme, and 82% had it in their degree programme; of those 14 who did not do ToC, 86% misses not having had that opportunity. The remaining respondent did not do ToC, does not miss it not having done so, said that it should not be in the programme, and never uses it at work.

Considering when it should be taught in the degree programme, and taking note of the 10 comments describing it has been divided over several semesters/courses, there was a clear tendency for undergraduate compared to honours, MSc, or PhD (ratio yes/no 0.69 for undergrad versus 0.45 for honours and 0.33 for MSc/PhD), and the 3rd year in particular (ratio yes/no year1 0.13, year2 1.03, and year3 1.57), which roughly corresponds to when the respondents themselves did ToC, being 5% in year 1, 31% in year 2, 39% in year 3, 16% in year 4/honours, 6% at MSc level and 3% during their PhD studies. Looking at the responses when it is actually taught at the respondent's university, it shows 44% (31 out of 71) in year 3, 30% in year 2, and 14% year 4/honours, where it is a core course in 67 of the 74 cases (90%), and secure in the programme for 57 out of 66 responses (86%).

Only few reasons were provided for "under threat/other", being that it has been removed from some specialisations but not all (in part, due to CS vs IS tension), or threatened due to low enrolment numbers, resulting in one case ToC being taught only every other year. Enrolment numbers vary greatly, from <10 students (6% of respondents) to classes with >100 students (16%), with the main 31-60 students/year (40%) and then 11-30 students (24%). 25% (15 of 60) record a slight decline in enrolment (only 1 noted a strong decline, which is not consistent with data from other respondents with the same recorded affiliation).

Issues with the course

Concerning the stories about issues with a ToC course, the data provided by the respondents do substantiate this to some extent. While 44% of the respondents answered that there are no issues and everything runs smoothly, 32% (20 out of 63) note it causes problems in the academic system each year and another 24% reported that management/student affairs has gotten used to the fact there are problems, i.e., a slight majority of respondents faces issues. Several respondents provided additional information regarding the issues in the comments fields, three of whom mentioned low pass rates, four noted that students struggle because they do not see the usefulness of ToC for their career, two mentioned it also depends on the quality of the teacher, and two face issues due to low enrolment numbers.

Regarding the data, we considered three variables present in the result set that may influence there being issues: pass rates, class size, and content of the course. Course content was extrapolated from the answers given to the ToC topics, where more topics denoted as "essential" is assumed to result in a heavier course load, which need not be the case, and no correlation was found. Data on pass rates, participation, and issues are shown in Table 1. First, note that in 25 of the 56 responses (45%) first-time pass rates remain below 60% and with 80% of the respondents, the pass rate remains below 80%. Second, there is no clear trend between pass rate, class size and issues, except that pass rate 41-60% and class size 31-60 raise comparatively more issues with 72% and 64% of the reported instances; n is to small to draw any conclusions for the other combinations. The correlation between pass rate and issues is 0.79. Assessing correlation for class size is meaningless, as the reasons for issues are known to be different, given the information provided by the respondents: issues and low class size are due to low enrolment, whereas larger class sizes have other, not documented, issues (although with the limited data of this survey, the correlation between pass rate and class size is 0.58, hence deserves further investigation).

Table 1. Reported issues (i), count of pass rates (pr), count of class size (cs), and where two variables hold. (Note: in some cases, insufficient data was available; e.g., for i&pr<20, one had no issues, one left the 'issues' question blank, and one did have issues each year)
No. of i for the pr count of pr count of cs No. of i for the cs
pr <20% 1 3 4 1 cs <10
pr 21-40% 4 4 15 6 cs 11-30
pr 41-60% 13 18 25 16 cs 31-60
pr 61-80% 7 20 8 3 cs 61-100
pr 81-100 4 11 10 7 cs >100












ToC topics

The final set of main questions concerned the topics that should be part of a ToC course. 46 were listed and for each [essential/extended/peripheral/no answer] could be given. The responses were analysed in two different ways: calculating the percentage of a response value out of the total responses given and examine where, say, >70% has a value 'essential', and by assigning values to the responses (essential = 3, extended = 2, peripheral = 1) and ordering topics according to the average over the given answers. The order of the topics differ slightly, but positions vary by a few places only; or: there is a consensus about which topics are important irrespective of the value and cut-off points. See the complete ordered list of ToC topics.

In short, the top 15 'essential' topics in descending order of average, ranging from 2.99 to 2.68 (or: 99-72% of the answers had 'essential'), are: Regular expressions; Turing Machine; Deterministic Finite Automata (DFA); Context-free Grammars; Non-Deterministic Finite Automata (NFA, epsilon-NFA); Equivalences & conversion RE and automata; Problems a computer cannot solve; Halting Problem; Properties of Regular languages; Examples of some undecidable problems/languages; Computability and decidability; Church-Turing Thesis; Regular Grammars; P, NP, NP-complete, NP-hard, co-NP; Universal Turing Machine; and Equivalences & conversion DFA, NFA, epsilon-NFA.

Opinions vary regarding extended or peripheral topics, which can be observed by comparing the tail end of the topics ordered according to average and those with the highest percentages answered with 'In an extended course'. In descending order of average, ranging from 2.36 to 1.75 (in the range of 54-16% 'essential'), the bottom 15 topics are: Pumping lemma for CFLs; State minimization; Decision properties of CFLs; Closure properties of CFLs; Cook's theorem; Diagonalization language; Post correspondence problem; PSPACE, EXPTIME, ...; Multi-tape TM; Rice's theorem; TM transducers; 'programming tricks' for TM (storage, tracks); 'hot/fun topics' and their complexity classes (e.g., games); Moore Machines; Mealy Machines.

Topics that received most (50-60%) 'In an extended course' are: TM transducers; Post correspondence problem; PSPACE, EXPTIME, ...; 'hot/fun topics' and their complexity classes (e.g., games); 'programming tricks' for TM (storage, tracks); and Cook's theorem.

Four respondents used the comments field to add other topics, being: (1) weighted automata and transducers with applications in a basic course and tree automata in an extended course, (2) Quantum Turing Machine, (3) Savitch Theorem PSPACE-completeness, and (4) parallel complexity classes, PRAM NC P-complete.


Discussion of the survey results

The responses of the survey--77 being a substantial amount for an open survey--show an overwhelming agreement about the need for a ToC course in a CS degree programme, regardless whether they have done the course themselves, teach or have taught it or a similar course, or conduct research in it, or do not us it at all (12 answered "no" to research and "no" to use). As such, it re-confirms ToC's rightful place in the curriculum (regardless whether it is offered as one course or spread over multiple courses).

On the one hand, on an absolute scale, that some 56% faces issues with ToC courses is substantial. On the other hand, one could argue that it is 'only' about over half of the respondents who noted persistent issues with the course, and therefore it deserves a comparative analysis to uncover what the other half does so as to not have such issues. Comments in the survey and offline by survey respondents suggest it may help to demonstrate better the link to applicability of ToC topics in the students' prospective career, have experienced good teachers, and appropriate preparation in prior courses to increase the pass rates. Perhaps class size also affects it (which it is known to do in the general case), but this will have to be investigated properly for ToC. Further, it might be related to the quantity and depth of material covered in a ToC course with respect to nominal course load. It has to be noted that occasionally even with a 80-100% pass rate and no low enrolment the 'gotten used to the issues' was selected, and vv., with a 41-60% pass rate that everything runs smoothly, indicating that having issues might also be relative to a particular university culture and expectations.

Concerning topics of a ToC course, it is perceived decidedly that formal languages, automata, Turing machines, complexity, computability and decidability themes form part of one coherent offering, but that the detail of the sub-topics covered may vary. For instance, including Turing machines in a basic ToC course, but transducers, storage and tracks only in an extended or advanced ToC course, including DFAs and NFAs, but not Mealy and Moore machines, and CFGs, but not decision and closure properties.

One has to recollect the survey announcement and (self-)characterisation of the respondents. It is perhaps a bit skewed toward ToC as necessity and computability and decidability among the most important topics resulting from where the survey was announced, which is yet to be cross-checked with a survey on ToC syllabi around the world. On the upside, the respondents come from each continent of the world and from many different universities, therewith providing a valuable snapshot of various aspects of ToC, in particular regarding the basic ToC course statistics, perceived importance of topics, and data about prevalence of issues.


Conclusions

The survey responses from all over the world show an overwhelming agreement that ToC should be taught and a majority prefers to have it in the 2nd or 3rd year in an undergraduate programme. It is taught at most of the institutions that the respondents are affiliated with, and the course is mostly solidly in the programme as a core course. About half of the respondents note there are issues with the course, for various reasons, including, but not limited to, low pass rates and low enrolment (although enrolmentenrolment numbers vary greatly). Roughly half observe first-time pass rates below 60%, and for only 20% the pass rate exceedEnrolments 80%. Whilst noting that several respondent spread ToC content over more than one course or integrate it with other courses, there is agreement on the typical topics that are considered as essential to ToC, covering regular and context-free languages, automata, Turing machines, undecidability, computability and complexity, where the subtopics covered vary.
The survey indicates some points for further inquiry, such as comparing the 'what should be in the course' with extant ToC course syllabi at different universities (as approximation of which topics are actually covered), a more detailed survey on the issues and on what the differences in teaching or in educational environment are to shed light on why about half have issues and the others do not.


Acknowledgements
I am grateful for all the people who took the time to fill in the survey, and especially those who provided additional offline/email feedback and information. I would like to thank Leonard Els of the SSCU for setting up the survey software.