A "data problem"
On Jan 8, Fritz Newmeyer gave a very interesting talk at the University of Washington about the lack of evidence for a particular parameter from Principles and Parameters theory. As I understood it, his main points were, first, that when parameters from P&P theory are tested against a wide variety of languages, the correlations they are meant to capture tend not to hold up, and second, that it can be very difficult to say this for sure, because multiple parameters can of course interact, obscuring the functioning of the parameter of interest from the purview of relatively superficial surveys.
In this context, Newmeyer mentioned what he characterized as a "data problem": every descriptive linguist and every typologist is working from their own interpretation of such fundamental concepts as "adjective", "subject", or "case". This struck me as just the kind of problem that a full-fledged cyberinfrastructure for our field could (and eventually should) address, and there are at least two ways in which it can help. The first is standardization: to the extent that resources like the GOLD ontology catch on, linguists can at least "opt in" to linking their terminology to the ontology, and this should improve comparability across studies.
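To make "opting in" a bit more concrete, here is a minimal sketch (in Python) of what such a link might amount to in practice: a project-local terminology table mapped to ontology concept URIs. The particular URIs and local labels below are illustrative placeholders in the style of GOLD identifiers, not verified entries in the ontology.

```python
# A minimal sketch of "opting in" to an ontology: a project-local
# terminology table mapped to ontology concept URIs. The URIs below
# are illustrative placeholders, not verified GOLD identifiers.

LOCAL_TO_GOLD = {
    # local label used in this grammar/corpus : assumed ontology concept
    "adj":  "http://purl.org/linguistics/gold/Adjective",
    "subj": "http://purl.org/linguistics/gold/SubjectRole",
    "erg":  "http://purl.org/linguistics/gold/ErgativeCase",
}

def to_gold(local_term):
    """Return the ontology concept this project's term is linked to,
    or None if no mapping has been declared."""
    return LOCAL_TO_GOLD.get(local_term)

if __name__ == "__main__":
    for term in ("adj", "subj", "noun"):
        print(term, "->", to_gold(term))
```

The point of even this tiny table is that another linguist (or another program) no longer has to guess what this project means by "adj"; the commitment is recorded explicitly.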
The second is publication and aggregation of data: if the linguists Newmeyer refers to are empirical linguists, then their definitions of these concepts ought to be grounded in linguistic facts (primarily facts about the distribution of formatives or the meanings of utterances). If the data behind analyses were published along with the analyses (in accessible, standards-compliant ways, with the relevant annotations included), then it ought to be possible to algorithmically check the compatibility of different uses of the same term, or at least for the interested linguist to "drill down" and see how the term is used in that particular work.
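What might the simplest version of such an algorithmic check look like? Here is a toy sketch, assuming two published datasets whose annotations can be read off as the sets of forms each analysis labels with a given term; the example data and the 0.5 threshold are invented purely for illustration.

```python
# A toy sketch of checking whether two analyses use a term compatibly,
# by comparing the sets of forms each one labels with that term.
# The data and the 0.5 threshold are invented for illustration only.

def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Forms labeled "adjective" in two hypothetical published datasets.
grammar_a_adjectives = {"big", "red", "asleep", "former"}
grammar_b_adjectives = {"big", "red", "quickly", "former"}

overlap = jaccard(grammar_a_adjectives, grammar_b_adjectives)
print(f"Overlap in 'adjective' usage: {overlap:.2f}")
if overlap < 0.5:
    print("The two analyses may be using 'adjective' incompatibly.")
```

A real check would of course compare distributional contexts rather than bare word lists, but even this crude overlap measure is only possible once the underlying data are published in a form a program can get at.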