On the grounds that "danielpipes.org is clean, consistently formatted, carefully edited and larger than WSJ," a team of three authors from Google and Stanford University have used my website to explore a possible connection between linguistic syntax and web mark-up. To put it more technically,
Spanning decades, Pipes' editorials are mostly in-domain for POS taggers and tree-bank-trained parsers; his recent (internet-era) entries are thoroughly cross-referenced, conveniently providing just the mark-up we hoped to study via uncluttered (printer-friendly) HTML.
Valentin I. Spitkovsky, Daniel Jurafsky, and Hiyan Alshawi, the authors of "Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing," show how web mark-up can be used to advance the state of the art in unsupervised dependency parsing. They presented their discovery of a strong correlation between mark-up and hierarchical syntactic structure last month at the 48th Annual Meeting of the Association for Computational Linguistics in Uppsala, Sweden. This finding could have broad implications for natural language processing problems, with applications extending well beyond parsing.
Comment: It's an honor to have DanielPipes.org selected for this study, and the honor goes primarily to Grayson Levy, who initiated the idea for this website in 2000, brought it online at the end of that year, and has overseen it ever since. (August 1, 2010)
Feb. 15, 2014 update: DanielPipes.org is also cited in "Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair," by Deepa Gupta, Vani Raveendran and Rahul Kumar of Bangalore writing in the Research Journal of Applied Sciences, Engineering and Technology 7(6), p. 1002.
Daniel pipes corpus (website: https://www.danielpipes.org/) is yet another data set which is a collection of articles that describe Middle East. Originally written in English, it has been translated to 25 other languages including Hindi. For Hindi alone, there exist 322 articles which make approximately 6761 sentence pairs.
Mar. 21, 2014 update: In a study presenting HindEnCorp, a Hindi-English parallel corpus, the Czech researchers Ondřej Bojar, Vojtěch Diatka, Pavel Straňák, Aleš Tamchyna, and Daniel Zeman relied on the 322 articles at DanielPipes.org translated into Hindi.
Mar. 1, 2017 update: In "Problems Encountered in Translating Islamic Related Texts from English into Arabic," Academic Research International, March 2017, Bader S. Dweik and Hiyam M. Khaleel explain their goal in the abstract:
This study aims to explore the problems that experienced translators in Jordan face when translating ideological Islamic-related texts from English into Arabic. To achieve this purpose, the researchers have designed a translation test consisting of 10 extracts with ideological content written by Muslim and non-Muslim writers. A purposive sample of 16 translators was selected to perform the test. The researchers have analyzed the results of the test qualitatively.
They then explain which texts they chose as source-language material:
The researchers developed a test which embodied two parts. While the first comprised the demographic data of the participants, the second was dedicated to translating statements derived from books, articles and websites (written by Muslim or non-Muslim writers). Three of them were non-Muslim orientalists, known for their controversial and influential writings about Islam, namely George Sale, Bernard Lewis, and Daniel Pipes. The three Muslim ideologists whose ideas about Islam were influential and controversial for both Muslims and non-Muslims, were Abu Alaa Maududi, Sayyed Qutb, and Ali Shariati. The two other writers were Muslim American scholars and writers, namely Imam Zaid Shakir (a specialist in Islamic spirituality) and Laila Ahmed (a specialist in Islam and Islamic feminism).
In case you're curious about the study's results:
The study reveals that the translators have faced the following six problems when rendering the texts; inability to deal with the ideological implications; the ambiguity of some words; the differences between source language (SL) and target language (TL) cultures; the translators' semantic and syntactic mediation; lack of knowledge and the inadequacy of dictionaries.
Apr. 1, 2018 update: Maitry Shukla and Harsh Mehta, in "A Survey of Word Reordering Model in Statistical Machine Translation," International Journal on Future Revolution in Computer Science & Communication Engineering 4 (2018): 349-53, make mention of DanielPipes.org.
June 1, 2018 update: So far, my contributions to linguistics have all been via this website, DanielPipes.org. Now, Verena Constanze Jäger in a Ph.D. thesis submitted to the Johannes Gutenberg-Universität in Mainz, "Expressions of Non-Epistemic Modality in American English: A Corpus-Based Study on Variation and Change in the 20th Century," quotes a statement I made on television. It comes from an exchange with Robert MacNeil on the PBS Newshour dated June 20, 1990:
MR-MacNeil: Daniel Pipes, how do you assess the military dangers at the moment?
MR-PIPES: I must say I'm much more optimistic. I really don't see where a war can take place.
July 19, 2018 update: Soravit Changpinyo, Hexiang Hu, and Fei Sha mention DanielPipes.org in their article, "Multi-Task Learning for Sequence Tagging: An Empirical Study," published at the University of Southern California's Student Computing Facility.
Mar. 21, 2019 update: In his MS thesis in Computer Science at COMSATS University Islamabad, "Development of Large Scale English-Urdu Machine Translation Corpus for Statistical and Neural Machine Translation Systems," Moodser Hussain uses DanielPipes.org as one of his main online sources.