How to: Query for Sentences That Contain a Specified Set of Words (LINQ) (Visual Basic)

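This example shows how to find the sentences in a text file that contain a match for every word in a specified set. Although the array of search terms is hard-coded in this example, it could also be populated dynamically at run time. Here the query returns the sentences that contain the words "Historically", "data", and "integrated".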

Example

Imports System
Imports System.Linq

Class FindSentences

    Shared Sub Main()
        Dim text As String = "Historically, the world of data and the world of objects " &
        "have not been well integrated. Programmers work in C# or Visual Basic " &
        "and also in SQL or XQuery. On the one side are concepts such as classes, " &
        "objects, fields, inheritance, and .NET Framework APIs. On the other side " &
        "are tables, columns, rows, nodes, and separate languages for dealing with " &
        "them. Data types often require translation between the two worlds; there are " &
        "different standard functions. Because the object world has no notion of query, a " &
        "query can only be represented as a string without compile-time type checking or " &
        "IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " &
        "objects in memory is often tedious and error-prone."

        ' Split the text block into an array of sentences.
        Dim sentences As String() = text.Split(New Char() {"."c, "?"c, "!"c})

        ' Define the search terms. This list could also be dynamically populated at run time
        Dim wordsToMatch As String() = {"Historically", "data", "integrated"}

        ' Find sentences that contain all the terms in the wordsToMatch array
        ' Note that the number of terms to match is not specified at compile time
        Dim sentenceQuery = From sentence In sentences
                            Let w = sentence.Split(New Char() {" "c, ","c, "."c, ";"c, ":"c},
                                                   StringSplitOptions.RemoveEmptyEntries)
                            Where w.Distinct().Intersect(wordsToMatch).Count = wordsToMatch.Count()
                            Select sentence

        ' Execute the query
        For Each str As String In sentenceQuery
            Console.WriteLine(str)
        Next

        ' Keep console window open in debug mode.
        Console.WriteLine("Press any key to exit.")
        Console.ReadKey()
    End Sub

End Class
' Output:
' Historically, the world of data and the world of objects have not been well integrated
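
The search terms do not have to be hard-coded. As a minimal sketch of filling them in at run time, the variant below takes the terms from the command-line arguments and falls back to a default list when none are given. The module name `DynamicTermsSketch`, the helper `SentencesContainingAll`, and the default terms are assumptions made for this illustration and are not part of the original sample.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DynamicTermsSketch

    ' Hypothetical helper: returns the sentences of text that contain every search term.
    Function SentencesContainingAll(text As String, terms As String()) As List(Of String)
        Dim sentences As String() = text.Split(New Char() {"."c, "?"c, "!"c})

        ' Same query shape as the sample above; the number of terms is only known at run time.
        Dim query = From sentence In sentences
                    Let w = sentence.Split(New Char() {" "c, ","c, "."c, ";"c, ":"c},
                                           StringSplitOptions.RemoveEmptyEntries)
                    Where w.Distinct().Intersect(terms).Count() = terms.Length
                    Select sentence

        Return query.ToList()
    End Function

    Sub Main(args As String())
        ' Use the command-line arguments as search terms when any are supplied;
        ' otherwise fall back to an assumed default list.
        Dim terms As String() = If(args.Length > 0, args, New String() {"data", "objects"})

        Dim text As String = "Historically, the world of data and the world of objects " &
            "have not been well integrated. Programmers work in C# or Visual Basic."

        For Each sentence As String In SentencesContainingAll(text, terms)
            Console.WriteLine(sentence)
        Next
    End Sub

End Module
```

Because the query compares the intersection count with `terms.Length`, it works unchanged for any number of terms.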

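The query works by first splitting the text into sentences, and then splitting each sentence into an array of strings that holds its words. For each of these arrays, the Distinct method removes all duplicate words, and the query then performs an Intersect operation on the word array and the wordsToMatch array. If the count of the intersection equals the count of the wordsToMatch array, every search term was found in the sentence, and the original sentence is returned.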
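
The intersection test described above can be isolated and tried on a single sentence. The following is a minimal sketch, not part of the original sample: the module name `IntersectSketch` and the function `ContainsAllWords` are hypothetical names introduced here for illustration.

```vb
Imports System
Imports System.Linq

Module IntersectSketch

    ' Hypothetical helper: returns True when every term in wordsToMatch
    ' appears as a whole word in the sentence (case-sensitive comparison).
    Function ContainsAllWords(sentence As String, wordsToMatch As String()) As Boolean
        Dim separators As Char() = {" "c, ","c, "."c, ";"c, ":"c}
        Dim words As String() = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries)

        ' Intersect keeps only the words that occur in both sequences, so the
        ' counts are equal only when every search term was found.
        Return words.Distinct().Intersect(wordsToMatch).Count() = wordsToMatch.Length
    End Function

    Sub Main()
        Dim terms As String() = {"Historically", "data", "integrated"}

        ' Prints True: all three terms occur in the sentence.
        Console.WriteLine(ContainsAllWords(
            "Historically, the world of data and the world of objects have not been well integrated", terms))

        ' Prints False: none of the terms occur in the sentence.
        Console.WriteLine(ContainsAllWords(
            "Programmers work in C# or Visual Basic and also in SQL or XQuery", terms))
    End Sub

End Module
```

The comparison is case-sensitive, which is why a sentence that begins with "Data" does not count as containing "data".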

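In the call to Split, the punctuation marks are used as separators so that they are removed from the resulting strings. If you did not do this, you could end up with a string such as "Historically," (with the trailing comma), which would not match "Historically" in the wordsToMatch array. You may need additional separators, depending on the kinds of punctuation found in the source text.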
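
To see the effect of the separators, the sketch below splits the same fragment twice, once on spaces only and once with the comma added as a separator. The module name `SeparatorSketch` and the helper `SplitWords` are hypothetical names used only for this illustration.

```vb
Imports System
Imports System.Linq

Module SeparatorSketch

    ' Hypothetical helper: splits text on the given separators and drops empty entries.
    Function SplitWords(text As String, ParamArray separators As Char()) As String()
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        Dim fragment As String = "Historically, the world of data"

        ' Splitting on spaces only leaves the comma attached to the first token.
        Dim spaceOnly As String() = SplitWords(fragment, " "c)

        ' Adding the comma as a separator strips it from the tokens.
        Dim withPunctuation As String() = SplitWords(fragment, " "c, ","c)

        Console.WriteLine(spaceOnly(0))                              ' Historically,
        Console.WriteLine(spaceOnly.Contains("Historically"))        ' False
        Console.WriteLine(withPunctuation.Contains("Historically"))  ' True
    End Sub

End Module
```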

Compile the code

Create a Visual Basic console application project that includes an Imports statement for the System.Linq namespace.

See also