Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

mongodb - Spring Data Mongo - apply unique combination fields in embedded document

I'm working on Spring Boot v2.1.3.RELEASE & Spring Data Mongo. In this example, I want to apply uniqueness on email & deptName. The combination of email & deptName must be unique and is there any way to get email out since its repeating in each array object ?

I tried below, but it's not working !

@CompoundIndexes({
    @CompoundIndex(name = "email_deptName_idx", def = "{'email' : 1, 'technologyEmployeeRef.technologyCd' : 1}")
})

Sample Data

{
    "_id" : ObjectId("5ec507c72d8c2136245d35ce"),
    ....
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    .....
    .....
    "technologyEmployeeRef" : [ 
        {
            "technologyCd" : "[email protected]",
            "technologyName" : "Advisory",
            ....
            .....
            "Status" : "A"
        }, 
        {
           "technologyCd" : "[email protected]",
           "technologyName" : "Tax",
           .....
           .....
           "Status" : "A"
       }
    ],
    "phoneCodes" : [ 
        "+352"
    ],
    ....
    ....
}

Technology.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
public class Technology {
    @Indexed(name = "technologyCd", unique = true, sparse = true)
    private String technologyCd;

    @Indexed(name = "technologyName", unique = true, sparse = true)
    private String technologyName;
    private String status;
}

EmployeeTechnologyRef.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class EmployeeTechnologyRef {
    private String technologyCd;
    private String primaryTechnology;
    private String status;
}

Employee.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
@CompoundIndexes({
    @CompoundIndex(name="emp_tech_indx", def = "{'employeeTechnologyRefs.primaryTechnology' : 1, 'employeeTechnologyRefs.technologyCd' : 1}" ,unique = true, sparse = true)
})
public class Employee {
    private String firstName;
    private String lastName;
    private String email;
    private List<EmployeeTechnologyRef> employeeTechnologyRefs;
}

I used below code but its not giving me any error of duplicate. How can we do this ?

Technology java8 = Technology.builder().technologyCd("Java").technologyName("Java8").status("A").build();
Technology spring = Technology.builder().technologyCd("Spring").technologyName("Spring Boot2").status("A").build();
List<Technology> technologies = new ArrayList<>();
technologies.add(java8);
technologies.add(spring);

technologyRepository.saveAll(technologies);

EmployeeTechnologyRef t1 = EmployeeTechnologyRef.builder().technologyCd("Java").primaryTechnology("Y")
        .status("A")
        .build();
EmployeeTechnologyRef t2 = EmployeeTechnologyRef.builder().technologyCd("Spring").primaryTechnology("Y")
        .status("A")
        .build();
List<EmployeeTechnologyRef> employeeTechnologyRefs = new ArrayList<>();
employeeTechnologyRefs.add(t1);
employeeTechnologyRefs.add(t2);
employeeTechnologyRefs.add(t1);

Employee employee = Employee.builder().firstName("John").lastName("Kerr").email("[email protected]")
        .employeeTechnologyRefs(employeeTechnologyRefs).build();
employeeRepository.save(employee);
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In MongoDB, a unique index ensures that a particular value in a field is not present in more than one document. It will not guarantee that a value is unique across an array within a single document. This is explained here in the MongoDB Manual where it discusses unique multikey Indexes.

Thus, a unique index will not satisfy your requirement. It will prevent seperate documents from containing duplicate combinations, but it will still allow a single document to contain duplicate values across an array.

The best option you have is to change your data model so as to split the array of technologyEmployeeRef objects into separate documents. Splitting it up into separate documents will allow you to use a unique index to enforce uniqueness.

The particular implementation that should be taken for this data model change would depend upon your access pattern (which is out of the scope of this question).


One such way this could be done is to create a TechnologyEmployee collection that has all of the fields that currently exist in the technologyEmployeeRef array. Additionally, this TechnologyEmployee collection would have a field, such as email, which would allow you to associate it with a document in the Employee collection.

Sample Employee Document

{
  ....
  ....
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "[email protected]",
  .....
  .....
  .....
}

Sample EmployeeTechnology Document

{
  "email" : "[email protected]",
  "technologyCd" : "Java",
  "technologyName" : "Java8",
  ....
  .....
  "status" : "A"
}

Index in EmployeeTechnology collection

{'email' : 1, 'technologyCd' : 1}, {unique: true}

The disadvantage of this approach is that you would need to read from two collections to have all of the data. This drawback may not be a big deal if you rarely need to retrieve the data from both collections at the same time. If you do need all the data, it can be sped up through use of indexes. With the indexes, it could be furthered sped up through the use of covered queries.


Another option is to denormalize the data. You would do this by duplicating the Employee data that you need to access at the same time as the Technology data.

Sample Documents

[
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    "technologyCd" : "Java",
    "technologyName" : "Java8",
    ....
    "status" : "A"
  },
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "[email protected]",
    .....
    "technologyCd" : "Spring",
    "technologyName" : "Spring Boot2",
    ....
    "status" : "A"
  }
]

In this MongoDB blog post,they say that

You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic.


Or as you've already mentioned, it may make sense to leave the data model as it is and to perform the check for uniqueness on the application side. This could likely give you the best read performance, but it does come with some disadvantages. First, it will slow down write operations because the application will need to run some checks before it can update the database.

It may be unlikely, but there is also a possibility that you could still end up with duplicates. If there are two back-to-back requests to insert the same EmployeeTechnology object into the array, then the validation of the second request may finish (and pass) before the first request has written to the database. I have seen a similar scenario myself with an application I worked on. Even though the application was checking for uniqueness, if a user double-clicked a submit button there would end up being duplicate entries in the database. In this case, disabling the button on the first click drastically reduced the risk. This small risk may be tolerable, depending on your requirements and the impact of having duplicate entries.


Which approach makes the most sense largely depends on your access pattern and requirements. Hope this helps.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...